That is your infrastructure at work :)  As I mentioned previously,
however, we are not yet tapping into the full speed-up possible that
you can see in this test
(http://thread.gmane.org/gmane.science.nmr.relax.devel/6022/focus=6029).
This is especially true for data with many spins in a cluster, many
magnetic field strengths, or many offsets.  Let's say that we have
the following counts:

 - NE, the number of different dispersion experiments,
 - NS, the number of spins in one cluster,
 - NM, the number of magnetic field strengths,
 - NO, the number of offsets,
 - ND, the number of dispersion points,

and that these counts are the same for all data combinations.  And
let's say that t_diff is the time difference between Python and numpy
for the calculation of one R2eff value.  Then, compared to the 3.2.1
release, the total speed-up possible with your infrastructure is
t_diff * ND * NO * NM * NS * NE.  With the 3.2.2 release we have the
t_diff * ND speed-up, but not the rest.  If your NO * NM * NS * NE
value is not very high, then you will not see much of a speed-up
compared to the ultimate speed-up of t_diff * ND * NO * NM * NS * NE.
But if NO * NM * NS * NE is high, then implementing this speed-up in
the relax target functions might be worth considering (as described
at
http://thread.gmane.org/gmane.science.nmr.relax.devel/5726/focus=5806).
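
As a rough numerical sketch of this arithmetic (with completely
made-up counts and a made-up t_diff value, just to show how the
factors multiply up):

    # Hypothetical values - not measured.
    t_diff = 5e-5                          # time saved per R2eff value (s).
    NE, NS, NM, NO, ND = 1, 100, 2, 1, 20  # experiments, spins, fields, offsets, points.

    # The part already obtained with the 3.2.2 release.
    saving_3_2_2 = t_diff * ND

    # The full product possible once the target functions are vectorised too.
    saving_full = t_diff * ND * NO * NM * NS * NE

    print("%.3g vs %.3g seconds" % (saving_3_2_2, saving_full))    # 0.001 vs 0.2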

Regards,

Edward




On 5 June 2014 14:18, Troels Emtekær Linnet <[email protected]> wrote:
> I get the results below.
>
> They show a 4x-5x speed-up.
>
> That is quite nice!
>
>
>
> -------
> Checked on MacBook Pro
> 2.4 GHz Intel Core i5
> 8 GB 1067 MHz DDR3 RAM.
> Python Distribution -- Python 2.7.3 |EPD 7.3-2 (32-bit)|
>
> Timing for:
> 2 fields
> 20 dispersion points
> iterations of function call: 1000
>
> Timed for simulating 1 or 100 clustered spins.
>
> svn ls "^/tags"
>
> ########
> For tag 3.2.2
> svn switch ^/tags/3.2.2
> ########
>
> 1 spin:
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      2000    0.168    0.000    0.198    0.000 cr72.py:100(r2eff_CR72)
>      1000    0.040    0.000    0.280    0.000 relax_disp.py:456(calc_CR72_chi2)
>      2000    0.028    0.000    0.039    0.000 chi2.py:32(chi2)
>
> 100 spins:
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>    200000   16.810    0.000   19.912    0.000 cr72.py:100(r2eff_CR72)
>      1000    4.185    0.004   28.518    0.029 relax_disp.py:456(calc_CR72_chi2)
>    200000    3.018    0.000    4.144    0.000 chi2.py:32(chi2)
>
>
> ########
> For tag 3.2.1
> svn switch ^/tags/3.2.1
> ########
>
> 1 spin:
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      2000    0.696    0.000    0.697    0.000 cr72.py:98(r2eff_CR72)
>      1000    0.038    0.000    0.781    0.001 relax_disp.py:456(calc_CR72_chi2)
>      2000    0.031    0.000    0.043    0.000 chi2.py:32(chi2)
>
> 100 spins:
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>    200000   75.880    0.000   76.078    0.000 cr72.py:98(r2eff_CR72)
>      1000    4.201    0.004   85.519    0.086 relax_disp.py:456(calc_CR72_chi2)
>    200000    3.513    0.000    4.940    0.000 chi2.py:32(chi2)
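>
> A quick ratio check of the r2eff_CR72() tottime values above (just
> the numbers already listed, no new measurements):
>
>     print(0.696 / 0.168)     # 1 spin:    ~4.1x faster in 3.2.2
>     print(75.880 / 16.810)   # 100 spins: ~4.5x faster in 3.2.2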
>
>
>
> 2014-06-05 11:36 GMT+02:00 Edward d'Auvergne <[email protected]>:
>
>> Hi,
>>
>> The best place might be a new special directory under the
>> test_suite/shared_data/dispersion directories.  Another option would
>> be to create a devel_scripts/profiling/ directory and place it
>> there.  The first option might be the best though, as you could then
>> save additional files there, such as the relax log files with the
>> profile timings.  Or simply have everything on one page in the wiki -
>> script and output.  What do you think is best?
>>
>> Regards,
>>
>> Edward
>>
>>
>>
>> On 5 June 2014 11:27, Troels Emtekær Linnet <[email protected]> wrote:
>> > Hi Ed.
>> >
>> > I have worked on a rather long profiling script now.
>> >
>> > It creates the necessary data structures, and then calls the
>> > relax_disp target function.
>> >
>> > Can you devise a "place" to put this script?
>> >
>> > Best
>> > Troels
>> >
>> >
>> >
>> > 2014-06-05 11:13 GMT+02:00 Edward d'Auvergne <[email protected]>:
>> >
>> >> Hi Troels,
>> >>
>> >> This huge speed up you see also applies when you have multiple field
>> >> strength data.  To understand how you can convert the long rank-1
>> >> array you have in your g_* data structures into the multi-index rank-5
>> >> back_calc array with dimensions {Ei, Si, Mi, Oi, Di}, see the numpy
>> >> reshape() function:
>> >>
>> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
>> >>
>> >> You can obtain this huge speed up if you convert the
>> >> target_functions.relax_disp data structures to be similar to your g_*
>> >> data structures, delete the looping in the func_*() target functions
>> >> over the {Ei, Si, Mi, Oi, Di} dimensions (for the numeric models, this
>> >> looping would need to be shifted into the lib.dispersion code to keep
>> >> the API consistent), pass the new higher-dimensional data into the
>> >> lib.dispersion modules, and finally use R2eff.reshape() to place the
>> >> data back into the back_calc data structure.  This would again need to
>> >> be in a new branch, and you should only do it if you wish to have huge
>> >> speed ups for multi-experiment, clustered, multi-field, or
>> >> multi-offset data.  The speed ups will also only be for the analytic
>> >> models as the numeric models unfortunately do not have the necessary
>> >> maths derived for calculating everything simultaneously in one linear
>> >> algebra operation.
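>> >>
>> >> As a rough sketch of just the reshape() repacking (with made-up
>> >> dimension sizes, not the actual relax data structures):
>> >>
>> >>     from numpy import arange, float64
>> >>
>> >>     # Hypothetical dimension sizes for {Ei, Si, Mi, Oi, Di}.
>> >>     NE, NS, NM, NO, ND = 1, 100, 2, 1, 20
>> >>
>> >>     # A long rank-1 array of R2eff values, as in the g_* structures.
>> >>     R2eff = arange(NE*NS*NM*NO*ND, dtype=float64)
>> >>
>> >>     # Repack into the rank-5 back_calc layout {Ei, Si, Mi, Oi, Di}.
>> >>     back_calc = R2eff.reshape(NE, NS, NM, NO, ND)
>> >>
>> >>     print(back_calc.shape)    # (1, 100, 2, 1, 20)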
>> >>
>> >> Regards,
>> >>
>> >> Edward
>> >>
>> >>
>> >>
>> >> On 4 June 2014 17:11, Edward d'Auvergne <[email protected]> wrote:
>> >> > Hi,
>> >> >
>> >> > The huge differences are because of the changes in the lib.dispersion
>> >> > modules.  But wait!  The r2eff_CR72() receives the data for each
>> >> > experiment, spin, and offset separately.  So this insane speed up is
>> >> > not realised in the current target functions.  But the potential for
>> >> > these speed ups is there thanks to your infrastructure work in the
>> >> > 'disp_speed' branch.  I have mentioned this before:
>> >> >
>> >> > http://thread.gmane.org/gmane.science.nmr.relax.devel/5726
>> >> >
>> >> > Specifically, the follow-up at:
>> >> >
>> >> > http://thread.gmane.org/gmane.science.nmr.relax.devel/5726/focus=5806
>> >> >
>> >> > The idea mentioned in this post is exactly the speed-up you see in
>> >> > this test!  So if the idea is implemented in relax then, yes, you
>> >> > will see this insane speed-up in a clustered analysis, especially
>> >> > for large clusters and a large number of offsets (for R1rho, but
>> >> > also for CPMG when off-resonance effects are implemented,
>> >> >
>> >> > http://thread.gmane.org/gmane.science.nmr.relax.devel/5414/focus=5445).
>> >> > But unfortunately, currently you do not.
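>> >> >
>> >> > As a very rough sketch of the difference (made-up parameter values
>> >> > and shapes, reusing the r2eff_CR72() keyword arguments from your
>> >> > profiling script - not the actual target function code):
>> >> >
>> >> >     from numpy import arange, float64, tile, zeros
>> >> >     from lib.dispersion.cr72 import r2eff_CR72
>> >> >
>> >> >     NS, ND = 100, 20                    # spins in the cluster, dispersion points.
>> >> >     cpmg_frqs = arange(1, ND+1) * 50.0  # made-up CPMG frequencies (Hz).
>> >> >
>> >> >     # How the current target functions use the lib code - one call per spin.
>> >> >     back_calc = zeros((NS, ND), float64)
>> >> >     for si in range(NS):
>> >> >         r2eff_CR72(r20a=2.0, r20b=4.0, pA=0.95, dw=2500.0, kex=1000.0,
>> >> >                    cpmg_frqs=cpmg_frqs, back_calc=back_calc[si],
>> >> >                    num_points=ND)
>> >> >
>> >> >     # The idea in the posts above - one call for the whole cluster at once.
>> >> >     back_calc_flat = zeros(NS*ND, float64)
>> >> >     r2eff_CR72(r20a=2.0, r20b=4.0, pA=0.95, dw=2500.0, kex=1000.0,
>> >> >                cpmg_frqs=tile(cpmg_frqs, NS), back_calc=back_calc_flat,
>> >> >                num_points=NS*ND)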
>> >> >
>> >> > Regards,
>> >> >
>> >> > Edward
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On 4 June 2014 16:45, Troels Emtekær Linnet <[email protected]>
>> >> > wrote:
>> >> >> Hi Edward.
>> >> >>
>> >> >> Ah yes.
>> >> >> I overwrite the state file for each new global fitting, with the
>> >> >> new pipe.
>> >> >> So that is growing quite a lot.
>> >> >> I will change that.
>> >> >>
>> >> >> I just checked my scripts.
>> >> >> In both cases, I would do one grid search for the first run, and
>> >> >> then the recurring analyses would copy the parameters from the
>> >> >> first pipe.
>> >> >>
>> >> >> And the speed-up is between these analyses.
>> >> >>
>> >> >> Hm.
>> >> >> I have to take that grid search variable out of the comparison!
>> >> >>
>> >> >> I am trying to devise a profile script which I can put in the
>> >> >> base folder of older versions of relax.
>> >> >> For example relax 3.1.6, which I also have.
>> >> >>
>> >> >> It looks like this:
>> >> >> -------------
>> >> >> # Python module imports.
>> >> >> from numpy import array, float64, pi, zeros
>> >> >> import sys
>> >> >> import os
>> >> >> import cProfile
>> >> >>
>> >> >> # relax module imports.
>> >> >> from lib.dispersion.cr72 import r2eff_CR72
>> >> >>
>> >> >> # Default parameter values.
>> >> >> r20a = 2.0
>> >> >> r20b = 4.0
>> >> >> pA = 0.95
>> >> >> dw = 2.0
>> >> >> kex = 1000.0
>> >> >>
>> >> >> relax_times = 0.04
>> >> >> ncyc_list = [2, 4, 8, 10, 20, 40, 500]
>> >> >>
>> >> >> # Required data structures.
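>> >> >> # Single spin data (s_*).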
>> >> >> s_ncyc = array(ncyc_list)
>> >> >> s_num_points = len(s_ncyc)
>> >> >> s_cpmg_frqs = s_ncyc / relax_times
>> >> >> s_R2eff = zeros(s_num_points, float64)
>> >> >>
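>> >> >> # Data for 100 clustered spins (g_*) - the ncyc list repeated 100 times.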
>> >> >> g_ncyc = array(ncyc_list*100)
>> >> >> g_num_points = len(g_ncyc)
>> >> >> g_cpmg_frqs = g_ncyc / relax_times
>> >> >> g_R2eff = zeros(g_num_points, float64)
>> >> >>
>> >> >> # The spin Larmor frequencies.
>> >> >> sfrq = 200. * 1E6
>> >> >>
>> >> >> # Calculate pB.
>> >> >> pB = 1.0 - pA
>> >> >>
>> >> >> # Exchange rates.
>> >> >> k_BA = pA * kex
>> >> >> k_AB = pB * kex
>> >> >>
>> >> >> # Calculate spin Larmor frequencies in 2pi.
>> >> >> frqs = sfrq * 2 * pi
>> >> >>
>> >> >> # Convert dw from ppm to rad/s.
>> >> >> dw_frq = dw * frqs / 1.e6
>> >> >>
>> >> >>
>> >> >> def single():
>> >> >>     for i in xrange(0,10000):
>> >> >>         r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex,
>> >> >>                    cpmg_frqs=s_cpmg_frqs, back_calc=s_R2eff,
>> >> >>                    num_points=s_num_points)
>> >> >>
>> >> >> cProfile.run('single()')
>> >> >>
>> >> >> def cluster():
>> >> >>     for i in xrange(0,10000):
>> >> >>         r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex,
>> >> >>                    cpmg_frqs=g_cpmg_frqs, back_calc=g_R2eff,
>> >> >>                    num_points=g_num_points)
>> >> >>
>> >> >> cProfile.run('cluster()')
>> >> >> ------------------------
>> >> >>
>> >> >> For 3.1.6
>> >> >> [tlinnet@tomat relax-3.1.6]$ python profile_lib_dispersion_cr72.py
>> >> >>          20003 function calls in 0.793 CPU seconds
>> >> >>
>> >> >>    Ordered by: standard name
>> >> >>
>> >> >>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>> >> >>         1    0.000    0.000    0.793    0.793 <string>:1(<module>)
>> >> >>     10000    0.778    0.000    0.783    0.000 cr72.py:98(r2eff_CR72)
>> >> >>         1    0.010    0.010    0.793    0.793 profile_lib_dispersion_cr72.py:69(single)
>> >> >>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>> >> >>     10000    0.005    0.000    0.005    0.000 {range}
>> >> >>
>> >> >>
>> >> >>          20003 function calls in 61.901 CPU seconds
>> >> >>
>> >> >>    Ordered by: standard name
>> >> >>
>> >> >>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>> >> >>         1    0.000    0.000   61.901   61.901 <string>:1(<module>)
>> >> >>     10000   61.853    0.006   61.887    0.006 cr72.py:98(r2eff_CR72)
>> >> >>         1    0.013    0.013   61.901   61.901 profile_lib_dispersion_cr72.py:75(cluster)
>> >> >>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>> >> >>     10000    0.035    0.000    0.035    0.000 {range}
>> >> >>
>> >> >>
>> >> >> For trunk
>> >> >>
>> >> >> [tlinnet@tomat relax_trunk]$ python profile_lib_dispersion_cr72.py
>> >> >>          80003 function calls in 0.514 CPU seconds
>> >> >>
>> >> >>    Ordered by: standard name
>> >> >>
>> >> >>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>> >> >>         1    0.000    0.000    0.514    0.514 <string>:1(<module>)
>> >> >>     10000    0.390    0.000    0.503    0.000 cr72.py:100(r2eff_CR72)
>> >> >>     10000    0.008    0.000    0.040    0.000 fromnumeric.py:1314(sum)
>> >> >>     10000    0.007    0.000    0.037    0.000 fromnumeric.py:1708(amax)
>> >> >>     10000    0.006    0.000    0.037    0.000 fromnumeric.py:1769(amin)
>> >> >>         1    0.011    0.011    0.514    0.514 profile_lib_dispersion_cr72.py:69(single)
>> >> >>     10000    0.007    0.000    0.007    0.000 {isinstance}
>> >> >>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>> >> >>     10000    0.030    0.000    0.030    0.000 {method 'max' of 'numpy.ndarray' objects}
>> >> >>     10000    0.030    0.000    0.030    0.000 {method 'min' of 'numpy.ndarray' objects}
>> >> >>     10000    0.025    0.000    0.025    0.000 {method 'sum' of 'numpy.ndarray' objects}
>> >> >>
>> >> >>
>> >> >>          80003 function calls in 1.209 CPU seconds
>> >> >>
>> >> >>    Ordered by: standard name
>> >> >>
>> >> >>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>> >> >>         1    0.000    0.000    1.209    1.209 <string>:1(<module>)
>> >> >>     10000    1.042    0.000    1.196    0.000 cr72.py:100(r2eff_CR72)
>> >> >>     10000    0.009    0.000    0.049    0.000 fromnumeric.py:1314(sum)
>> >> >>     10000    0.007    0.000    0.052    0.000 fromnumeric.py:1708(amax)
>> >> >>     10000    0.007    0.000    0.052    0.000 fromnumeric.py:1769(amin)
>> >> >>         1    0.014    0.014    1.209    1.209 profile_lib_dispersion_cr72.py:75(cluster)
>> >> >>     10000    0.007    0.000    0.007    0.000 {isinstance}
>> >> >>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>> >> >>     10000    0.045    0.000    0.045    0.000 {method 'max' of 'numpy.ndarray' objects}
>> >> >>     10000    0.045    0.000    0.045    0.000 {method 'min' of 'numpy.ndarray' objects}
>> >> >>     10000    0.033    0.000    0.033    0.000 {method 'sum' of 'numpy.ndarray' objects}
>> >> >> ---------------
>> >> >>
>> >> >> For 10000 iterations (tottime, in CPU seconds):
>> >> >>
>> >> >> 3.1.6
>> >> >> Single: 0.778
>> >> >> 100 cluster: 61.853
>> >> >>
>> >> >> trunk
>> >> >> Single: 0.390
>> >> >> 100 cluster: 1.042
>> >> >>
>> >> >> ------
>> >> >>
>> >> >> For 1000000 iterations
>> >> >> 3.1.6
>> >> >> Single: 83.365
>> >> >> 100 cluster:  ???? Still running....
>> >> >>
>> >> >> trunk
>> >> >> Single: 40.825
>> >> >> 100 cluster: 106.339
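>> >> >>
>> >> >> A quick ratio check of the 10000 iteration tottime numbers above
>> >> >> (just the values already listed):
>> >> >>
>> >> >>     print(0.778 / 0.390)     # single spin:      ~2x faster in trunk
>> >> >>     print(61.853 / 1.042)    # 100 spin cluster: ~59x faster in trunk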
>> >> >>
>> >> >> Am I doing something wrong here?
>> >> >>
>> >> >> That is such a massive speed-up for clustered analysis that I
>> >> >> simply can't believe it!
>> >> >>
>> >> >> Best
>> >> >> Troels
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2014-06-04 15:04 GMT+02:00 Edward d'Auvergne <[email protected]>:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> Such a huge speed-up cannot come from the changes of the
>> >> >>> 'disp_speed' branch alone.  I would expect from that branch a
>> >> >>> maximum drop from 30 min to 15 min.  Therefore it must be your
>> >> >>> grid search changes.  When changing, simplifying, or eliminating
>> >> >>> the grid search, you have to be very careful about the introduced
>> >> >>> bias.  This bias is unavoidable.  It needs to be mentioned in the
>> >> >>> methods section of any paper.  The key is to be confident that
>> >> >>> the bias you have introduced will not negatively impact your
>> >> >>> results - for example, that the grid search replacement is close
>> >> >>> enough to the true solution that the optimisation will still be
>> >> >>> able to reach the global minimum.  You also have to convince the
>> >> >>> people reading your paper that the introduced bias is reasonable.
>> >> >>>
>> >> >>> As for a script to show the speed changes, you could maybe have
>> >> >>> a look at the
>> >> >>> test_suite/shared_data/dispersion/Hansen/relax_results/relax_disp.py
>> >> >>> file.  This performs a full analysis with a large range of
>> >> >>> dispersion models on the truncated data set from Flemming Hansen.
>> >> >>> Or test_suite/shared_data/dispersion/Hansen/relax_disp.py, which
>> >> >>> uses all of Flemming's data.  These could be run before and after
>> >> >>> the merger of the 'disp_speed' branch, maybe with different
>> >> >>> models and the profile flag turned on.  You could then create a
>> >> >>> text file in the
>> >> >>> test_suite/shared_data/dispersion/Hansen/relax_results/ directory
>> >> >>> called something like 'relax_timings' to permanently record the
>> >> >>> speed-ups.  This file could be used in the future for documenting
>> >> >>> any other speed-ups as well.
>> >> >>>
>> >> >>> Regards,
>> >> >>>
>> >> >>> Edward
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On 4 June 2014 14:37, Troels Emtekær Linnet <[email protected]>
>> >> >>> wrote:
>> >> >>> > Looking at my old data, I can see that writing out the data
>> >> >>> > between each global fit analysis used to take around 30 min.
>> >> >>> >
>> >> >>> > It now takes 2-6 min.
>> >> >>> >
>> >> >>> > I almost can't believe that speed-up!
>> >> >>> >
>> >> >>> > Could we devise a devel script which we could use to simulate
>> >> >>> > the change?
>> >> >>> >
>> >> >>> > Best
>> >> >>> > Troels
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> > 2014-06-04 14:24 GMT+02:00 Troels Emtekær Linnet
>> >> >>> > <[email protected]>:
>> >> >>> >
>> >> >>> >> Hi Edward.
>> >> >>> >>
>> >> >>> >> After the changes to the lib/dispersion model files, I see a
>> >> >>> >> massive speed-up of the computations.
>> >> >>> >>
>> >> >>> >> Over 2 days, I performed over 600 global fittings for a
>> >> >>> >> 68-residue protein, where all residues were clustered.  I just
>> >> >>> >> did it with 1 CPU.
>> >> >>> >>
>> >> >>> >> This is really really impressive.
>> >> >>> >>
>> >> >>> >> I did, though, also alter how the grid search was performed,
>> >> >>> >> pre-setting some of the values to known values referred to in
>> >> >>> >> a paper.  So I can't really say what has cut the time down.
>> >> >>> >>
>> >> >>> >> But looking at the calculations running, the minimisation runs
>> >> >>> >> quite fast.
>> >> >>> >>
>> >> >>> >> So, how does relax collect the data for global fitting?
>> >> >>> >>
>> >> >>> >> Does it collect all the R2eff values for the clustered spins
>> >> >>> >> and send them to the target function together with the array
>> >> >>> >> of parameters to vary?
>> >> >>> >>
>> >> >>> >> Or does it calculate per spin, and share the common parameters?
>> >> >>> >>
>> >> >>> >> My current bottleneck actually seems to be the saving of the
>> >> >>> >> state file between each iteration of the global analysis.
>> >> >>> >>
>> >> >>> >> Best
>> >> >>> >> Troels
>> >> >>> >>
>> >> >>
>> >> >>
>> >
>> >
>
>

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
