Hi Edward.

So, I have tried to implement the infrastructure data format directly for NO * NM * NS * NE, and the speed-up is 4.1x-4.5x.

I think that would make a very nice message for the release list. It is obvious that the largest speed-up will be gained by getting rid of the NS loop. Could one just reshape the numpy arrays in the target function?
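Something like the following is what I have in mind. This is just a rough sketch with made-up dimension sizes and hypothetical variable names, not the actual relax data structures:

-------------
# Python module imports.
from numpy import float64, zeros

# Hypothetical dimension counts: experiments, spins, fields, offsets, dispersion points.
NE, NS, NM, NO, ND = 1, 100, 2, 1, 20

# One flat rank-1 array holding all NE*NS*NM*NO*ND back-calculated R2eff values,
# as it could come out of a single call into lib.dispersion.
n = NE * NS * NM * NO * ND
r2eff_flat = zeros(n, float64)

# Reshape the flat result into the rank-5 back_calc structure with the
# dimensions {Ei, Si, Mi, Oi, Di}.
back_calc = r2eff_flat.reshape(NE, NS, NM, NO, ND)
-------------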
Best
Troels


2014-06-05 14:36 GMT+02:00 Edward d'Auvergne <[email protected]>:

> That is your infrastructure at work :)  As I mentioned previously, we are however not yet tapping into the full speed possible that you see in this test (http://thread.gmane.org/gmane.science.nmr.relax.devel/6022/focus=6029), especially for data with many spins in a cluster, many magnetic field strengths, or many offsets.  Let's say that we have the following counts:
>
> - NE, the number of different dispersion experiments,
> - NS, the number of spins in one cluster,
> - NM, the number of magnetic field strengths,
> - NO, the number of offsets,
> - ND, the number of dispersion points,
>
> and that these counts are the same for all data combinations.  And let's say that t_diff is the time difference between Python and numpy for the calculation of one R2eff value.  Then compared to the 3.2.1 release, the total speed up possible with your infrastructure is t_diff * ND * NO * NM * NS * NE.  With the 3.2.2 release we have the t_diff * ND speed up, but not the rest.  If your NO * NM * NS * NE value is not very high, then you will not see much of a speed up compared to the ultimate speed up of t_diff * ND * NO * NM * NS * NE.  But if NO * NM * NS * NE is high, then the implementation of this speed up in the relax target functions might be worth considering (as described at http://thread.gmane.org/gmane.science.nmr.relax.devel/5726/focus=5806).
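> To put hypothetical numbers on this (the counts here are made up, not taken from your data): with NE = 1, NS = 100, NM = 2, NO = 1 and ND = 20, the 3.2.2 release already gives the t_diff * ND = t_diff * 20 speed up, whereas handling all dimensions at once in the target functions would give t_diff * ND * NO * NM * NS * NE = t_diff * 4000.  The remaining factor of NO * NM * NS * NE = 200 is what is still to be gained.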
> Regards,
>
> Edward
>
>
> On 5 June 2014 14:18, Troels Emtekær Linnet <[email protected]> wrote:
> > I get these results.
> >
> > That shows a 4x-5x speed-up.
> >
> > That is quite nice!
> >
> > -------
> > Checked on MacBook Pro
> > 2.4 GHz Intel Core i5
> > 8 GB 1067 MHz DDR3 RAM
> > Python Distribution -- Python 2.7.3 |EPD 7.3-2 (32-bit)|
> >
> > Timing for:
> > 2 fields
> > 20 dispersion points
> > iterations of function call: 1000
> >
> > Timed for simulating 1 or 100 clustered spins.
> >
> > svn ls "^/tags"
> >
> > ########
> > For tag 3.2.2
> > svn switch ^/tags/3.2.2
> > ########
> >
> > 1 spin:
> > ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> >   2000    0.168    0.000    0.198    0.000  cr72.py:100(r2eff_CR72)
> >   1000    0.040    0.000    0.280    0.000  relax_disp.py:456(calc_CR72_chi2)
> >   2000    0.028    0.000    0.039    0.000  chi2.py:32(chi2)
> >
> > 100 spins:
> > ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> > 200000   16.810    0.000   19.912    0.000  cr72.py:100(r2eff_CR72)
> >   1000    4.185    0.004   28.518    0.029  relax_disp.py:456(calc_CR72_chi2)
> > 200000    3.018    0.000    4.144    0.000  chi2.py:32(chi2)
> >
> > ########
> > For tag 3.2.1
> > svn switch ^/tags/3.2.1
> > ########
> >
> > 1 spin:
> > ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> >   2000    0.696    0.000    0.697    0.000  cr72.py:98(r2eff_CR72)
> >   1000    0.038    0.000    0.781    0.001  relax_disp.py:456(calc_CR72_chi2)
> >   2000    0.031    0.000    0.043    0.000  chi2.py:32(chi2)
> >
> > 100 spins:
> > ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> > 200000   75.880    0.000   76.078    0.000  cr72.py:98(r2eff_CR72)
> >   1000    4.201    0.004   85.519    0.086  relax_disp.py:456(calc_CR72_chi2)
> > 200000    3.513    0.000    4.940    0.000  chi2.py:32(chi2)
> >
> >
> > 2014-06-05 11:36 GMT+02:00 Edward d'Auvergne <[email protected]>:
> >
> >> Hi,
> >>
> >> The best place might be to create a special directory in the test_suite/shared_data/dispersion directories.  Or another option would be to create a devel_scripts/profiling/ directory and place it there.  The first option might be the best though, as you could then save additional files there, such as the relax log files with the profile timings.  Or simply have everything on one page in the wiki - script and output.  What do you think is best?
> >>
> >> Regards,
> >>
> >> Edward
> >>
> >>
> >> On 5 June 2014 11:27, Troels Emtekær Linnet <[email protected]> wrote:
> >> > Hi Ed.
> >> >
> >> > I have worked on a rather long profiling script now.
> >> >
> >> > It creates the necessary data structures and then calls the relax_disp target function.
> >> >
> >> > Can you devise a "place" to put this script?
> >> >
> >> > Best
> >> > Troels
> >> >
> >> >
> >> > 2014-06-05 11:13 GMT+02:00 Edward d'Auvergne <[email protected]>:
> >> >
> >> >> Hi Troels,
> >> >>
> >> >> This huge speed up you see also applies when you have multiple field strength data.  To understand how you can convert the long rank-1 array you have in your g_* data structures into the multi-index rank-5 back_calc array with dimensions {Ei, Si, Mi, Oi, Di}, see the numpy reshape() function:
> >> >>
> >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
> >> >>
> >> >> You can obtain this huge speed up if you convert the target_functions.relax_disp data structures to be similar to your g_* data structures, delete the looping in the func_*() target functions over the {Ei, Si, Mi, Oi, Di} dimensions (for the numeric models, this looping would need to be shifted into the lib.dispersion code to keep the API consistent), pass the new higher-dimensional data into the lib.dispersion modules, and finally use R2eff.reshape() to place the data back into the back_calc data structure.
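> >> >> As a rough sketch of where the speed up comes from (with made-up shapes and values, not the real relax data structures), the per-spin and per-field data can simply be broadcast against each other by numpy:
> >> >>
> >> >> -------------
> >> >> from numpy import ones
> >> >>
> >> >> # Hypothetical dimension sizes for {Ei, Si, Mi, Oi, Di}.
> >> >> NE, NS, NM, NO, ND = 1, 100, 2, 1, 20
> >> >>
> >> >> # One dw value per spin, one set of CPMG frequencies per field strength.
> >> >> dw = ones((NE, NS, 1, 1, 1))
> >> >> cpmg_frqs = ones((1, 1, NM, NO, ND))
> >> >>
> >> >> # Broadcasting expands both arrays to the full (NE, NS, NM, NO, ND) shape,
> >> >> # so the expression below is evaluated for all spins, fields, offsets and
> >> >> # dispersion points in one numpy operation rather than in Python loops.
> >> >> x = dw**2 / (4.0 * cpmg_frqs)
> >> >> print(x.shape)    # (1, 100, 2, 1, 20)
> >> >> -------------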
> >> >> This would again need to be in a new branch, and you should only do it if you wish to have huge speed ups for multi-experiment, clustered, multi-field, or multi-offset data.  The speed ups will also only be for the analytic models, as the numeric models unfortunately do not have the necessary maths derived for calculating everything simultaneously in one linear algebra operation.
> >> >>
> >> >> Regards,
> >> >>
> >> >> Edward
> >> >>
> >> >>
> >> >> On 4 June 2014 17:11, Edward d'Auvergne <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > The huge differences are because of the changes in the lib.dispersion modules.  But wait!  The r2eff_CR72() function receives the data for each experiment, spin, and offset separately.  So this insane speed up is not realised in the current target functions.  But the potential for these speed ups is there thanks to your infrastructure work in the 'disp_speed' branch.  I have mentioned this before:
> >> >> >
> >> >> > http://thread.gmane.org/gmane.science.nmr.relax.devel/5726
> >> >> >
> >> >> > Specifically the follow up at:
> >> >> >
> >> >> > http://thread.gmane.org/gmane.science.nmr.relax.devel/5726/focus=5806
> >> >> >
> >> >> > The idea mentioned in this post is exactly the speed up you see in this test!  So if the idea is implemented in relax then, yes, you will see this insane speed up in a clustered analysis, especially for large clusters and a large number of offsets (for R1rho, but also for CPMG when off-resonance effects are implemented, http://thread.gmane.org/gmane.science.nmr.relax.devel/5414/focus=5445).  But unfortunately you currently do not.
> >> >> >
> >> >> > Regards,
> >> >> >
> >> >> > Edward
> >> >> >
> >> >> >
> >> >> > On 4 June 2014 16:45, Troels Emtekær Linnet <[email protected]> wrote:
> >> >> >> Hi Edward.
> >> >> >>
> >> >> >> Ah, yes.
> >> >> >> I overwrite the state file for each new global fitting, with the new pipe.
> >> >> >> So that is increasing quite a lot.
> >> >> >> I will change that.
> >> >> >>
> >> >> >> I just checked my scripts.
> >> >> >> In both cases, I would do one grid search for the first run, and then the recurring analysis would copy the parameters from the first pipe.
> >> >> >>
> >> >> >> And the speed-up is between these analyses.
> >> >> >>
> >> >> >> Hm.
> >> >> >> I have to take that variable out with the grid search!
> >> >> >>
> >> >> >> I am trying to devise a profiling script which I can put in the base folder of older versions of relax, for example relax 3.1.6, which I also have.
> >> >> >>
> >> >> >> It looks like this:
> >> >> >> -------------
> >> >> >> # Python module imports.
> >> >> >> from numpy import array, float64, pi, zeros
> >> >> >> import sys
> >> >> >> import os
> >> >> >> import cProfile
> >> >> >>
> >> >> >> # relax module imports.
> >> >> >> from lib.dispersion.cr72 import r2eff_CR72
> >> >> >>
> >> >> >> # Default parameter values.
> >> >> >> r20a = 2.0
> >> >> >> r20b = 4.0
> >> >> >> pA = 0.95
> >> >> >> dw = 2.0
> >> >> >> kex = 1000.0
> >> >> >>
> >> >> >> relax_times = 0.04
> >> >> >> ncyc_list = [2, 4, 8, 10, 20, 40, 500]
> >> >> >>
> >> >> >> # Required data structures.
> >> >> >> s_ncyc = array(ncyc_list)
> >> >> >> s_num_points = len(s_ncyc)
> >> >> >> s_cpmg_frqs = s_ncyc / relax_times
> >> >> >> s_R2eff = zeros(s_num_points, float64)
> >> >> >>
> >> >> >> g_ncyc = array(ncyc_list*100)
> >> >> >> g_num_points = len(g_ncyc)
> >> >> >> g_cpmg_frqs = g_ncyc / relax_times
> >> >> >> g_R2eff = zeros(g_num_points, float64)
> >> >> >>
> >> >> >> # The spin Larmor frequencies.
> >> >> >> sfrq = 200. * 1E6
> >> >> >>
> >> >> >> # Calculate pB.
> >> >> >> pB = 1.0 - pA
> >> >> >>
> >> >> >> # Exchange rates.
> >> >> >> k_BA = pA * kex
> >> >> >> k_AB = pB * kex
> >> >> >>
> >> >> >> # Calculate spin Larmor frequencies in 2pi.
> >> >> >> frqs = sfrq * 2 * pi
> >> >> >>
> >> >> >> # Convert dw from ppm to rad/s.
> >> >> >> dw_frq = dw * frqs / 1.e6
> >> >> >>
> >> >> >>
> >> >> >> def single():
> >> >> >>     for i in xrange(0, 10000):
> >> >> >>         r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex, cpmg_frqs=s_cpmg_frqs, back_calc=s_R2eff, num_points=s_num_points)
> >> >> >>
> >> >> >> cProfile.run('single()')
> >> >> >>
> >> >> >> def cluster():
> >> >> >>     for i in xrange(0, 10000):
> >> >> >>         r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex, cpmg_frqs=g_cpmg_frqs, back_calc=g_R2eff, num_points=g_num_points)
> >> >> >>
> >> >> >> cProfile.run('cluster()')
> >> >> >> ------------------------
> >> >> >>
> >> >> >> For 3.1.6:
> >> >> >> [tlinnet@tomat relax-3.1.6]$ python profile_lib_dispersion_cr72.py
> >> >> >>
> >> >> >> 20003 function calls in 0.793 CPU seconds
> >> >> >>
> >> >> >> Ordered by: standard name
> >> >> >>
> >> >> >> ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> >> >> >>      1    0.000    0.000    0.793    0.793  <string>:1(<module>)
> >> >> >>  10000    0.778    0.000    0.783    0.000  cr72.py:98(r2eff_CR72)
> >> >> >>      1    0.010    0.010    0.793    0.793  profile_lib_dispersion_cr72.py:69(single)
> >> >> >>      1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
> >> >> >>  10000    0.005    0.000    0.005    0.000  {range}
> >> >> >>
> >> >> >>
> >> >> >> 20003 function calls in 61.901 CPU seconds
> >> >> >>
> >> >> >> Ordered by: standard name
> >> >> >>
> >> >> >> ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> >> >> >>      1    0.000    0.000   61.901   61.901  <string>:1(<module>)
> >> >> >>  10000   61.853    0.006   61.887    0.006  cr72.py:98(r2eff_CR72)
> >> >> >>      1    0.013    0.013   61.901   61.901  profile_lib_dispersion_cr72.py:75(cluster)
> >> >> >>      1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
> >> >> >>  10000    0.035    0.000    0.035    0.000  {range}
> >> >> >>
> >> >> >>
> >> >> >> For trunk:
> >> >> >> [tlinnet@tomat relax_trunk]$ python profile_lib_dispersion_cr72.py
> >> >> >>
> >> >> >> 80003 function calls in 0.514 CPU seconds
> >> >> >>
> >> >> >> Ordered by: standard name
> >> >> >>
> >> >> >> ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> >> >> >>      1    0.000    0.000    0.514    0.514  <string>:1(<module>)
> >> >> >>  10000    0.390    0.000    0.503    0.000  cr72.py:100(r2eff_CR72)
> >> >> >>  10000    0.008    0.000    0.040    0.000  fromnumeric.py:1314(sum)
> >> >> >>  10000    0.007    0.000    0.037    0.000  fromnumeric.py:1708(amax)
> >> >> >>  10000    0.006    0.000    0.037    0.000  fromnumeric.py:1769(amin)
> >> >> >>      1    0.011    0.011    0.514    0.514  profile_lib_dispersion_cr72.py:69(single)
> >> >> >>  10000    0.007    0.000    0.007    0.000  {isinstance}
> >> >> >>      1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
> >> >> >>  10000    0.030    0.000    0.030    0.000  {method 'max' of 'numpy.ndarray' objects}
> >> >> >>  10000    0.030    0.000    0.030    0.000  {method 'min' of 'numpy.ndarray' objects}
> >> >> >>  10000    0.025    0.000    0.025    0.000  {method 'sum' of 'numpy.ndarray' objects}
> >> >> >>
> >> >> >>
> >> >> >> 80003 function calls in 1.209 CPU seconds
> >> >> >>
> >> >> >> Ordered by: standard name
> >> >> >>
> >> >> >> ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> >> >> >>      1    0.000    0.000    1.209    1.209  <string>:1(<module>)
> >> >> >>  10000    1.042    0.000    1.196    0.000  cr72.py:100(r2eff_CR72)
> >> >> >>  10000    0.009    0.000    0.049    0.000  fromnumeric.py:1314(sum)
> >> >> >>  10000    0.007    0.000    0.052    0.000  fromnumeric.py:1708(amax)
> >> >> >>  10000    0.007    0.000    0.052    0.000  fromnumeric.py:1769(amin)
> >> >> >>      1    0.014    0.014    1.209    1.209  profile_lib_dispersion_cr72.py:75(cluster)
> >> >> >>  10000    0.007    0.000    0.007    0.000  {isinstance}
> >> >> >>      1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
> >> >> >>  10000    0.045    0.000    0.045    0.000  {method 'max' of 'numpy.ndarray' objects}
> >> >> >>  10000    0.045    0.000    0.045    0.000  {method 'min' of 'numpy.ndarray' objects}
> >> >> >>  10000    0.033    0.000    0.033    0.000  {method 'sum' of 'numpy.ndarray' objects}
> >> >> >> ---------------
> >> >> >>
> >> >> >> For 10000 iterations:
> >> >> >>
> >> >> >> 3.1.6
> >> >> >> Single: 0.778
> >> >> >> 100 cluster: 61.853
> >> >> >>
> >> >> >> trunk
> >> >> >> Single: 0.390
> >> >> >> 100 cluster: 1.042
> >> >> >>
> >> >> >> ------
> >> >> >>
> >> >> >> For 1000000 iterations:
> >> >> >>
> >> >> >> 3.1.6
> >> >> >> Single: 83.365
> >> >> >> 100 cluster: ???? Still running....
> >> >> >>
> >> >> >> trunk
> >> >> >> Single: 40.825
> >> >> >> 100 cluster: 106.339
> >> >> >>
> >> >> >> Am I doing something wrong here?
> >> >> >>
> >> >> >> That is such a massive speed up for clustered analysis that I simply can't believe it!
> >> >> >>
> >> >> >> Best
> >> >> >> Troels
> >> >> >>
> >> >> >>
> >> >> >> 2014-06-04 15:04 GMT+02:00 Edward d'Auvergne <[email protected]>:
> >> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> Such a huge speed up cannot be from the changes of the 'disp_speed' branch alone.  I would expect from that branch a maximum drop from 30 min to 15 min.  Therefore it must be your grid search changes.  When changing, simplifying, or eliminating the grid search, you have to be very careful about the introduced bias.  This bias is unavoidable.  It needs to be mentioned in the methods section of any paper.  The key is to be confident that the bias you have introduced will not negatively impact your results, for example that the grid search replacement is close enough to the true solution that the optimisation will be able to reach the global minimum.  You also have to convince the people reading your paper that the introduced bias is reasonable.
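> >> >>> For example, pre-setting the parameters from published values rather than grid searching over them might look something like this in a relax script (the values here are made up):
> >> >>>
> >> >>> -------------
> >> >>> # Hypothetical starting values taken from a published analysis.
> >> >>> value.set(val=10.0, param='r2')
> >> >>> value.set(val=0.95, param='pA')
> >> >>> value.set(val=1500.0, param='kex')
> >> >>> # Then skip the grid search and minimise directly from this starting point.
> >> >>> -------------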
> >> >>> As for a script to show the speed changes, you could have a look at maybe the test_suite/shared_data/dispersion/Hansen/relax_results/relax_disp.py file.  This performs a full analysis with a large range of dispersion models on the truncated data set from Flemming Hansen.  Or test_suite/shared_data/dispersion/Hansen/relax_disp.py, which uses all of Flemming's data.  These could be run before and after the merger of the 'disp_speed' branch, maybe with different models and the profile flag turned on.  You could then create a text file in the test_suite/shared_data/dispersion/Hansen/relax_results/ directory called something like 'relax_timings' to permanently record the speed ups.  This file can be used in the future for documenting any other speed ups as well.
> >> >>>
> >> >>> Regards,
> >> >>>
> >> >>> Edward
> >> >>>
> >> >>>
> >> >>> On 4 June 2014 14:37, Troels Emtekær Linnet <[email protected]> wrote:
> >> >>> > Looking at my old data, I can see that the writing out of data between each global fit analysis previously took around 30 min.
> >> >>> >
> >> >>> > It now takes 2-6 min.
> >> >>> >
> >> >>> > I almost can't believe that speed up!
> >> >>> >
> >> >>> > Could we devise a devel script which we could use to simulate the change?
> >> >>> >
> >> >>> > Best
> >> >>> > Troels
> >> >>> >
> >> >>> >
> >> >>> > 2014-06-04 14:24 GMT+02:00 Troels Emtekær Linnet <[email protected]>:
> >> >>> >
> >> >>> >> Hi Edward.
> >> >>> >>
> >> >>> >> After the changes to the lib/dispersion/model.py files, I see a massive speed-up of the computations.
> >> >>> >>
> >> >>> >> Over 2 days, I performed over 600 global fittings for a 68 residue protein, where all residues were clustered.  I just did it with 1 CPU.
> >> >>> >>
> >> >>> >> This is really, really impressive.
> >> >>> >>
> >> >>> >> I did though also alter how the grid search was performed, pre-setting some of the values to known values referred to in a paper.  So I can't really say what has cut the time down.  But looking at the calculations running, the minimisation runs quite fast.
> >> >>> >>
> >> >>> >> So, how does relax do the collecting of data for global fitting?
> >> >>> >>
> >> >>> >> Does it collect all the R2eff values for the clustered spins and send them to the target function together with the array of parameters to vary?  Or does it calculate per spin, and share the common parameters?
> >> >> >>> >> > >> >> >>> >> Best > >> >> >>> >> Troels > >> >> >>> >> > >> >> >>> > _______________________________________________ > >> >> >>> > relax (http://www.nmr-relax.com) > >> >> >>> > > >> >> >>> > This is the relax-devel mailing list > >> >> >>> > [email protected] > >> >> >>> > > >> >> >>> > To unsubscribe from this list, get a password > >> >> >>> > reminder, or change your subscription options, > >> >> >>> > visit the list information page at > >> >> >>> > https://mail.gna.org/listinfo/relax-devel > >> >> >> > >> >> >> > >> > > >> > > > > > > _______________________________________________ relax (http://www.nmr-relax.com) This is the relax-devel mailing list [email protected] To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-devel

