Hi Edward.

Ah, yes.
I overwrite the state file for each new global fit, with the new pipe added,
so the file keeps growing quite a lot.
I will change that.
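
What I could do instead is write a numbered state file per global fit, so
nothing keeps accumulating in one file. A rough sketch (relax script syntax,
to be run through relax; state.save is a real user function, but check its
exact keyword arguments in your relax version, and the loop variable is only
illustrative):
-------------
# Rough sketch: one numbered state file per global fit instead of a single,
# ever-growing file.  'number_of_global_fits' is a placeholder.
for iteration in range(number_of_global_fits):
    # ... create the new data pipe and run the global fit for this iteration ...

    # Save this iteration's results to its own file.
    state.save('state_global_fit_%03i' % iteration, force=True)
-------------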

I just checked my scripts.
In both cases, I would do one grid search for the first run, and then the
recurring analysis would copy the parameters from the first pipe.

And the speed-up I reported is between these recurring analyses.

Hm.
I have to take that variable (the grid search) out of the comparison!
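
For reference, the recurring runs copy the parameters roughly like this (a
sketch only; value.copy is a real relax user function, but the parameter and
pipe names here are placeholders to be checked against the manual):
-------------
# Rough sketch, relax script syntax.  Only the first run does the grid
# search; the recurring runs copy its fitted values and minimise from there.
for param in ['r2', 'pA', 'dw', 'kex']:
    value.copy(pipe_from='first_global_fit', pipe_to='current_global_fit', param=param)
-------------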

I am trying to devise a profiling script which I can put in the base folder
of older versions of relax, for example relax 3.1.6, which I also have
installed.

It looks like this:
-------------
# Python module imports.
from numpy import array, float64, pi, zeros
import cProfile

# relax module imports.
from lib.dispersion.cr72 import r2eff_CR72

# Default parameter values.
r20a = 2.0
r20b = 4.0
pA = 0.95
dw = 2.0
kex = 1000.0

relax_times = 0.04
ncyc_list = [2, 4, 8, 10, 20, 40, 500]

# Required data structures ('s_' = single spin, 7 points; 'g_' = 100-spin cluster, 700 points).
s_ncyc = array(ncyc_list)
s_num_points = len(s_ncyc)
s_cpmg_frqs = s_ncyc / relax_times
s_R2eff = zeros(s_num_points, float64)

g_ncyc = array(ncyc_list*100)
g_num_points = len(g_ncyc)
g_cpmg_frqs = g_ncyc / relax_times
g_R2eff = zeros(g_num_points, float64)

# The spin Larmor frequencies.
sfrq = 200. * 1E6

# Calculate pB.
pB = 1.0 - pA

# Exchange rates.
k_BA = pA * kex
k_AB = pB * kex

# Calculate spin Larmor frequencies in 2pi.
frqs = sfrq * 2 * pi

# Convert dw from ppm to rad/s.
dw_frq = dw * frqs / 1.e6


# Profile the single spin sized data (7 dispersion points).
def single():
    for i in xrange(10000):
        r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex,
                   cpmg_frqs=s_cpmg_frqs, back_calc=s_R2eff, num_points=s_num_points)

cProfile.run('single()')

# Profile the cluster sized data (7 points x 100 spins = 700 dispersion points).
def cluster():
    for i in xrange(10000):
        r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex,
                   cpmg_frqs=g_cpmg_frqs, back_calc=g_R2eff, num_points=g_num_points)

cProfile.run('cluster()')
------------------------
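
To also get plain wall-clock totals next to the cProfile tables, something
like this could be appended to the script (timeit.timeit() accepts a
callable, and number=1 runs each 10000-iteration test once):
-------------
import timeit

# Wall-clock totals for one pass of each test, as a cross-check of the
# cProfile numbers.
print("single:  %.3f s" % timeit.timeit(single, number=1))
print("cluster: %.3f s" % timeit.timeit(cluster, number=1))
-------------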

For 3.1.6
[tlinnet@tomat relax-3.1.6]$ python profile_lib_dispersion_cr72.py
         20003 function calls in 0.793 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.793    0.793 <string>:1(<module>)
    10000    0.778    0.000    0.783    0.000 cr72.py:98(r2eff_CR72)
        1    0.010    0.010    0.793    0.793 profile_lib_dispersion_cr72.py:69(single)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.005    0.000    0.005    0.000 {range}


         20003 function calls in 61.901 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   61.901   61.901 <string>:1(<module>)
    10000   61.853    0.006   61.887    0.006 cr72.py:98(r2eff_CR72)
        1    0.013    0.013   61.901   61.901 profile_lib_dispersion_cr72.py:75(cluster)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.035    0.000    0.035    0.000 {range}


For trunk

[tlinnet@tomat relax_trunk]$ python profile_lib_dispersion_cr72.py
         80003 function calls in 0.514 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.514    0.514 <string>:1(<module>)
    10000    0.390    0.000    0.503    0.000 cr72.py:100(r2eff_CR72)
    10000    0.008    0.000    0.040    0.000 fromnumeric.py:1314(sum)
    10000    0.007    0.000    0.037    0.000 fromnumeric.py:1708(amax)
    10000    0.006    0.000    0.037    0.000 fromnumeric.py:1769(amin)
        1    0.011    0.011    0.514    0.514 profile_lib_dispersion_cr72.py:69(single)
    10000    0.007    0.000    0.007    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.030    0.000    0.030    0.000 {method 'max' of 'numpy.ndarray' objects}
    10000    0.030    0.000    0.030    0.000 {method 'min' of 'numpy.ndarray' objects}
    10000    0.025    0.000    0.025    0.000 {method 'sum' of 'numpy.ndarray' objects}


         80003 function calls in 1.209 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.209    1.209 <string>:1(<module>)
    10000    1.042    0.000    1.196    0.000 cr72.py:100(r2eff_CR72)
    10000    0.009    0.000    0.049    0.000 fromnumeric.py:1314(sum)
    10000    0.007    0.000    0.052    0.000 fromnumeric.py:1708(amax)
    10000    0.007    0.000    0.052    0.000 fromnumeric.py:1769(amin)
        1    0.014    0.014    1.209    1.209 profile_lib_dispersion_cr72.py:75(cluster)
    10000    0.007    0.000    0.007    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.045    0.000    0.045    0.000 {method 'max' of 'numpy.ndarray' objects}
    10000    0.045    0.000    0.045    0.000 {method 'min' of 'numpy.ndarray' objects}
    10000    0.033    0.000    0.033    0.000 {method 'sum' of 'numpy.ndarray' objects}
---------------

For 10000 iterations (times are the tottime of r2eff_CR72, in seconds)

3.1.6
Single: 0.778
100 cluster: 61.853

trunk
Single: 0.390
100 cluster: 1.042

------

For 1000000 iterations
3.1.6
Single: 83.365
100 cluster:  ???? Still running....

trunk
Single: 40.825
100 cluster: 106.339

Am I doing something wrong here?

That is such a massive speed-up for the clustered analysis that I simply
can't believe it!
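
If I read the profiles right, the difference is simply vectorisation: for
10000 iterations, 3.1.6 goes from 0.778 s for 7 points to 61.853 s for 700
points (about 80x, i.e. linear in the number of points), while the trunk only
goes from 0.390 s to 1.042 s (about 2.7x), because r2eff_CR72() now works on
the whole cpmg_frqs array in single numpy operations. A toy illustration of
that scaling behaviour, in plain numpy rather than the relax code:
-------------
import timeit
import numpy as np

# 700 points, mirroring the 100-spin cluster test above.
x = np.linspace(1.0, 100.0, 700)

def per_point():
    # Python-level loop: the cost grows linearly with the number of points.
    return [np.cosh(v) for v in x]

def vectorised():
    # One numpy call over the whole array: mostly fixed overhead plus a
    # cheap C loop, so 100 times more points is nowhere near 100 times slower.
    return np.cosh(x)

print("loop:       %.4f s" % timeit.timeit(per_point, number=1000))
print("vectorised: %.4f s" % timeit.timeit(vectorised, number=1000))
-------------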

Best
Troels


2014-06-04 15:04 GMT+02:00 Edward d'Auvergne <[email protected]>:

> Hi,
>
> Such a huge speed up cannot be from the changes of the 'disp_speed'
> branch alone.  I would expect from that branch a maximum drop from 30
> min to 15 min.  Therefore it must be your grid search changes.  When
> changing, simplifying, or eliminating the grid search, you have to be
> very careful about the introduced bias.  This bias is unavoidable.  It
> needs to be mentioned in the methods of any paper.  The key is to be
> happy that the bias you have introduced will not negatively impact
> your results.  For example, if you believe that the grid search
> replacement is reasonably close to the true solution, then the
> optimisation will be able to reach the global minimum.  You also have
> to convince the people reading your paper that the introduced bias is
> reasonable.
>
> As for a script to show the speed changes, you could have a look at
> maybe the
> test_suite/shared_data/dispersion/Hansen/relax_results/relax_disp.py
> file.  This performs a full analysis with a large range of dispersion
> models on the truncated data set from Flemming Hansen.  Or
> test_suite/shared_data/dispersion/Hansen/relax_disp.py which uses all
> of Flemming's data.  These could be run before and after the merger of
> the 'disp_speed' branch, maybe with different models and the profile
> flag turned on.  You could then create a text file in the
> test_suite/shared_data/dispersion/Hansen/relax_results/ directory
> called something like 'relax_timings' to permanently record the speed
> ups.  This file can be used in the future for documenting any other
> speed ups as well.
>
> Regards,
>
> Edward
>
>
>
>
> On 4 June 2014 14:37, Troels Emtekær Linnet <[email protected]> wrote:
> > Looking at my old data, I can see that writing out of data between each
> > global fit analysis before took around 30 min.
> >
> > They now take 2-6 mins.
> >
> > I almost can't believe that speed up!
> >
> > Could we devise a devel-script, which we could use to simulate the
> > change?
> >
> > Best
> > Troels
> >
> >
> >
> > 2014-06-04 14:24 GMT+02:00 Troels Emtekær Linnet <[email protected]>:
> >
> >> Hi Edward.
> >>
> >> After the changes to the lib/dispersion/model.py files, I see massive
> >> speed-up of the computations.
> >>
> >> During 2 days, I performed over 600 global fits for a 68 residue
> >> protein, where all residues were clustered.  I just did it with 1 CPU.
> >>
> >> This is really really impressive.
> >>
> >> I did though also alter how the grid search was performed, pre-setting
> >> some of the values from known values referred to in a paper.
> >> So I can't really say what has cut the time down.
> >>
> >> But looking at the calculations running, the minimisation runs quite fast.
> >>
> >> So, how does relax do the collecting of data for global fitting?
> >>
> >> Does it collect all the R2eff values for the clustered spins, and send
> >> them to the target function together with the array of parameters to vary?
> >>
> >> Or does it calculate per spin, and share the common parameters?
> >>
> >> My current bottleneck actually seems to be the saving of the state file
> >> between each iteration of the global analysis.
> >>
> >> Best
> >> Troels
> >>