Re: R1rho RD analysis

Edward d'Auvergne Mon, 30 Jun 2014 03:24:19 -0700

Hi,

For a permanent reference, here are the timings of all of the
dispersion models in relax, with statistics from 10 repetitions, of
the calculation time for 100 function calls for 100 spins.  I.e.
either 10,000 function calls for single spins or 100 function calls
for the cluster of 100 spins.  These are for Troels' recent linear
algrebra optimisations in the disp_spin_speed relax branch
(http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/?pathrev=24362).
The absolute timings are not very interesting however the numbers do
allow for comparisons between models.


This shows, for example, how Nikolai's numeric solution (the "NS CPMG
2-site expanded" model at
http://www.nmr-relax.com/api/3.2/lib.dispersion.ns_cpmg_2site_expanded-module.html)
is now only 1.08 or 1.39 times slower than Andy's newly published
analytic model (B14).  It also shows that both of these models are ~10
times faster for single spins or ~100 times faster for clustered spins
than the currently implemented numeric models which use full evolution
matrices.  And how the clustered analysis is now significantly faster
than the non-clustered analysis, though this advantage would be
eliminated when running Gary Thompson's multi-processor framework with
OpenMPI on a large cluster.

After Troels' impressive optimisations it can be seen that there are
almost no optimisation possibilities left.  You can see this in the
code of the target_functions/relax_disp.py file
(http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/target_functions/relax_disp.py?view=log&pathrev=24362)
and the files in lib/dispersion/ implementing each dispersion model
(http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/lib/dispersion/?pathrev=24362).
Apart from the option of expanding the other "NS CPMG 2-site *" and
"NS R1rho *" models following Nikolai's example of the "NS CPMG 2-site
expanded" model, there is not much more speed to be obtained.  As this
is using BLAS/LAPACK+OpenMP via numpy, rewriting the code in C or
FORTRAN will not significantly increase the speed of the models.  If
you feel your model can be faster, any suggestions for speeding up the
code are welcome.  Note that the math domain error checking and edge
case detection code is essential for providing numerical stability for
the dispersion models, and is present for all models.

Here are the timings:

100 single spins analysis:
http://wiki.nmr-relax.com/No_Rex                       0.290+/-0.016 seconds
http://wiki.nmr-relax.com/LM63                         0.792+/-0.015 seconds
http://wiki.nmr-relax.com/LM63_3-site                  1.042+/-0.018 seconds
http://wiki.nmr-relax.com/CR72                         1.588+/-0.038 seconds
http://wiki.nmr-relax.com/CR72_full                    1.723+/-0.030 seconds
http://wiki.nmr-relax.com/IT99                         0.894+/-0.020 seconds
http://wiki.nmr-relax.com/TSMFK01                      0.751+/-0.013 seconds
http://wiki.nmr-relax.com/B14                          3.975+/-0.070 seconds
http://wiki.nmr-relax.com/B14_full                     3.973+/-0.073 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_expanded      4.234+/-0.089 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_3D           45.374+/-1.355 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_3D_full      45.954+/-1.165 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_star         36.792+/-0.765 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_star_full    36.345+/-0.420 seconds
http://wiki.nmr-relax.com/MMQ_CR72                     4.211+/-0.084 seconds
http://wiki.nmr-relax.com/NS_MMQ_2-site               83.476+/-2.326 seconds
http://wiki.nmr-relax.com/NS_MMQ_3-site_linear        91.677+/-1.460 seconds
http://wiki.nmr-relax.com/NS_MMQ_3-site               93.030+/-0.840 seconds
http://wiki.nmr-relax.com/M61                          0.907+/-0.057 seconds
http://wiki.nmr-relax.com/DPL94                        1.128+/-0.022 seconds
http://wiki.nmr-relax.com/TP02                         1.236+/-0.019 seconds
http://wiki.nmr-relax.com/TAP03                        1.952+/-0.028 seconds
http://wiki.nmr-relax.com/MP05                         1.445+/-0.029 seconds
http://wiki.nmr-relax.com/NS_R1rho_2-site             36.547+/-1.757 seconds
http://wiki.nmr-relax.com/NS_R1rho_3-site_linear      72.351+/-1.483 seconds
http://wiki.nmr-relax.com/NS_R1rho_3-site             73.000+/-1.696 seconds

Cluster of 100 spins analysis:
http://wiki.nmr-relax.com/No_Rex                       0.008+/-0.000 seconds
http://wiki.nmr-relax.com/LM63                         0.038+/-0.002 seconds
http://wiki.nmr-relax.com/LM63_3-site                  0.069+/-0.001 seconds
http://wiki.nmr-relax.com/CR72                         0.107+/-0.002 seconds
http://wiki.nmr-relax.com/CR72_full                    0.112+/-0.005 seconds
http://wiki.nmr-relax.com/IT99                         0.009+/-0.000 seconds
http://wiki.nmr-relax.com/TSMFK01                      0.039+/-0.000 seconds
http://wiki.nmr-relax.com/B14                          0.509+/-0.006 seconds
http://wiki.nmr-relax.com/B14_full                     0.488+/-0.006 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_expanded      0.711+/-0.008 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_3D           41.426+/-1.090 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_3D_full      41.686+/-0.632 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_star         31.209+/-0.635 seconds
http://wiki.nmr-relax.com/NS_CPMG_2-site_star_full    30.933+/-0.496 seconds
http://wiki.nmr-relax.com/MMQ_CR72                     0.863+/-0.007 seconds
http://wiki.nmr-relax.com/NS_MMQ_2-site               77.124+/-1.951 seconds
http://wiki.nmr-relax.com/NS_MMQ_3-site_linear        83.364+/-1.075 seconds
http://wiki.nmr-relax.com/NS_MMQ_3-site               83.700+/-0.595 seconds
http://wiki.nmr-relax.com/M61                          0.036+/-0.001 seconds
http://wiki.nmr-relax.com/DPL94                        0.138+/-0.002 seconds
http://wiki.nmr-relax.com/TP02                         0.172+/-0.002 seconds
http://wiki.nmr-relax.com/TAP03                        0.294+/-0.002 seconds
http://wiki.nmr-relax.com/MP05                         0.191+/-0.005 seconds
http://wiki.nmr-relax.com/NS_R1rho_2-site             33.027+/-0.549 seconds
http://wiki.nmr-relax.com/NS_R1rho_3-site_linear      64.863+/-1.602 seconds
http://wiki.nmr-relax.com/NS_R1rho_3-site             66.390+/-2.050 seconds

The full timing file with many more details can be seen at the
permanent link of
http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/test_suite/shared_data/dispersion/profiling/disp_profile_single.log?view=log&pathrev=24364
or 
http://svn.gna.org/viewcvs/*checkout*/relax/branches/disp_spin_speed/test_suite/shared_data/dispersion/profiling/disp_profile_single.log?revision=24364&pathrev=24364.

Regards,

Edward


P. S.  ATLAS can also be set up with numpy for even greater speed
(https://en.wikipedia.org/wiki/Automatically_Tuned_Linear_Algebra_Software).

On 27 June 2014 12:01, Edward d'Auvergne <edw...@nmr-relax.com> wrote:
> Hi Troels,
>
> For reference for the relax users reading this, the abbreviations
> Troels used for the relaxation dispersion models can be decoded using
> the relax wiki page
> http://wiki.nmr-relax.com/Category:Relaxation_dispersion.
>
>> Just a little heads-up.
>>
>> Within a week or two, we should be able to release a new version of relax.
>>
>> This update has focused on speed, recasting the data to higher
>> dimensional numpy arrays, and
>> moving the multiple dot operations out of the for loops.
>>
>> So we have quite much better speed now. :-)
>
> Troels, you just gave away the surprise of the relax 3.2.3 or 3.2.4
> release!  Note that relax was not particularly slow before these
> changes, but that you have taken far greater advantage of the
> BLAS/LAPACK libraries behind the numpy functions to make the
> dispersion models in relax super fast, especially when spins are
> clustered.  It's a pity we don't have the means to compare the
> optimisation target function speed between relax and other software to
> show how much faster relax now is, as the relax advantage of having a
> grid search to find the initial non-biased position for optimisation
> (a very important technique for optimisation that a number of other
> dispersion software do not provide) will give many relax users the
> incorrect impression that relax is slower than the other software.
> Though if relax is used on a cluster with OpenMPI, this is a
> non-issue.
>
>
>> But we still have a bottleneck with numerical models, where doing the
>> matrix exponential via eigenvalue decomposition is slow!
>> Do any of you any experience with a faster method for doing matrix 
>> exponential?
>>
>> These initial results shows that if you are going to use the R1rho
>> 2site model, you can expect:
>>
>> -R1rho
>> 100 single spins analysis:
>> DPL94:                      23.525+/-0.409 ->   1.138+/-0.035,  20.676x 
>> faster.
>> TP02:                       20.191+/-0.375 ->   1.238+/-0.020,  16.308x 
>> faster.
>> TAP03:                      31.993+/-0.235 ->   1.956+/-0.025,  16.353x 
>> faster.
>> MP05:                       24.892+/-0.354 ->   1.431+/-0.014,  17.395x 
>> faster.
>> NS R1rho 2-site:           245.961+/-2.637 ->  36.308+/-0.458,   6.774x 
>> faster.
>>
>> Cluster of 100 spins analysis:
>> DPL94:                      23.872+/-0.505 ->   0.145+/-0.002, 164.180x 
>> faster.
>> TP02:                       20.445+/-0.411 ->   0.172+/-0.004, 118.662x 
>> faster.
>> TAP03:                      32.212+/-0.234 ->   0.294+/-0.002, 109.714x 
>> faster.
>> MP05:                       25.013+/-0.362 ->   0.188+/-0.003, 132.834x 
>> faster.
>> NS R1rho 2-site:           246.024+/-3.724 ->  33.119+/-0.320,   7.428x 
>> faster.
>>
>> -CPMG
>> 100 single spins analysis:
>> CR72:                        2.615+/-0.018 ->   1.614+/-0.024,   1.621x 
>> faster.
>> CR72 full:                   2.678+/-0.020 ->   1.839+/-0.165,   1.456x 
>> faster.
>> IT99:                        1.837+/-0.030 ->   0.881+/-0.010,   2.086x 
>> faster.
>> TSMFK01:                     1.665+/-0.049 ->   0.742+/-0.007,   2.243x 
>> faster.
>> B14:                         5.851+/-0.133 ->   3.978+/-0.049,   1.471x 
>> faster.
>> B14 full:                    5.789+/-0.102 ->   4.065+/-0.059,   1.424x 
>> faster.
>> NS CPMG 2-site expanded:     8.225+/-0.196 ->   4.140+/-0.062,   1.987x 
>> faster.
>> NS CPMG 2-site 3D:         240.027+/-3.182 ->  45.056+/-0.584,   5.327x 
>> faster.
>> NS CPMG 2-site 3D full:    240.910+/-4.882 ->  45.230+/-0.540,   5.326x 
>> faster.
>> NS CPMG 2-site star:       186.480+/-2.299 ->  36.400+/-0.397,   5.123x 
>> faster.
>> NS CPMG 2-site star full:  187.111+/-2.791 ->  36.745+/-0.689,   5.092x 
>> faster.
>>
>> Cluster of 100 spins analysis:
>> CR72:                        2.610+/-0.035 ->   0.118+/-0.001,  22.138x 
>> faster.
>> CR72 full:                   2.674+/-0.021 ->   0.122+/-0.001,  21.882x 
>> faster.
>> IT99:                        0.018+/-0.000 ->   0.009+/-0.000,
>> 2.044x faster. IT99: Cluster of only 1 spin analysis, since v. 3.2.2
>> had a bug with clustering analysis.:
>> TSMFK01:                     1.691+/-0.091 ->   0.039+/-0.000,  43.369x 
>> faster.
>> B14:                         5.891+/-0.127 ->   0.523+/-0.004,  11.264x 
>> faster.
>> B14 full:                    5.818+/-0.127 ->   0.515+/-0.021,  11.295x 
>> faster.
>> NS CPMG 2-site expanded:     8.167+/-0.159 ->   0.702+/-0.008,  11.638x 
>> faster.
>> NS CPMG 2-site 3D:         238.717+/-3.025 ->  41.380+/-0.950,   5.769x 
>> faster.
>> NS CPMG 2-site 3D full:    507.411+/-803.089 ->  41.852+/-1.317,
>> 12.124x faster.
>> NS CPMG 2-site star:       187.004+/-1.935 ->  30.823+/-0.371,   6.067x 
>> faster.
>> NS CPMG 2-site star full:  187.852+/-2.698 ->  30.882+/-0.671,   6.083x 
>> faster.
>
>
> There are a number of ways of computing the matrix exponential.
> Avoiding eigenvalue decomposition is the essential key.  The Pade
> approximation is probably the best, followed by one of the Taylor
> series expansion approximations.  As I mentioned in a different post
> to the relax-devel mailing list, this was used earlier in relax via
> Scipy.  But I removed this and wrote my own algorithm using eigenvalue
> decomposition as the Scipy expm() function used in relax was buggy and
> could not correctly handle the complex part of the matrix (for the 'NS
> CPMG * star' models, http://wiki.nmr-relax.com/NS_CPMG_2-site_star).
>
> Expokit might also be useful http://www.maths.uq.edu.au/expokit/.  But
> if you can come up with a multi-rank version of such an algorithm, as
> well as the matrix power algorithm, taking advantage of BLAS/LAPACK
> (or writing it in C or FORTRAN and using OpenMP
> https://en.wikipedia.org/wiki/OpenMP with a Python wrapper interface),
> then this will be a huge contribution.  Eliminating this bottleneck
> that makes the numeric dispersion models 10 to 500 times slower than
> the analytic models, as can be seen in the table of speed ups, would
> be impressive.
>
> Note that there is another way to solve this problem which is far
> faster, and that is what Nikolai Skrynnikov did with the 'NS CPMG
> 2-site expanded' model in relax
> (http://www.nmr-relax.com/api/3.2/lib.dispersion.ns_cpmg_2site_expanded-module.html
> and http://wiki.nmr-relax.com/NS_CPMG_2-site_expanded).  This is a
> numeric model, but it looks rather analytic and it requires no slow
> matrix exponential or matrix power operations.  From the speeds in
> seconds that you gave, it can be seen that this model is only slightly
> slower than Andy's analytic B14 model (http://wiki.nmr-relax.com/B14).
> The reason why this model is now pretty much the same speed as Andy's
> model are your linear algebra optimisations.  And using Nikolai's
> approach, it is possible to achieve this for all the numeric models in
> relax.
>
> The idea is to take each numeric model and brute-force expand it using
> Maple (https://en.wikipedia.org/wiki/Maple_%28software%29) or
> equivalent.  Other symbolic algebra software such as the free Maxima
> (or the GUI wxMaxima) could be used, or Mathematica, though it looks
> like Maple is the most powerful for outputting this.  You can see how
> this was done in the lib.dispersion.ns_cpmg_2site_expanded module
> documentation, as the original Maple script is reproduced there
> together with an explanation of the steps involved
> (http://www.nmr-relax.com/api/3.2/lib.dispersion.ns_cpmg_2site_expanded-module.html).
> All numeric models, whereby the evolution matrix is slightly different
> to account for different effects, has a completely different rank, or
> is complex, can be expanded like this.  Then the incredibly slow
> matrix exponential and matrix power operations are avoided and all of
> your linear algebra tricks for speeding up the models by taking
> advantage of BLAS/LAPACK+OpenMP via numpy can be used.  Using this
> approach, you should be able to speed up all numeric models to the
> equivalent of Nikolai's 'NS CPMG 2-site expanded' model.  This would
> be the ultimate speed up for the numeric models, and would make relax
> by far the fastest dispersion software out there.  You could then
> include this point as part of your paper, and have all the equations
> in a supplement so it can be cited.
>
> Regards,
>
> Edward

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-users mailing list
relax-users@gna.org

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users

Re: R1rho RD analysis

Reply via email to