Hi Troels, If you are interested in eliminating this 86% bottleneck in the numeric R1rho dispersion models in the lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() function, it might be worth looking at the scipy.linalg.expm(), scipy.linalg.expm2(), and scipy.linalg.expm3() functions (http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html#exponential-and-logarithm-functions), or the BLAS/LAPACK functions these interface to. Note that I eliminated the use of these functions in the dispersion analysis a while ago (http://article.gmane.org/gmane.science.nmr.relax.scm/18869/). And for good reason, there was a nasty Scipy bug in the Pade approximation of scipy.linalg.expm(). This and the fatal bugs in all optimisation algorithms in Scipy (the catalyst for me to create minfx, originally as part of relax, https://gna.org/projects/minfx/), is the reason why I try to avoid Scipy as much as possible - I just cannot trust it any more. Scipy is a collection of poorly integrated tools, often with the original authors completely disappearing, and it is full of bugs. Anyway, maybe one of these functions would be of use or would be a good inspiration. Oh, the scipy.linalg.expm() function in the original code comes from the fitting_main_kex.py script from Mathilde Lescanne, Paul Schanda, and Dominique Marion. Note that if the Taylor series approximation of scipy.linalg.expm3() works well for the numeric dispersion models, which is untested, that might be by far the fastest solution.
Regards, Edward On 25 June 2014 08:20, Edward d'Auvergne <[email protected]> wrote: > Wow, you managed to work out how to remove the slow Python looping > from the numeric models! For the record, this is the speed up for > this 'NS R1rho 2-site' model using Python 2.7.6 & numpy 1.8.0 on a > 64-bit Linux system: > > """ > $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 > /data/relax/branches/disp_spin_speed > [snip] > > New relax version: relax repository checkout r24308 > svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed > Old relax version: relax repository checkout r24274 > svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed > > Execution iteration 1 > > $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2 > 1000 0.016 0.000 3.561 0.004 > relax_disp.py:1582(func_ns_r1rho_2site) > 10 0.013 0.001 3.237 0.324 > relax_disp.py:1582(func_ns_r1rho_2site) > $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed > 1000 0.017 0.000 6.245 0.006 > relax_disp.py:1552(func_ns_r1rho_2site) > 10 0.013 0.001 5.899 0.590 > relax_disp.py:1552(func_ns_r1rho_2site) > [snip] > > 100 single spins analysis: > NS R1rho 2-site: 62.450+/-0.000 -> 35.610+/-0.000, 1.754x > faster. > > Cluster of 100 spins analysis: > NS R1rho 2-site: 58.990+/-0.000 -> 32.370+/-0.000, 1.822x > faster. > """ > > > This is quite significant for these models. With all the changes > since relax 3.2.2: > > """ > $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 > /data/relax/tags/3.2.2/ > [snip] > > New relax version: relax repository checkout r24308 > svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed > Old relax version: relax 3.2.2 > > Execution iteration 1 > > $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2 > 1000 0.016 0.000 3.576 0.004 > relax_disp.py:1582(func_ns_r1rho_2site) > 10 0.013 0.001 3.247 0.325 > relax_disp.py:1582(func_ns_r1rho_2site) > $ python profiling_ns_r1rho_2site.py /data/relax/tags/3.2.2/ > 1000 0.245 0.000 24.323 0.024 > relax_disp.py:1681(func_ns_r1rho_2site) > 10 0.266 0.027 24.439 2.444 > relax_disp.py:1681(func_ns_r1rho_2site) > [snip] > > 100 single spins analysis: > NS R1rho 2-site: 243.230+/-0.000 -> 35.760+/-0.000, 6.802x > faster. > > Cluster of 100 spins analysis: > NS R1rho 2-site: 244.390+/-0.000 -> 32.470+/-0.000, 7.527x > faster. > """ > > > Considering how slow these models are compared to the analytic models, > this is a huge improvement! According to your > profiling_ns_r1rho_2site.py script, the slow parts are: > > """ > ('chi2 cluster:', 14153134802.76194) > Wed Jun 25 07:59:18 2014 /tmp/tmp17lqUn > > 211077 function calls in 3.510 seconds > > Ordered by: cumulative time > > ncalls tottime percall cumtime percall filename:lineno(function) > 1 0.000 0.000 3.510 3.510 <string>:1(<module>) > 1 0.002 0.002 3.510 3.510 > profiling_ns_r1rho_2site.py:553(cluster) > 10 0.000 0.000 3.254 0.325 > profiling_ns_r1rho_2site.py:515(calc) > 10 0.015 0.002 3.254 0.325 > relax_disp.py:1582(func_ns_r1rho_2site) > 10 0.015 0.002 3.236 0.324 > ns_r1rho_2site.py:190(ns_r1rho_2site) > 10 0.104 0.010 3.041 0.304 > matrix_exponential.py:33(matrix_exponential_rank_NE_NS_NM_NO_ND_x_x) > 10 1.803 0.180 1.827 0.183 linalg.py:982(eig) > 10 0.564 0.056 0.601 0.060 linalg.py:455(inv) > 40 0.501 0.013 0.501 0.013 {numpy.core.multiarray.einsum} > """ > > > The single spin numbers are very similar. Looking at the total times, > the numpy.einsum and numpy.inv functions are taking 1/7 of the time > each, and numpy.eig 51% of the time - this is a total of 80% of the > time. The > lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() > function now takes up 86% of the cumulative time - so this is the new > bottleneck. The only way to solve this would be using clever linear > algebra tricks to simplify or to change the logic in > matrix_exponential_rank_NE_NS_NM_NO_ND_x_x(). There are multiple > techniques for finding the matrix exponential, it could be that the > eigenvalue method is one of the slowest. This looks to be interesting > for an overview of recent research in the mathematics field on this > subject: http://www.maths.manchester.ac.uk/~higham/talks/exp09.pdf. > Great work though, you have eliminated the worst bottleneck for the > numeric models! > > Cheers, > > Edward > > > > On 25 June 2014 02:14, <[email protected]> wrote: >> Author: tlinnet >> Date: Wed Jun 25 02:14:36 2014 >> New Revision: 24294 >> >> URL: http://svn.gna.org/viewcvs/relax?rev=24294&view=rev >> Log: >> Speeded up the code of NS r1rho 2site. >> >> This was essential done to numpy einsum, and doing the dot operations in >> multiple dimensions. >> It was though necessary to realize, that to do the proper dot product >> operations, the outer two >> axis if M0 should be swapped, by rolling the outer axis one back. >> >> Task #7807 (https://gna.org/task/index.php?7807): Speed-up of dispersion >> models for Clustered analysis. >> >> Modified: >> branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py >> branches/disp_spin_speed/target_functions/relax_disp.py >> >> Modified: branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py >> URL: >> http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py?rev=24294&r1=24293&r2=24294&view=diff >> ============================================================================== >> --- branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py (original) >> +++ branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py Wed Jun 25 >> 02:14:36 2014 >> @@ -50,10 +50,10 @@ >> """ >> >> # Python module imports. >> -from numpy import array, cos, dot, einsum, float64, log, multiply, >> rollaxis, sin, sum >> +from numpy import array, einsum, float64, isfinite, log, min, multiply, >> rollaxis, sin, sum >> +from numpy.ma import fix_invalid, masked_less >> >> # relax module imports. >> -from lib.float import isNaN >> from lib.dispersion.matrix_exponential import >> matrix_exponential_rank_NE_NS_NM_NO_ND_x_x >> >> # Repetitive calculations (to speed up calculations). >> @@ -187,14 +187,16 @@ >> return matrix >> >> >> -def ns_r1rho_2site(M0=None, r1rho_prime=None, omega=None, offset=None, >> r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, relax_time=None, >> inv_relax_time=None, back_calc=None, num_points=None): >> +def ns_r1rho_2site(M0=None, M0_T=None, r1rho_prime=None, omega=None, >> offset=None, r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, >> relax_time=None, inv_relax_time=None, back_calc=None, num_points=None): >> """The 2-site numerical solution to the Bloch-McConnell equation for >> R1rho data. >> >> This function calculates and stores the R1rho values. >> >> >> @keyword M0: This is a vector that contains the initial >> magnetizations corresponding to the A and B state transverse magnetizations. >> - @type M0: numpy float64, rank-1, 7D array >> + @type M0: numpy float array of rank >> [NE][NS][NM][NO][ND][6][1] >> + @keyword M0_T: This is a vector that contains the initial >> magnetizations corresponding to the A and B state transverse magnetizations, >> where the outer two axis has been swapped for efficient dot operations. >> + @type M0_T: numpy float array of rank >> [NE][NS][NM][NO][ND][1][6] >> @keyword r1rho_prime: The R1rho_prime parameter value (R1rho with >> no exchange). >> @type r1rho_prime: numpy float array of rank >> [NE][NS][NM][NO][ND] >> @keyword omega: The chemical shift for the spin in rad/s. >> @@ -238,43 +240,22 @@ >> # Magnetization evolution. >> Rexpo_M0_mat = einsum('...ij,...jk', Rexpo_mat, M0) >> >> - # Transpose M0, to prepare for dot operation. Roll the last axis one >> back. >> - M0_T = rollaxis(M0, 6, 5) >> - >> - if NS == 1: >> - # Loop over the spectrometer frequencies. >> - for mi in range(NM): >> - # Loop over offsets: >> - for oi in range(NO): >> - # Extract number of points. >> - num_points_i = num_points[0, 0, mi, oi] >> - >> - # Loop over the time points, back calculating the R2eff >> values. >> - for j in range(num_points_i): >> - >> - # If the number of spins are 1, do the fastest method >> by dot product. Else do it by einstein summation. >> - # Set the spin index for one spin. >> - si = 0 >> - # Extract the preformed matrix that rotate the >> magnetization previous to spin-lock into the weff frame. >> - M0_i= M0_T[0, si, mi, oi, j] >> - >> - # Extract from the pre-formed Magnetization evolution >> matrix. >> - Rexpo_M0_mat_i = Rexpo_M0_mat[0, si, mi, oi, j] >> - >> - # Magnetization evolution. >> - MA = dot(M0_i, Rexpo_M0_mat_i)[0, 0] >> - >> - # The next lines calculate the R1rho using a two-point >> approximation, i.e. assuming that the decay is mono-exponential. >> - if MA <= 0.0 or isNaN(MA): >> - back_calc[0, si, mi, oi, j] = 1e99 >> - else: >> - back_calc[0, si, mi, oi, j]= -inv_relax_time[0, si, >> mi, oi, j] * log(MA) >> - >> - else: >> - # Magnetization evolution, which include all dimensions. >> - MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, >> 0, 0] >> - >> - # Do back calculation. >> - back_calc[:] = -inv_relax_time * log(MA_mat) >> - >> - >> + # Magnetization evolution, which include all dimensions. >> + MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, 0, 0] >> + >> + # Insert safe checks. >> + if min(MA_mat) < 0.0: >> + mask_min_MA_mat = masked_less(MA_mat, 0.0) >> + # Fill with high values. >> + MA_mat[mask_min_MA_mat.mask] = 1e100 >> + >> + # Do back calculation. >> + back_calc[:] = -inv_relax_time * log(MA_mat) >> + >> + # Catch errors, taking a sum over array is the fastest way to check for >> + # +/- inf (infinity) and nan (not a number). >> + if not isfinite(sum(back_calc)): >> + # Replaces nan, inf, etc. with fill value. >> + fix_invalid(back_calc, copy=False, fill_value=1e100) >> + >> + >> >> Modified: branches/disp_spin_speed/target_functions/relax_disp.py >> URL: >> http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/target_functions/relax_disp.py?rev=24294&r1=24293&r2=24294&view=diff >> ============================================================================== >> --- branches/disp_spin_speed/target_functions/relax_disp.py (original) >> +++ branches/disp_spin_speed/target_functions/relax_disp.py Wed Jun 25 >> 02:14:36 2014 >> @@ -26,8 +26,7 @@ >> >> # Python module imports. >> from copy import deepcopy >> -from math import pi >> -from numpy import add, array, arctan2, asarray, cos, complex64, dot, >> float64, int16, max, multiply, ones, sin, sqrt, sum, tile, zeros >> +from numpy import arctan2, cos, dot, float64, int16, multiply, ones, >> rollaxis, pi, sin, sum, zeros >> from numpy.ma import masked_equal >> >> # relax module imports. >> @@ -415,6 +414,9 @@ >> M0_2[2, 0] = 1 >> M0_cos = multiply.outer( cos(theta_mat), M0_2 ) >> self.M0 = M0_sin + M0_cos >> + # Transpose M0, to prepare for dot operation. Roll the last >> axis one back, corresponds to a transpose for the outer two axis. >> + self.M0_T = rollaxis(self.M0, 6, 5) >> + >> if model in [MODEL_NS_R1RHO_3SITE, MODEL_NS_R1RHO_3SITE_LINEAR]: >> self.M0 = zeros(9, float64) >> # Offset of spin-lock from A. >> @@ -1598,7 +1600,7 @@ >> self.r20_struct[:] = multiply.outer( r1rho_prime.reshape(self.NE, >> self.NS, self.NM), self.no_nd_ones ) >> >> # Back calculate the R2eff values. >> - ns_r1rho_2site(M0=self.M0, r1rho_prime=self.r20_struct, >> omega=self.chemical_shifts, offset=self.offset, r1=self.r1, pA=pA, >> dw=self.dw_struct, kex=kex, spin_lock_fields=self.spin_lock_omega1, >> relax_time=self.relax_times, inv_relax_time=self.inv_relax_times, >> back_calc=self.back_calc, num_points=self.num_disp_points) >> + ns_r1rho_2site(M0=self.M0, M0_T=self.M0_T, >> r1rho_prime=self.r20_struct, omega=self.chemical_shifts, offset=self.offset, >> r1=self.r1, pA=pA, dw=self.dw_struct, kex=kex, >> spin_lock_fields=self.spin_lock_omega1, relax_time=self.relax_times, >> inv_relax_time=self.inv_relax_times, back_calc=self.back_calc, >> num_points=self.num_disp_points) >> >> # Clean the data for all values, which is left over at the end of >> arrays. >> self.back_calc = self.back_calc*self.disp_struct >> >> >> _______________________________________________ >> relax (http://www.nmr-relax.com) >> >> This is the relax-commits mailing list >> [email protected] >> >> To unsubscribe from this list, get a password >> reminder, or change your subscription options, >> visit the list information page at >> https://mail.gna.org/listinfo/relax-commits _______________________________________________ relax (http://www.nmr-relax.com) This is the relax-devel mailing list [email protected] To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-devel

