Well, looking at the Scipy source code of ./scipy/linalg/matfuncs.py, expm2() and expm3() are implemented in Python code, interfacing to other scipy functions. And expm() is an alias for the function of the same name in ./scipy/sparse/linalg/matfuncs.py, which is also implemented in Python code. So there is no BLAS/LAPACK magic happening here, except maybe in the functions that these expm*() functions call.
Regards, Edward On 25 June 2014 17:54, Edward d'Auvergne <[email protected]> wrote: > Hi Troels, > > If you are interested in eliminating this 86% bottleneck in the > numeric R1rho dispersion models in the > lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() > function, it might be worth looking at the scipy.linalg.expm(), > scipy.linalg.expm2(), and scipy.linalg.expm3() functions > (http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html#exponential-and-logarithm-functions), > or the BLAS/LAPACK functions these interface to. Note that I > eliminated the use of these functions in the dispersion analysis a > while ago (http://article.gmane.org/gmane.science.nmr.relax.scm/18869/). > And for good reason, there was a nasty Scipy bug in the Pade > approximation of scipy.linalg.expm(). This and the fatal bugs in all > optimisation algorithms in Scipy (the catalyst for me to create minfx, > originally as part of relax, https://gna.org/projects/minfx/), is the > reason why I try to avoid Scipy as much as possible - I just cannot > trust it any more. Scipy is a collection of poorly integrated tools, > often with the original authors completely disappearing, and it is > full of bugs. Anyway, maybe one of these functions would be of use or > would be a good inspiration. Oh, the scipy.linalg.expm() function in > the original code comes from the fitting_main_kex.py script from > Mathilde Lescanne, Paul Schanda, and Dominique Marion. Note that if > the Taylor series approximation of scipy.linalg.expm3() works well for > the numeric dispersion models, which is untested, that might be by far > the fastest solution. > > Regards, > > Edward > > > On 25 June 2014 08:20, Edward d'Auvergne <[email protected]> wrote: >> Wow, you managed to work out how to remove the slow Python looping >> from the numeric models! For the record, this is the speed up for >> this 'NS R1rho 2-site' model using Python 2.7.6 & numpy 1.8.0 on a >> 64-bit Linux system: >> >> """ >> $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 >> /data/relax/branches/disp_spin_speed >> [snip] >> >> New relax version: relax repository checkout r24308 >> svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed >> Old relax version: relax repository checkout r24274 >> svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed >> >> Execution iteration 1 >> >> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2 >> 1000 0.016 0.000 3.561 0.004 >> relax_disp.py:1582(func_ns_r1rho_2site) >> 10 0.013 0.001 3.237 0.324 >> relax_disp.py:1582(func_ns_r1rho_2site) >> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed >> 1000 0.017 0.000 6.245 0.006 >> relax_disp.py:1552(func_ns_r1rho_2site) >> 10 0.013 0.001 5.899 0.590 >> relax_disp.py:1552(func_ns_r1rho_2site) >> [snip] >> >> 100 single spins analysis: >> NS R1rho 2-site: 62.450+/-0.000 -> 35.610+/-0.000, 1.754x >> faster. >> >> Cluster of 100 spins analysis: >> NS R1rho 2-site: 58.990+/-0.000 -> 32.370+/-0.000, 1.822x >> faster. >> """ >> >> >> This is quite significant for these models. With all the changes >> since relax 3.2.2: >> >> """ >> $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 >> /data/relax/tags/3.2.2/ >> [snip] >> >> New relax version: relax repository checkout r24308 >> svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed >> Old relax version: relax 3.2.2 >> >> Execution iteration 1 >> >> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2 >> 1000 0.016 0.000 3.576 0.004 >> relax_disp.py:1582(func_ns_r1rho_2site) >> 10 0.013 0.001 3.247 0.325 >> relax_disp.py:1582(func_ns_r1rho_2site) >> $ python profiling_ns_r1rho_2site.py /data/relax/tags/3.2.2/ >> 1000 0.245 0.000 24.323 0.024 >> relax_disp.py:1681(func_ns_r1rho_2site) >> 10 0.266 0.027 24.439 2.444 >> relax_disp.py:1681(func_ns_r1rho_2site) >> [snip] >> >> 100 single spins analysis: >> NS R1rho 2-site: 243.230+/-0.000 -> 35.760+/-0.000, 6.802x >> faster. >> >> Cluster of 100 spins analysis: >> NS R1rho 2-site: 244.390+/-0.000 -> 32.470+/-0.000, 7.527x >> faster. >> """ >> >> >> Considering how slow these models are compared to the analytic models, >> this is a huge improvement! According to your >> profiling_ns_r1rho_2site.py script, the slow parts are: >> >> """ >> ('chi2 cluster:', 14153134802.76194) >> Wed Jun 25 07:59:18 2014 /tmp/tmp17lqUn >> >> 211077 function calls in 3.510 seconds >> >> Ordered by: cumulative time >> >> ncalls tottime percall cumtime percall filename:lineno(function) >> 1 0.000 0.000 3.510 3.510 <string>:1(<module>) >> 1 0.002 0.002 3.510 3.510 >> profiling_ns_r1rho_2site.py:553(cluster) >> 10 0.000 0.000 3.254 0.325 >> profiling_ns_r1rho_2site.py:515(calc) >> 10 0.015 0.002 3.254 0.325 >> relax_disp.py:1582(func_ns_r1rho_2site) >> 10 0.015 0.002 3.236 0.324 >> ns_r1rho_2site.py:190(ns_r1rho_2site) >> 10 0.104 0.010 3.041 0.304 >> matrix_exponential.py:33(matrix_exponential_rank_NE_NS_NM_NO_ND_x_x) >> 10 1.803 0.180 1.827 0.183 linalg.py:982(eig) >> 10 0.564 0.056 0.601 0.060 linalg.py:455(inv) >> 40 0.501 0.013 0.501 0.013 {numpy.core.multiarray.einsum} >> """ >> >> >> The single spin numbers are very similar. Looking at the total times, >> the numpy.einsum and numpy.inv functions are taking 1/7 of the time >> each, and numpy.eig 51% of the time - this is a total of 80% of the >> time. The >> lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() >> function now takes up 86% of the cumulative time - so this is the new >> bottleneck. The only way to solve this would be using clever linear >> algebra tricks to simplify or to change the logic in >> matrix_exponential_rank_NE_NS_NM_NO_ND_x_x(). There are multiple >> techniques for finding the matrix exponential, it could be that the >> eigenvalue method is one of the slowest. This looks to be interesting >> for an overview of recent research in the mathematics field on this >> subject: http://www.maths.manchester.ac.uk/~higham/talks/exp09.pdf. >> Great work though, you have eliminated the worst bottleneck for the >> numeric models! >> >> Cheers, >> >> Edward >> >> >> >> On 25 June 2014 02:14, <[email protected]> wrote: >>> Author: tlinnet >>> Date: Wed Jun 25 02:14:36 2014 >>> New Revision: 24294 >>> >>> URL: http://svn.gna.org/viewcvs/relax?rev=24294&view=rev >>> Log: >>> Speeded up the code of NS r1rho 2site. >>> >>> This was essential done to numpy einsum, and doing the dot operations in >>> multiple dimensions. >>> It was though necessary to realize, that to do the proper dot product >>> operations, the outer two >>> axis if M0 should be swapped, by rolling the outer axis one back. >>> >>> Task #7807 (https://gna.org/task/index.php?7807): Speed-up of dispersion >>> models for Clustered analysis. >>> >>> Modified: >>> branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py >>> branches/disp_spin_speed/target_functions/relax_disp.py >>> >>> Modified: branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py >>> URL: >>> http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py?rev=24294&r1=24293&r2=24294&view=diff >>> ============================================================================== >>> --- branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py (original) >>> +++ branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py Wed Jun 25 >>> 02:14:36 2014 >>> @@ -50,10 +50,10 @@ >>> """ >>> >>> # Python module imports. >>> -from numpy import array, cos, dot, einsum, float64, log, multiply, >>> rollaxis, sin, sum >>> +from numpy import array, einsum, float64, isfinite, log, min, multiply, >>> rollaxis, sin, sum >>> +from numpy.ma import fix_invalid, masked_less >>> >>> # relax module imports. >>> -from lib.float import isNaN >>> from lib.dispersion.matrix_exponential import >>> matrix_exponential_rank_NE_NS_NM_NO_ND_x_x >>> >>> # Repetitive calculations (to speed up calculations). >>> @@ -187,14 +187,16 @@ >>> return matrix >>> >>> >>> -def ns_r1rho_2site(M0=None, r1rho_prime=None, omega=None, offset=None, >>> r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, relax_time=None, >>> inv_relax_time=None, back_calc=None, num_points=None): >>> +def ns_r1rho_2site(M0=None, M0_T=None, r1rho_prime=None, omega=None, >>> offset=None, r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, >>> relax_time=None, inv_relax_time=None, back_calc=None, num_points=None): >>> """The 2-site numerical solution to the Bloch-McConnell equation for >>> R1rho data. >>> >>> This function calculates and stores the R1rho values. >>> >>> >>> @keyword M0: This is a vector that contains the initial >>> magnetizations corresponding to the A and B state transverse magnetizations. >>> - @type M0: numpy float64, rank-1, 7D array >>> + @type M0: numpy float array of rank >>> [NE][NS][NM][NO][ND][6][1] >>> + @keyword M0_T: This is a vector that contains the initial >>> magnetizations corresponding to the A and B state transverse >>> magnetizations, where the outer two axis has been swapped for efficient dot >>> operations. >>> + @type M0_T: numpy float array of rank >>> [NE][NS][NM][NO][ND][1][6] >>> @keyword r1rho_prime: The R1rho_prime parameter value (R1rho >>> with no exchange). >>> @type r1rho_prime: numpy float array of rank >>> [NE][NS][NM][NO][ND] >>> @keyword omega: The chemical shift for the spin in rad/s. >>> @@ -238,43 +240,22 @@ >>> # Magnetization evolution. >>> Rexpo_M0_mat = einsum('...ij,...jk', Rexpo_mat, M0) >>> >>> - # Transpose M0, to prepare for dot operation. Roll the last axis one >>> back. >>> - M0_T = rollaxis(M0, 6, 5) >>> - >>> - if NS == 1: >>> - # Loop over the spectrometer frequencies. >>> - for mi in range(NM): >>> - # Loop over offsets: >>> - for oi in range(NO): >>> - # Extract number of points. >>> - num_points_i = num_points[0, 0, mi, oi] >>> - >>> - # Loop over the time points, back calculating the R2eff >>> values. >>> - for j in range(num_points_i): >>> - >>> - # If the number of spins are 1, do the fastest method >>> by dot product. Else do it by einstein summation. >>> - # Set the spin index for one spin. >>> - si = 0 >>> - # Extract the preformed matrix that rotate the >>> magnetization previous to spin-lock into the weff frame. >>> - M0_i= M0_T[0, si, mi, oi, j] >>> - >>> - # Extract from the pre-formed Magnetization evolution >>> matrix. >>> - Rexpo_M0_mat_i = Rexpo_M0_mat[0, si, mi, oi, j] >>> - >>> - # Magnetization evolution. >>> - MA = dot(M0_i, Rexpo_M0_mat_i)[0, 0] >>> - >>> - # The next lines calculate the R1rho using a two-point >>> approximation, i.e. assuming that the decay is mono-exponential. >>> - if MA <= 0.0 or isNaN(MA): >>> - back_calc[0, si, mi, oi, j] = 1e99 >>> - else: >>> - back_calc[0, si, mi, oi, j]= -inv_relax_time[0, >>> si, mi, oi, j] * log(MA) >>> - >>> - else: >>> - # Magnetization evolution, which include all dimensions. >>> - MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, >>> 0, 0] >>> - >>> - # Do back calculation. >>> - back_calc[:] = -inv_relax_time * log(MA_mat) >>> - >>> - >>> + # Magnetization evolution, which include all dimensions. >>> + MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, 0, 0] >>> + >>> + # Insert safe checks. >>> + if min(MA_mat) < 0.0: >>> + mask_min_MA_mat = masked_less(MA_mat, 0.0) >>> + # Fill with high values. >>> + MA_mat[mask_min_MA_mat.mask] = 1e100 >>> + >>> + # Do back calculation. >>> + back_calc[:] = -inv_relax_time * log(MA_mat) >>> + >>> + # Catch errors, taking a sum over array is the fastest way to check for >>> + # +/- inf (infinity) and nan (not a number). >>> + if not isfinite(sum(back_calc)): >>> + # Replaces nan, inf, etc. with fill value. >>> + fix_invalid(back_calc, copy=False, fill_value=1e100) >>> + >>> + >>> >>> Modified: branches/disp_spin_speed/target_functions/relax_disp.py >>> URL: >>> http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/target_functions/relax_disp.py?rev=24294&r1=24293&r2=24294&view=diff >>> ============================================================================== >>> --- branches/disp_spin_speed/target_functions/relax_disp.py (original) >>> +++ branches/disp_spin_speed/target_functions/relax_disp.py Wed Jun 25 >>> 02:14:36 2014 >>> @@ -26,8 +26,7 @@ >>> >>> # Python module imports. >>> from copy import deepcopy >>> -from math import pi >>> -from numpy import add, array, arctan2, asarray, cos, complex64, dot, >>> float64, int16, max, multiply, ones, sin, sqrt, sum, tile, zeros >>> +from numpy import arctan2, cos, dot, float64, int16, multiply, ones, >>> rollaxis, pi, sin, sum, zeros >>> from numpy.ma import masked_equal >>> >>> # relax module imports. >>> @@ -415,6 +414,9 @@ >>> M0_2[2, 0] = 1 >>> M0_cos = multiply.outer( cos(theta_mat), M0_2 ) >>> self.M0 = M0_sin + M0_cos >>> + # Transpose M0, to prepare for dot operation. Roll the last >>> axis one back, corresponds to a transpose for the outer two axis. >>> + self.M0_T = rollaxis(self.M0, 6, 5) >>> + >>> if model in [MODEL_NS_R1RHO_3SITE, MODEL_NS_R1RHO_3SITE_LINEAR]: >>> self.M0 = zeros(9, float64) >>> # Offset of spin-lock from A. >>> @@ -1598,7 +1600,7 @@ >>> self.r20_struct[:] = multiply.outer( r1rho_prime.reshape(self.NE, >>> self.NS, self.NM), self.no_nd_ones ) >>> >>> # Back calculate the R2eff values. >>> - ns_r1rho_2site(M0=self.M0, r1rho_prime=self.r20_struct, >>> omega=self.chemical_shifts, offset=self.offset, r1=self.r1, pA=pA, >>> dw=self.dw_struct, kex=kex, spin_lock_fields=self.spin_lock_omega1, >>> relax_time=self.relax_times, inv_relax_time=self.inv_relax_times, >>> back_calc=self.back_calc, num_points=self.num_disp_points) >>> + ns_r1rho_2site(M0=self.M0, M0_T=self.M0_T, >>> r1rho_prime=self.r20_struct, omega=self.chemical_shifts, >>> offset=self.offset, r1=self.r1, pA=pA, dw=self.dw_struct, kex=kex, >>> spin_lock_fields=self.spin_lock_omega1, relax_time=self.relax_times, >>> inv_relax_time=self.inv_relax_times, back_calc=self.back_calc, >>> num_points=self.num_disp_points) >>> >>> # Clean the data for all values, which is left over at the end of >>> arrays. >>> self.back_calc = self.back_calc*self.disp_struct >>> >>> >>> _______________________________________________ >>> relax (http://www.nmr-relax.com) >>> >>> This is the relax-commits mailing list >>> [email protected] >>> >>> To unsubscribe from this list, get a password >>> reminder, or change your subscription options, >>> visit the list information page at >>> https://mail.gna.org/listinfo/relax-commits _______________________________________________ relax (http://www.nmr-relax.com) This is the relax-devel mailing list [email protected] To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-devel

