For this problem, the following reference could be interesting for summarising the different techniques:
Moler, C. and Van Loan, C. (2003) Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. SIAM Review, 45, 3-49. (http://dx.doi.org/10.1137/S00361445024180 or http://www.cs.cornell.edu/cv/researchpdf/19ways+.pdf). I'm putting the link here for future reference as well. Regards, Edward On 25 June 2014 18:12, Edward d'Auvergne <[email protected]> wrote: > Well, looking at the Scipy source code of ./scipy/linalg/matfuncs.py, > expm2() and expm3() are implemented in Python code, interfacing to > other scipy functions. And expm() is an alias for the function of the > same name in ./scipy/sparse/linalg/matfuncs.py, which is also > implemented in Python code. So there is no BLAS/LAPACK magic > happening here, except maybe in the functions that these expm*() > functions call. > > Regards, > > Edward > > > > On 25 June 2014 17:54, Edward d'Auvergne <[email protected]> wrote: >> Hi Troels, >> >> If you are interested in eliminating this 86% bottleneck in the >> numeric R1rho dispersion models in the >> lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() >> function, it might be worth looking at the scipy.linalg.expm(), >> scipy.linalg.expm2(), and scipy.linalg.expm3() functions >> (http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html#exponential-and-logarithm-functions), >> or the BLAS/LAPACK functions these interface to. Note that I >> eliminated the use of these functions in the dispersion analysis a >> while ago (http://article.gmane.org/gmane.science.nmr.relax.scm/18869/). >> And for good reason, there was a nasty Scipy bug in the Pade >> approximation of scipy.linalg.expm(). This and the fatal bugs in all >> optimisation algorithms in Scipy (the catalyst for me to create minfx, >> originally as part of relax, https://gna.org/projects/minfx/), is the >> reason why I try to avoid Scipy as much as possible - I just cannot >> trust it any more. Scipy is a collection of poorly integrated tools, >> often with the original authors completely disappearing, and it is >> full of bugs. Anyway, maybe one of these functions would be of use or >> would be a good inspiration. Oh, the scipy.linalg.expm() function in >> the original code comes from the fitting_main_kex.py script from >> Mathilde Lescanne, Paul Schanda, and Dominique Marion. Note that if >> the Taylor series approximation of scipy.linalg.expm3() works well for >> the numeric dispersion models, which is untested, that might be by far >> the fastest solution. >> >> Regards, >> >> Edward >> >> >> On 25 June 2014 08:20, Edward d'Auvergne <[email protected]> wrote: >>> Wow, you managed to work out how to remove the slow Python looping >>> from the numeric models! For the record, this is the speed up for >>> this 'NS R1rho 2-site' model using Python 2.7.6 & numpy 1.8.0 on a >>> 64-bit Linux system: >>> >>> """ >>> $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 >>> /data/relax/branches/disp_spin_speed >>> [snip] >>> >>> New relax version: relax repository checkout r24308 >>> svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed >>> Old relax version: relax repository checkout r24274 >>> svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed >>> >>> Execution iteration 1 >>> >>> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2 >>> 1000 0.016 0.000 3.561 0.004 >>> relax_disp.py:1582(func_ns_r1rho_2site) >>> 10 0.013 0.001 3.237 0.324 >>> relax_disp.py:1582(func_ns_r1rho_2site) >>> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed >>> 1000 0.017 0.000 6.245 0.006 >>> relax_disp.py:1552(func_ns_r1rho_2site) >>> 10 0.013 0.001 5.899 0.590 >>> relax_disp.py:1552(func_ns_r1rho_2site) >>> [snip] >>> >>> 100 single spins analysis: >>> NS R1rho 2-site: 62.450+/-0.000 -> 35.610+/-0.000, 1.754x >>> faster. >>> >>> Cluster of 100 spins analysis: >>> NS R1rho 2-site: 58.990+/-0.000 -> 32.370+/-0.000, 1.822x >>> faster. >>> """ >>> >>> >>> This is quite significant for these models. With all the changes >>> since relax 3.2.2: >>> >>> """ >>> $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 >>> /data/relax/tags/3.2.2/ >>> [snip] >>> >>> New relax version: relax repository checkout r24308 >>> svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed >>> Old relax version: relax 3.2.2 >>> >>> Execution iteration 1 >>> >>> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2 >>> 1000 0.016 0.000 3.576 0.004 >>> relax_disp.py:1582(func_ns_r1rho_2site) >>> 10 0.013 0.001 3.247 0.325 >>> relax_disp.py:1582(func_ns_r1rho_2site) >>> $ python profiling_ns_r1rho_2site.py /data/relax/tags/3.2.2/ >>> 1000 0.245 0.000 24.323 0.024 >>> relax_disp.py:1681(func_ns_r1rho_2site) >>> 10 0.266 0.027 24.439 2.444 >>> relax_disp.py:1681(func_ns_r1rho_2site) >>> [snip] >>> >>> 100 single spins analysis: >>> NS R1rho 2-site: 243.230+/-0.000 -> 35.760+/-0.000, 6.802x >>> faster. >>> >>> Cluster of 100 spins analysis: >>> NS R1rho 2-site: 244.390+/-0.000 -> 32.470+/-0.000, 7.527x >>> faster. >>> """ >>> >>> >>> Considering how slow these models are compared to the analytic models, >>> this is a huge improvement! According to your >>> profiling_ns_r1rho_2site.py script, the slow parts are: >>> >>> """ >>> ('chi2 cluster:', 14153134802.76194) >>> Wed Jun 25 07:59:18 2014 /tmp/tmp17lqUn >>> >>> 211077 function calls in 3.510 seconds >>> >>> Ordered by: cumulative time >>> >>> ncalls tottime percall cumtime percall filename:lineno(function) >>> 1 0.000 0.000 3.510 3.510 <string>:1(<module>) >>> 1 0.002 0.002 3.510 3.510 >>> profiling_ns_r1rho_2site.py:553(cluster) >>> 10 0.000 0.000 3.254 0.325 >>> profiling_ns_r1rho_2site.py:515(calc) >>> 10 0.015 0.002 3.254 0.325 >>> relax_disp.py:1582(func_ns_r1rho_2site) >>> 10 0.015 0.002 3.236 0.324 >>> ns_r1rho_2site.py:190(ns_r1rho_2site) >>> 10 0.104 0.010 3.041 0.304 >>> matrix_exponential.py:33(matrix_exponential_rank_NE_NS_NM_NO_ND_x_x) >>> 10 1.803 0.180 1.827 0.183 linalg.py:982(eig) >>> 10 0.564 0.056 0.601 0.060 linalg.py:455(inv) >>> 40 0.501 0.013 0.501 0.013 {numpy.core.multiarray.einsum} >>> """ >>> >>> >>> The single spin numbers are very similar. Looking at the total times, >>> the numpy.einsum and numpy.inv functions are taking 1/7 of the time >>> each, and numpy.eig 51% of the time - this is a total of 80% of the >>> time. The >>> lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() >>> function now takes up 86% of the cumulative time - so this is the new >>> bottleneck. The only way to solve this would be using clever linear >>> algebra tricks to simplify or to change the logic in >>> matrix_exponential_rank_NE_NS_NM_NO_ND_x_x(). There are multiple >>> techniques for finding the matrix exponential, it could be that the >>> eigenvalue method is one of the slowest. This looks to be interesting >>> for an overview of recent research in the mathematics field on this >>> subject: http://www.maths.manchester.ac.uk/~higham/talks/exp09.pdf. >>> Great work though, you have eliminated the worst bottleneck for the >>> numeric models! >>> >>> Cheers, >>> >>> Edward >>> >>> >>> >>> On 25 June 2014 02:14, <[email protected]> wrote: >>>> Author: tlinnet >>>> Date: Wed Jun 25 02:14:36 2014 >>>> New Revision: 24294 >>>> >>>> URL: http://svn.gna.org/viewcvs/relax?rev=24294&view=rev >>>> Log: >>>> Speeded up the code of NS r1rho 2site. >>>> >>>> This was essential done to numpy einsum, and doing the dot operations in >>>> multiple dimensions. >>>> It was though necessary to realize, that to do the proper dot product >>>> operations, the outer two >>>> axis if M0 should be swapped, by rolling the outer axis one back. >>>> >>>> Task #7807 (https://gna.org/task/index.php?7807): Speed-up of dispersion >>>> models for Clustered analysis. >>>> >>>> Modified: >>>> branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py >>>> branches/disp_spin_speed/target_functions/relax_disp.py >>>> >>>> Modified: branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py >>>> URL: >>>> http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py?rev=24294&r1=24293&r2=24294&view=diff >>>> ============================================================================== >>>> --- branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py (original) >>>> +++ branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py Wed Jun 25 >>>> 02:14:36 2014 >>>> @@ -50,10 +50,10 @@ >>>> """ >>>> >>>> # Python module imports. >>>> -from numpy import array, cos, dot, einsum, float64, log, multiply, >>>> rollaxis, sin, sum >>>> +from numpy import array, einsum, float64, isfinite, log, min, multiply, >>>> rollaxis, sin, sum >>>> +from numpy.ma import fix_invalid, masked_less >>>> >>>> # relax module imports. >>>> -from lib.float import isNaN >>>> from lib.dispersion.matrix_exponential import >>>> matrix_exponential_rank_NE_NS_NM_NO_ND_x_x >>>> >>>> # Repetitive calculations (to speed up calculations). >>>> @@ -187,14 +187,16 @@ >>>> return matrix >>>> >>>> >>>> -def ns_r1rho_2site(M0=None, r1rho_prime=None, omega=None, offset=None, >>>> r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, >>>> relax_time=None, inv_relax_time=None, back_calc=None, num_points=None): >>>> +def ns_r1rho_2site(M0=None, M0_T=None, r1rho_prime=None, omega=None, >>>> offset=None, r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, >>>> relax_time=None, inv_relax_time=None, back_calc=None, num_points=None): >>>> """The 2-site numerical solution to the Bloch-McConnell equation for >>>> R1rho data. >>>> >>>> This function calculates and stores the R1rho values. >>>> >>>> >>>> @keyword M0: This is a vector that contains the >>>> initial magnetizations corresponding to the A and B state transverse >>>> magnetizations. >>>> - @type M0: numpy float64, rank-1, 7D array >>>> + @type M0: numpy float array of rank >>>> [NE][NS][NM][NO][ND][6][1] >>>> + @keyword M0_T: This is a vector that contains the >>>> initial magnetizations corresponding to the A and B state transverse >>>> magnetizations, where the outer two axis has been swapped for efficient >>>> dot operations. >>>> + @type M0_T: numpy float array of rank >>>> [NE][NS][NM][NO][ND][1][6] >>>> @keyword r1rho_prime: The R1rho_prime parameter value (R1rho >>>> with no exchange). >>>> @type r1rho_prime: numpy float array of rank >>>> [NE][NS][NM][NO][ND] >>>> @keyword omega: The chemical shift for the spin in rad/s. >>>> @@ -238,43 +240,22 @@ >>>> # Magnetization evolution. >>>> Rexpo_M0_mat = einsum('...ij,...jk', Rexpo_mat, M0) >>>> >>>> - # Transpose M0, to prepare for dot operation. Roll the last axis one >>>> back. >>>> - M0_T = rollaxis(M0, 6, 5) >>>> - >>>> - if NS == 1: >>>> - # Loop over the spectrometer frequencies. >>>> - for mi in range(NM): >>>> - # Loop over offsets: >>>> - for oi in range(NO): >>>> - # Extract number of points. >>>> - num_points_i = num_points[0, 0, mi, oi] >>>> - >>>> - # Loop over the time points, back calculating the R2eff >>>> values. >>>> - for j in range(num_points_i): >>>> - >>>> - # If the number of spins are 1, do the fastest method >>>> by dot product. Else do it by einstein summation. >>>> - # Set the spin index for one spin. >>>> - si = 0 >>>> - # Extract the preformed matrix that rotate the >>>> magnetization previous to spin-lock into the weff frame. >>>> - M0_i= M0_T[0, si, mi, oi, j] >>>> - >>>> - # Extract from the pre-formed Magnetization evolution >>>> matrix. >>>> - Rexpo_M0_mat_i = Rexpo_M0_mat[0, si, mi, oi, j] >>>> - >>>> - # Magnetization evolution. >>>> - MA = dot(M0_i, Rexpo_M0_mat_i)[0, 0] >>>> - >>>> - # The next lines calculate the R1rho using a >>>> two-point approximation, i.e. assuming that the decay is mono-exponential. >>>> - if MA <= 0.0 or isNaN(MA): >>>> - back_calc[0, si, mi, oi, j] = 1e99 >>>> - else: >>>> - back_calc[0, si, mi, oi, j]= -inv_relax_time[0, >>>> si, mi, oi, j] * log(MA) >>>> - >>>> - else: >>>> - # Magnetization evolution, which include all dimensions. >>>> - MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, >>>> 0, 0] >>>> - >>>> - # Do back calculation. >>>> - back_calc[:] = -inv_relax_time * log(MA_mat) >>>> - >>>> - >>>> + # Magnetization evolution, which include all dimensions. >>>> + MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, 0, >>>> 0] >>>> + >>>> + # Insert safe checks. >>>> + if min(MA_mat) < 0.0: >>>> + mask_min_MA_mat = masked_less(MA_mat, 0.0) >>>> + # Fill with high values. >>>> + MA_mat[mask_min_MA_mat.mask] = 1e100 >>>> + >>>> + # Do back calculation. >>>> + back_calc[:] = -inv_relax_time * log(MA_mat) >>>> + >>>> + # Catch errors, taking a sum over array is the fastest way to check >>>> for >>>> + # +/- inf (infinity) and nan (not a number). >>>> + if not isfinite(sum(back_calc)): >>>> + # Replaces nan, inf, etc. with fill value. >>>> + fix_invalid(back_calc, copy=False, fill_value=1e100) >>>> + >>>> + >>>> >>>> Modified: branches/disp_spin_speed/target_functions/relax_disp.py >>>> URL: >>>> http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/target_functions/relax_disp.py?rev=24294&r1=24293&r2=24294&view=diff >>>> ============================================================================== >>>> --- branches/disp_spin_speed/target_functions/relax_disp.py (original) >>>> +++ branches/disp_spin_speed/target_functions/relax_disp.py Wed Jun 25 >>>> 02:14:36 2014 >>>> @@ -26,8 +26,7 @@ >>>> >>>> # Python module imports. >>>> from copy import deepcopy >>>> -from math import pi >>>> -from numpy import add, array, arctan2, asarray, cos, complex64, dot, >>>> float64, int16, max, multiply, ones, sin, sqrt, sum, tile, zeros >>>> +from numpy import arctan2, cos, dot, float64, int16, multiply, ones, >>>> rollaxis, pi, sin, sum, zeros >>>> from numpy.ma import masked_equal >>>> >>>> # relax module imports. >>>> @@ -415,6 +414,9 @@ >>>> M0_2[2, 0] = 1 >>>> M0_cos = multiply.outer( cos(theta_mat), M0_2 ) >>>> self.M0 = M0_sin + M0_cos >>>> + # Transpose M0, to prepare for dot operation. Roll the last >>>> axis one back, corresponds to a transpose for the outer two axis. >>>> + self.M0_T = rollaxis(self.M0, 6, 5) >>>> + >>>> if model in [MODEL_NS_R1RHO_3SITE, MODEL_NS_R1RHO_3SITE_LINEAR]: >>>> self.M0 = zeros(9, float64) >>>> # Offset of spin-lock from A. >>>> @@ -1598,7 +1600,7 @@ >>>> self.r20_struct[:] = multiply.outer( r1rho_prime.reshape(self.NE, >>>> self.NS, self.NM), self.no_nd_ones ) >>>> >>>> # Back calculate the R2eff values. >>>> - ns_r1rho_2site(M0=self.M0, r1rho_prime=self.r20_struct, >>>> omega=self.chemical_shifts, offset=self.offset, r1=self.r1, pA=pA, >>>> dw=self.dw_struct, kex=kex, spin_lock_fields=self.spin_lock_omega1, >>>> relax_time=self.relax_times, inv_relax_time=self.inv_relax_times, >>>> back_calc=self.back_calc, num_points=self.num_disp_points) >>>> + ns_r1rho_2site(M0=self.M0, M0_T=self.M0_T, >>>> r1rho_prime=self.r20_struct, omega=self.chemical_shifts, >>>> offset=self.offset, r1=self.r1, pA=pA, dw=self.dw_struct, kex=kex, >>>> spin_lock_fields=self.spin_lock_omega1, relax_time=self.relax_times, >>>> inv_relax_time=self.inv_relax_times, back_calc=self.back_calc, >>>> num_points=self.num_disp_points) >>>> >>>> # Clean the data for all values, which is left over at the end of >>>> arrays. >>>> self.back_calc = self.back_calc*self.disp_struct >>>> >>>> >>>> _______________________________________________ >>>> relax (http://www.nmr-relax.com) >>>> >>>> This is the relax-commits mailing list >>>> [email protected] >>>> >>>> To unsubscribe from this list, get a password >>>> reminder, or change your subscription options, >>>> visit the list information page at >>>> https://mail.gna.org/listinfo/relax-commits _______________________________________________ relax (http://www.nmr-relax.com) This is the relax-devel mailing list [email protected] To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-devel

