Hi Edward. I have no interest in pursuing this at the moment.
If numpy provides a universal function which is faster, like einsum as a replacement for dot for multiple dimensions, then we can have a look. But I won't use scipy. The thing I would like more is to eliminate the looping when doing the matrix exponential.

I have just completed a test/profiling script which strides through the data. It took me quite a while to grasp the striding concept, but I think it is worth my time to learn how to access the data in any possible way. We can gain a factor of 1.5 in speed! :-)

Best
Troels

2014-06-25 18:23 GMT+02:00 Edward d'Auvergne <[email protected]>:

> For this problem, the following reference could be interesting for summarising the different techniques:
>
> Moler, C. and Van Loan, C. (2003) Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. SIAM Review, 45, 3-49. (http://dx.doi.org/10.1137/S00361445024180 or http://www.cs.cornell.edu/cv/researchpdf/19ways+.pdf).
>
> I'm putting the link here for future reference as well.
>
> Regards,
>
> Edward
>
>
> On 25 June 2014 18:12, Edward d'Auvergne <[email protected]> wrote:
> > Well, looking at the Scipy source code of ./scipy/linalg/matfuncs.py, expm2() and expm3() are implemented in Python code, interfacing to other scipy functions. And expm() is an alias for the function of the same name in ./scipy/sparse/linalg/matfuncs.py, which is also implemented in Python code. So there is no BLAS/LAPACK magic happening here, except maybe in the functions that these expm*() functions call.
> >
> > Regards,
> >
> > Edward
> >
> >
> > On 25 June 2014 17:54, Edward d'Auvergne <[email protected]> wrote:
> >> Hi Troels,
> >>
> >> If you are interested in eliminating this 86% bottleneck in the numeric R1rho dispersion models in the lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() function, it might be worth looking at the scipy.linalg.expm(), scipy.linalg.expm2(), and scipy.linalg.expm3() functions (http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html#exponential-and-logarithm-functions), or the BLAS/LAPACK functions these interface to. Note that I eliminated the use of these functions in the dispersion analysis a while ago (http://article.gmane.org/gmane.science.nmr.relax.scm/18869/). And for good reason, there was a nasty Scipy bug in the Pade approximation of scipy.linalg.expm(). This and the fatal bugs in all optimisation algorithms in Scipy (the catalyst for me to create minfx, originally as part of relax, https://gna.org/projects/minfx/), is the reason why I try to avoid Scipy as much as possible - I just cannot trust it any more. Scipy is a collection of poorly integrated tools, often with the original authors completely disappearing, and it is full of bugs. Anyway, maybe one of these functions would be of use or would be a good inspiration. Oh, the scipy.linalg.expm() function in the original code comes from the fitting_main_kex.py script from Mathilde Lescanne, Paul Schanda, and Dominique Marion. Note that if the Taylor series approximation of scipy.linalg.expm3() works well for the numeric dispersion models, which is untested, that might be by far the fastest solution.
> >>
> >> Regards,
> >>
> >> Edward
> >>
> >>
> >> On 25 June 2014 08:20, Edward d'Auvergne <[email protected]> wrote:
> >>> Wow, you managed to work out how to remove the slow Python looping from the numeric models!
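The einsum trick referred to throughout this thread replaces a nested Python loop of dot() calls with a single call that broadcasts the innermost matrix product over all of the outer dimensions. A minimal sketch, with made-up dimension sizes rather than the actual relax data structures:

"""
# Sketch only: hypothetical shapes, not the relax dispersion structures.
from numpy import allclose, dot, einsum, zeros
from numpy.random import rand

NE, NS, NM, NO, ND = 1, 2, 3, 4, 5
A = rand(NE, NS, NM, NO, ND, 6, 6)
B = rand(NE, NS, NM, NO, ND, 6, 6)

# Looped version: one dot() call per innermost 6x6 matrix pair.
C_loop = zeros((NE, NS, NM, NO, ND, 6, 6))
for ei in range(NE):
    for si in range(NS):
        for mi in range(NM):
            for oi in range(NO):
                for di in range(ND):
                    C_loop[ei, si, mi, oi, di] = dot(A[ei, si, mi, oi, di], B[ei, si, mi, oi, di])

# Batched version: the ellipsis broadcasts the 'ij,jk->ik' matrix product
# over all five outer axes in a single C-level call.
C_ein = einsum('...ij,...jk', A, B)

print(allclose(C_loop, C_ein))    # True
"""

numpy.matmul and the @ operator later provide the same broadcasting behaviour, but they do not exist in the numpy 1.8.0 used for the timings below, so einsum is the natural tool here.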
> >>> For the record, this is the speed up for this 'NS R1rho 2-site' model using Python 2.7.6 & numpy 1.8.0 on a 64-bit Linux system:
> >>>
> >>> """
> >>> $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 /data/relax/branches/disp_spin_speed
> >>> [snip]
> >>>
> >>> New relax version: relax repository checkout r24308 svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed
> >>> Old relax version: relax repository checkout r24274 svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed
> >>>
> >>> Execution iteration 1
> >>>
> >>> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2
> >>>   1000    0.016    0.000    3.561    0.004  relax_disp.py:1582(func_ns_r1rho_2site)
> >>>     10    0.013    0.001    3.237    0.324  relax_disp.py:1582(func_ns_r1rho_2site)
> >>> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed
> >>>   1000    0.017    0.000    6.245    0.006  relax_disp.py:1552(func_ns_r1rho_2site)
> >>>     10    0.013    0.001    5.899    0.590  relax_disp.py:1552(func_ns_r1rho_2site)
> >>> [snip]
> >>>
> >>> 100 single spins analysis:
> >>> NS R1rho 2-site:    62.450+/-0.000 ->  35.610+/-0.000, 1.754x faster.
> >>>
> >>> Cluster of 100 spins analysis:
> >>> NS R1rho 2-site:    58.990+/-0.000 ->  32.370+/-0.000, 1.822x faster.
> >>> """
> >>>
> >>> This is quite significant for these models. With all the changes since relax 3.2.2:
> >>>
> >>> """
> >>> $ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2 /data/relax/tags/3.2.2/
> >>> [snip]
> >>>
> >>> New relax version: relax repository checkout r24308 svn+ssh://[email protected]/svn/relax/branches/disp_spin_speed
> >>> Old relax version: relax 3.2.2
> >>>
> >>> Execution iteration 1
> >>>
> >>> $ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2
> >>>   1000    0.016    0.000    3.576    0.004  relax_disp.py:1582(func_ns_r1rho_2site)
> >>>     10    0.013    0.001    3.247    0.325  relax_disp.py:1582(func_ns_r1rho_2site)
> >>> $ python profiling_ns_r1rho_2site.py /data/relax/tags/3.2.2/
> >>>   1000    0.245    0.000   24.323    0.024  relax_disp.py:1681(func_ns_r1rho_2site)
> >>>     10    0.266    0.027   24.439    2.444  relax_disp.py:1681(func_ns_r1rho_2site)
> >>> [snip]
> >>>
> >>> 100 single spins analysis:
> >>> NS R1rho 2-site:   243.230+/-0.000 ->  35.760+/-0.000, 6.802x faster.
> >>>
> >>> Cluster of 100 spins analysis:
> >>> NS R1rho 2-site:   244.390+/-0.000 ->  32.470+/-0.000, 7.527x faster.
> >>> """
> >>>
> >>> Considering how slow these models are compared to the analytic models, this is a huge improvement!
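The timing tables in this thread are standard Python cProfile output sorted by cumulative time. A minimal sketch of producing such a table for an arbitrary callable follows; the target_function() below is only a placeholder, not the actual profiling_ns_r1rho_2site.py code:

"""
# Sketch only: target_function() is a stand-in for the relax call being profiled.
import cProfile
import pstats

def target_function():
    # Placeholder workload for the chi-squared/back-calculation call.
    return sum(i * i for i in range(100000))

# Run the call under the profiler and dump the statistics to a file.
cProfile.run('target_function()', 'profile.out')

# Sort by cumulative time, as in the tables quoted in this thread.
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(10)
"""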
> >>> According to your profiling_ns_r1rho_2site.py script, the slow parts are:
> >>>
> >>> """
> >>> ('chi2 cluster:', 14153134802.76194)
> >>> Wed Jun 25 07:59:18 2014    /tmp/tmp17lqUn
> >>>
> >>>          211077 function calls in 3.510 seconds
> >>>
> >>>    Ordered by: cumulative time
> >>>
> >>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> >>>         1    0.000    0.000    3.510    3.510 <string>:1(<module>)
> >>>         1    0.002    0.002    3.510    3.510 profiling_ns_r1rho_2site.py:553(cluster)
> >>>        10    0.000    0.000    3.254    0.325 profiling_ns_r1rho_2site.py:515(calc)
> >>>        10    0.015    0.002    3.254    0.325 relax_disp.py:1582(func_ns_r1rho_2site)
> >>>        10    0.015    0.002    3.236    0.324 ns_r1rho_2site.py:190(ns_r1rho_2site)
> >>>        10    0.104    0.010    3.041    0.304 matrix_exponential.py:33(matrix_exponential_rank_NE_NS_NM_NO_ND_x_x)
> >>>        10    1.803    0.180    1.827    0.183 linalg.py:982(eig)
> >>>        10    0.564    0.056    0.601    0.060 linalg.py:455(inv)
> >>>        40    0.501    0.013    0.501    0.013 {numpy.core.multiarray.einsum}
> >>> """
> >>>
> >>> The single spin numbers are very similar. Looking at the total times, the numpy.einsum and numpy.inv functions are taking 1/7 of the time each, and numpy.eig 51% of the time - this is a total of 80% of the time. The lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x() function now takes up 86% of the cumulative time - so this is the new bottleneck. The only way to solve this would be using clever linear algebra tricks to simplify or to change the logic in matrix_exponential_rank_NE_NS_NM_NO_ND_x_x(). There are multiple techniques for finding the matrix exponential, it could be that the eigenvalue method is one of the slowest. This looks to be interesting for an overview of recent research in the mathematics field on this subject: http://www.maths.manchester.ac.uk/~higham/talks/exp09.pdf. Great work though, you have eliminated the worst bottleneck for the numeric models!
> >>>
> >>> Cheers,
> >>>
> >>> Edward
> >>>
> >>>
> >>> On 25 June 2014 02:14, <[email protected]> wrote:
> >>>> Author: tlinnet
> >>>> Date: Wed Jun 25 02:14:36 2014
> >>>> New Revision: 24294
> >>>>
> >>>> URL: http://svn.gna.org/viewcvs/relax?rev=24294&view=rev
> >>>> Log:
> >>>> Speeded up the code of NS r1rho 2site.
> >>>>
> >>>> This was essentially done with numpy einsum, doing the dot operations in multiple dimensions. It was though necessary to realize that, to do the proper dot product operations, the outer two axes of M0 should be swapped, by rolling the outer axis one back.
> >>>>
> >>>> Task #7807 (https://gna.org/task/index.php?7807): Speed-up of dispersion models for Clustered analysis.
> >>>>
> >>>> Modified:
> >>>>     branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py
> >>>>     branches/disp_spin_speed/target_functions/relax_disp.py
> >>>>
> >>>> Modified: branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py
> >>>> URL: http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py?rev=24294&r1=24293&r2=24294&view=diff
> >>>> ==============================================================================
> >>>> --- branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py (original)
> >>>> +++ branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py Wed Jun 25 02:14:36 2014
> >>>> @@ -50,10 +50,10 @@
> >>>>  """
> >>>>
> >>>>  # Python module imports.
> >>>> -from numpy import array, cos, dot, einsum, float64, log, multiply, rollaxis, sin, sum
> >>>> +from numpy import array, einsum, float64, isfinite, log, min, multiply, rollaxis, sin, sum
> >>>> +from numpy.ma import fix_invalid, masked_less
> >>>>
> >>>>  # relax module imports.
> >>>> -from lib.float import isNaN
> >>>>  from lib.dispersion.matrix_exponential import matrix_exponential_rank_NE_NS_NM_NO_ND_x_x
> >>>>
> >>>>  # Repetitive calculations (to speed up calculations).
> >>>> @@ -187,14 +187,16 @@
> >>>>      return matrix
> >>>>
> >>>>
> >>>> -def ns_r1rho_2site(M0=None, r1rho_prime=None, omega=None, offset=None, r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, relax_time=None, inv_relax_time=None, back_calc=None, num_points=None):
> >>>> +def ns_r1rho_2site(M0=None, M0_T=None, r1rho_prime=None, omega=None, offset=None, r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, relax_time=None, inv_relax_time=None, back_calc=None, num_points=None):
> >>>>      """The 2-site numerical solution to the Bloch-McConnell equation for R1rho data.
> >>>>
> >>>>      This function calculates and stores the R1rho values.
> >>>>
> >>>>
> >>>>      @keyword M0:            This is a vector that contains the initial magnetizations corresponding to the A and B state transverse magnetizations.
> >>>> -    @type M0:               numpy float64, rank-1, 7D array
> >>>> +    @type M0:               numpy float array of rank [NE][NS][NM][NO][ND][6][1]
> >>>> +    @keyword M0_T:          This is a vector that contains the initial magnetizations corresponding to the A and B state transverse magnetizations, where the outer two axis has been swapped for efficient dot operations.
> >>>> +    @type M0_T:             numpy float array of rank [NE][NS][NM][NO][ND][1][6]
> >>>>      @keyword r1rho_prime:   The R1rho_prime parameter value (R1rho with no exchange).
> >>>>      @type r1rho_prime:      numpy float array of rank [NE][NS][NM][NO][ND]
> >>>>      @keyword omega:         The chemical shift for the spin in rad/s.
> >>>> @@ -238,43 +240,22 @@
> >>>>      # Magnetization evolution.
> >>>>      Rexpo_M0_mat = einsum('...ij,...jk', Rexpo_mat, M0)
> >>>>
> >>>> -    # Transpose M0, to prepare for dot operation. Roll the last axis one back.
> >>>> -    M0_T = rollaxis(M0, 6, 5)
> >>>> -
> >>>> -    if NS == 1:
> >>>> -        # Loop over the spectrometer frequencies.
> >>>> -        for mi in range(NM):
> >>>> -            # Loop over offsets:
> >>>> -            for oi in range(NO):
> >>>> -                # Extract number of points.
> >>>> -                num_points_i = num_points[0, 0, mi, oi]
> >>>> -
> >>>> -                # Loop over the time points, back calculating the R2eff values.
> >>>> -                for j in range(num_points_i):
> >>>> -
> >>>> -                    # If the number of spins are 1, do the fastest method by dot product. Else do it by einstein summation.
> >>>> -                    # Set the spin index for one spin.
> >>>> -                    si = 0
> >>>> -                    # Extract the preformed matrix that rotate the magnetization previous to spin-lock into the weff frame.
> >>>> -                    M0_i = M0_T[0, si, mi, oi, j]
> >>>> -
> >>>> -                    # Extract from the pre-formed Magnetization evolution matrix.
> >>>> -                    Rexpo_M0_mat_i = Rexpo_M0_mat[0, si, mi, oi, j]
> >>>> -
> >>>> -                    # Magnetization evolution.
> >>>> -                    MA = dot(M0_i, Rexpo_M0_mat_i)[0, 0]
> >>>> -
> >>>> -                    # The next lines calculate the R1rho using a two-point approximation, i.e. assuming that the decay is mono-exponential.
> >>>> -                    if MA <= 0.0 or isNaN(MA):
> >>>> -                        back_calc[0, si, mi, oi, j] = 1e99
> >>>> -                    else:
> >>>> -                        back_calc[0, si, mi, oi, j] = -inv_relax_time[0, si, mi, oi, j] * log(MA)
> >>>> -
> >>>> -    else:
> >>>> -        # Magnetization evolution, which include all dimensions.
> >>>> -        MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, 0, 0]
> >>>> -
> >>>> -        # Do back calculation.
> >>>> -        back_calc[:] = -inv_relax_time * log(MA_mat)
> >>>> -
> >>>> -
> >>>> +    # Magnetization evolution, which include all dimensions.
> >>>> +    MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, 0, 0]
> >>>> +
> >>>> +    # Insert safe checks.
> >>>> +    if min(MA_mat) < 0.0:
> >>>> +        mask_min_MA_mat = masked_less(MA_mat, 0.0)
> >>>> +        # Fill with high values.
> >>>> +        MA_mat[mask_min_MA_mat.mask] = 1e100
> >>>> +
> >>>> +    # Do back calculation.
> >>>> +    back_calc[:] = -inv_relax_time * log(MA_mat)
> >>>> +
> >>>> +    # Catch errors, taking a sum over array is the fastest way to check for
> >>>> +    # +/- inf (infinity) and nan (not a number).
> >>>> +    if not isfinite(sum(back_calc)):
> >>>> +        # Replaces nan, inf, etc. with fill value.
> >>>> +        fix_invalid(back_calc, copy=False, fill_value=1e100)
> >>>> +
> >>>> +
> >>>>
> >>>> Modified: branches/disp_spin_speed/target_functions/relax_disp.py
> >>>> URL: http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/target_functions/relax_disp.py?rev=24294&r1=24293&r2=24294&view=diff
> >>>> ==============================================================================
> >>>> --- branches/disp_spin_speed/target_functions/relax_disp.py (original)
> >>>> +++ branches/disp_spin_speed/target_functions/relax_disp.py Wed Jun 25 02:14:36 2014
> >>>> @@ -26,8 +26,7 @@
> >>>>
> >>>>  # Python module imports.
> >>>>  from copy import deepcopy
> >>>> -from math import pi
> >>>> -from numpy import add, array, arctan2, asarray, cos, complex64, dot, float64, int16, max, multiply, ones, sin, sqrt, sum, tile, zeros
> >>>> +from numpy import arctan2, cos, dot, float64, int16, multiply, ones, rollaxis, pi, sin, sum, zeros
> >>>>  from numpy.ma import masked_equal
> >>>>
> >>>>  # relax module imports.
> >>>> @@ -415,6 +414,9 @@
> >>>>              M0_2[2, 0] = 1
> >>>>              M0_cos = multiply.outer( cos(theta_mat), M0_2 )
> >>>>              self.M0 = M0_sin + M0_cos
> >>>> +            # Transpose M0, to prepare for dot operation. Roll the last axis one back, corresponds to a transpose for the outer two axis.
> >>>> +            self.M0_T = rollaxis(self.M0, 6, 5)
> >>>> +
> >>>>          if model in [MODEL_NS_R1RHO_3SITE, MODEL_NS_R1RHO_3SITE_LINEAR]:
> >>>>              self.M0 = zeros(9, float64)
> >>>>              # Offset of spin-lock from A.
> >>>> @@ -1598,7 +1600,7 @@
> >>>>              self.r20_struct[:] = multiply.outer( r1rho_prime.reshape(self.NE, self.NS, self.NM), self.no_nd_ones )
> >>>>
> >>>>              # Back calculate the R2eff values.
> >>>> -            ns_r1rho_2site(M0=self.M0, r1rho_prime=self.r20_struct, omega=self.chemical_shifts, offset=self.offset, r1=self.r1, pA=pA, dw=self.dw_struct, kex=kex, spin_lock_fields=self.spin_lock_omega1, relax_time=self.relax_times, inv_relax_time=self.inv_relax_times, back_calc=self.back_calc, num_points=self.num_disp_points)
> >>>> +            ns_r1rho_2site(M0=self.M0, M0_T=self.M0_T, r1rho_prime=self.r20_struct, omega=self.chemical_shifts, offset=self.offset, r1=self.r1, pA=pA, dw=self.dw_struct, kex=kex, spin_lock_fields=self.spin_lock_omega1, relax_time=self.relax_times, inv_relax_time=self.inv_relax_times, back_calc=self.back_calc, num_points=self.num_disp_points)
> >>>>
> >>>>              # Clean the data for all values, which is left over at the end of arrays.
> >>>>              self.back_calc = self.back_calc*self.disp_struct


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at
https://mail.gna.org/listinfo/relax-devel
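For future reference on the eigenvalue route to the matrix exponential discussed above (the numpy.linalg.eig(), numpy.linalg.inv() and einsum() calls that now dominate the profile), here is a minimal sketch of computing expm(R*t) for a whole stack of matrices at once via eigendecomposition. It only illustrates the technique, it is not the relax lib.dispersion.matrix_exponential code, and it assumes the matrices are diagonalisable:

"""
# Sketch only: the eigendecomposition route to the matrix exponential for a
# stack of matrices.  Not the relax implementation.
from numpy import einsum, exp, newaxis
from numpy.linalg import eig, inv
from numpy.random import rand

# A hypothetical stack of 6x6 relaxation matrices already multiplied by the
# relaxation delays, i.e. R*t for every dispersion point.
Rt = rand(2, 3, 4, 6, 6)

# Eigendecomposition of every matrix in the stack.  From numpy 1.8 the
# linalg functions broadcast over the leading axes, so no Python loop.
w, V = eig(Rt)

# expm(R*t) = V . diag(exp(w)) . inv(V).  Scaling the columns of V by exp(w)
# performs the diag() multiplication for the whole stack at once.
expm_Rt = einsum('...ij,...jk', V * exp(w)[..., newaxis, :], inv(V))

# The input is real, so drop the rounding-level imaginary parts left over
# from the complex eigendecomposition.
expm_Rt = expm_Rt.real
"""

Since eig() itself is the dominant cost in the profile, other families from the Moler and Van Loan review, for example scaling-and-squaring with a Pade approximant, may be worth benchmarking against this.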

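The rollaxis(M0, 6, 5) call in the commit above swaps the last two axes of the rank-7 M0 structure, turning each [6][1] column vector into a [1][6] row vector, so that the final row-matrix-column product collapses to a scalar for every dispersion point in a single einsum call. A minimal sketch of that axis swap and the two-point R1rho back-calculation, again with made-up shapes and values rather than the actual relax structures:

"""
# Sketch only: made-up shapes and values, not the actual relax structures.
from numpy import einsum, log, rollaxis
from numpy.random import rand

NE, NS, NM, NO, ND = 1, 1, 2, 3, 4
relax_time = 0.1 + rand(NE, NS, NM, NO, ND)     # hypothetical delays in s
M0 = rand(NE, NS, NM, NO, ND, 6, 1)             # column vectors
Rexpo = rand(NE, NS, NM, NO, ND, 6, 6)          # stand-in for expm(R*t)

# Rolling the last axis one position back swaps the two innermost axes,
# i.e. every [6][1] column vector becomes a [1][6] row vector.
M0_T = rollaxis(M0, 6, 5)                       # shape (..., 1, 6)

# Row . matrix . column for every dispersion point, then strip the trailing
# [1][1] axes to leave the scalar magnetisation MA.
MA = einsum('...ij,...jk', M0_T, einsum('...ij,...jk', Rexpo, M0))[..., 0, 0]

# Two-point R1rho estimate, assuming mono-exponential decay over the delay.
R1rho = -log(MA) / relax_time
"""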
