Re: [ITK-dev] efficiency of vnl_matrix

Jian Cheng Thu, 12 Mar 2015 08:44:46 -0700

Hi,

For (2), the attachment is a comparison I made.
It adds additional tests on vnl to the code from the previous link
http://nghiaho.com/?p=1726 .
Some header files can be downloaded from
https://github.com/DiffusionMRITool/dmritool/tree/master/Modules/HelperFunctions/include


Then you can build and run
// With eigen
// g++ -DTEST_EIGEN test_matrix_pseudoinverse.cpp -o
test_matrix_pseudoinverse_vnl -lopencv_core  -O3 -DNDEBUG

// With ARMA OpenBLAS
// g++ -DTEST_ARMA test_matrix_pseudoinverse.cpp -o
test_matrix_pseudoinverse -lopencv_core -larmadillo -lgomp -fopenmp
-lopenblas -O3 -DNDEBUG -DHAVE_INLINE

// with vnl
// g++ -DTEST_VNL test_matrix_pseudoinverse.cpp -o
test_matrix_pseudoinverse -lopencv_core -lvnl -lvnl_algo
-I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG
-DUTL_USE_FASTLAPACK

// with vnl + openblas
// g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o
test_matrix_pseudoinverse -lopencv_core -lopenblas  -lvnl -lvnl_algo
-I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG  -O3
-DUTL_USE_FASTLAPACK
// with vnl + mkl
// g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o
test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm  -lvnl -lvnl_algo
-I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG
-DUTL_USE_FASTLAPACK

// with utl+openblas
// g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o
test_matrix_pseudoinverse -lopencv_core -lopenblas  -lvnl -lvnl_algo
-I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG  -O3
-DUTL_USE_FASTLAPACK
// with utl + mkl;
// g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o
test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm  -lvnl -lvnl_algo
-I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG -O3
-DUTL_USE_FASTLAPACK

In my experiments, using blas functions from mkl is the most efficient way.

best,
Jian Cheng

On 03/12/2015 10:41 AM, Matt McCormick wrote:
> Hi,
>
> From the discussion so far, it appears the following serious of steps
> could be taken to move forward on performance:
>
> 1) Replace vnl_vector and vnl_matrix with vnl_vector_fixed and
> vnl_matrix_fixed when possible.
>
> 2) Add Jian Cheng's BLAS and LAPACK backends for vnl_vector and vnl_matrix.
>
> 3) Add support for armadillo or eigen.
>
> 1) and 2) will be relatively easy to make, and will hopefully have an
> immediate impact on performance.  3) will take make work to happen,
> and it will take longer to impact the toolkit.  We will need to have
> cross-platform builds of the libraries in the repository.  Also, many
> ITK classes encapsulate their use of VNL very poorly, so it will not
> be as simple as swapping out or improving their backends.
>
> 2 cents,
> Matt
>
> On Thu, Mar 12, 2015 at 10:20 AM, Bradley Lowekamp
> <[email protected]> wrote:
>> Chuck,
>>
>> Thank you for giving us that important conclusion, under a quite difficult
>> situations.
>>
>> I wonder if there is any distinction in the usage of vnl_matrix vs
>> vnl_matrix_fixed. I would expect that operations done for pixel transforms
>> should have there dimension know at run-time and should be able to use the
>> vnl_matrix_fixed.
>>
>> I also have considered methods to transform a whole array of points at a
>> time. I wonder if for 3x3*3*256 sized operations ( scan-line size ) if there
>> would be benefit with the library based operations.
>>
>> Brad
>>
>>
>> On Mar 12, 2015, at 10:02 AM, Chuck Atkins <[email protected]> wrote:
>>
>> I worked with Julie Langou, maintainer of LAPACK, on this project a few
>> years ago.  The funding situation ended up very strange and messy and we
>> basically had to cram 3 months worth of effort into 3 weeks, so needless to
>> say, we were not able to really achieve our goals.  However, we spent a fair
>> amount of time profiling ITK and analyzing it's hot spots from vnl to
>> determine where to best spend the small ammount of time we had.  The results
>> were not as straight forward as we expected.  It turns out that most of the
>> use for vnl_matrix and vnl_vector was actually for an enourmous number of
>> operations on very small sized vectors and matricies (dimensions of 2, 3, or
>> 4), often for coordinate and geometry calculations or for small per-pixel
>> operations that were not easily vectorized in the implementation at the
>> time.  In these cases, the overhead of calling out to a BLAS or LAPACK
>> library was much too expensive and the existing use of VNL was far more
>> optimal.  This falls apart, however when trying to use vnl for more complex
>> algorithms since the larger matrix operations will be where the benefit can
>> be seen.  So just re-implementing the vnl vector and matrix classes and
>> operators with underlying BLAS and LAPACK routines turned out to not be the
>> best solution for ITK as a whole.
>>
>> - Chuck
>> tage of the performance gains of large block matrix and vector
>> operations seen with optimized BLAS and LAPACK libraries, the
>> computations needed to be re-worked to act in an SoA (struct of
>> arrays) fashion instead.  Given our limited time and resources, this
>> was out of scope for what we could tackle.
>>
>> * Typically AoS and SoA refer to storage layout but I'm using it to
>> refer to computation layout.  The terminology may not be correct but
>> I think you can understand what I mean.
>> On Thu, Mar 12, 2015 at 8:32 AM, Bradley Lowekamp <[email protected]>
>> wrote:
>>> Hello,
>>>
>>> If I was writing my own ITK classes, and needed a fast matrix library I
>>> would likely pursue an additional dependency on an efficient numeric library
>>> for that project, such as eigen.
>>>
>>> However for the broad appeal of ITK I would think a flexible back end
>>> would be best. As I think there are a variety of BLAS and LAPACK libraries
>>> available ( commercial, open source, vender free ). It would be nice to pick
>>> one what has been optimized for the current architecture. I would think it
>>> would be most flexible to use this interface in the back end of a chosen
>>> numeric interface ( currently VNL ). Unfortunately, I don't have as much
>>> experience with these libraries as I'd like.
>>>
>>> Brad
>>>
>>> On Mar 12, 2015, at 5:15 AM, [email protected] wrote:
>>>
>>>> Hi,
>>>>
>>>> I think the eigen library is a mature and very fast library for these
>>>> kind of things:
>>>> http://eigen.tuxfamily.org/index.php?title=Main_Page
>>>>
>>>> You may want to check it out, to see if it offers what you need.
>>>>
>>>> It would be great to be able to use these within the itk.
>>>>
>>>> 2c
>>>> Marius
>>>>
>>>> -----Original Message-----
>>>> From: Insight-developers [mailto:[email protected]] On
>>>> Behalf Of Jian Cheng
>>>> Sent: Wednesday, March 11, 2015 23:17
>>>> To: Matt McCormick
>>>> Cc: Chuck Atkins; ITK
>>>> Subject: Re: [ITK-dev] efficiency of vnl_matrix
>>>>
>>>> Hi Matt,
>>>>
>>>> Thanks for your help, and also for the ITK workshop in UNC last time.
>>>>
>>>> It is very unfortunate. The efficiency of these numerical math operators
>>>> are very important for many applications.
>>>>
>>>> I recently released an ITK based toolbox, called dmritool, for diffusion
>>>> MRI data processing.
>>>> It has some files to add some supports of blas, lapack, mkl to
>>>> vnl_matrix and vnl_vector.
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlBlas_8h_source.html
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVNLBlas_8h_source.html
>>>>
>>>> Those functions are not internally for vnl_matrix class. They are
>>>> operators for the data pointer stored in vnl_matrix object.
>>>> Thus, later I made a N-dimensional array library which internally
>>>> includes those functions, and also supports expression template to avoid
>>>> temporary copies.
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlMatrix_8h_source.html
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVector_8h_source.html
>>>>
>>>> The efficiency comparison between vnl_vector/vnl_matrix and the
>>>> vector/matrix using openblas, lapack, or mkl can be found by running those
>>>> two tests
>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLBlasGTest.cxx
>>>>
>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLLapackGTest.cxx
>>>>
>>>> Maybe some codes can be used as patches in somewhere in ITK. I am not
>>>> sure. Maybe we need more discussion on it.
>>>> With your help and discussion, I will be very glad to make my first
>>>> patch to ITK.
>>>> Thanks.
>>>>
>>>> best,
>>>> Jian Cheng
>>>>
>>>>
>>>> On 03/11/2015 04:39 PM, Matt McCormick wrote:
>>>>> Hi Jian,
>>>>>
>>>>> Yes, it would be wonderful to improve the efficiency of these basic
>>>>> numerical operations.
>>>>>
>>>>> Funding for the Refactor Numerical Libraries has currently ended, and
>>>>> the effort is currently frozen.  However, you are more than welcome to
>>>>> pick it up and we can help you get it into ITK.  More information on
>>>>> the patch submission process can be found here [1] and in the ITK
>>>>> Software Guide.
>>>>>
>>>>> Thanks,
>>>>> Matt
>>>>>
>>>>> [1]
>>>>> https://insightsoftwareconsortium.github.io/ITKBarCamp-doc/CommunitySo
>>>>> ftwareProcess/SubmitAPatchToGerrit/index.html
>>>>>
>>>>> On Wed, Mar 11, 2015 at 4:07 PM, Jian Cheng <[email protected]>
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> My task using ITK has intensive matrix-matrix product, pseudo-inverse,
>>>>>> etc.
>>>>>> Thus the performance is actually mainly determined by the matrix
>>>>>> library I used.
>>>>>> Firstly I use vnl_matrix and vnl_vector in ITK. Then I found it is
>>>>>> very inefficient because vnl matrix lib does not use blas and lapack.
>>>>>> After I wrote my own matrix class which uses openblas and lapack, I
>>>>>> got a hug gain of performance.
>>>>>>
>>>>>> I found there is a proposal to improve the efficiency of numerical
>>>>>> libraries in ITK.
>>>>>> http://www.itk.org/Wiki/ITK/Release_4/Refactor_Numerical_Libraries
>>>>>> I am not sure how is the progress of the proposal.
>>>>>> I wonder when the vnl matrix lib can internally support blas and
>>>>>> lapack, or mkl, so that we can just use it without lose of the
>>>>>> efficiency.
>>>>>> Thanks.
>>>>>>
>>>>>> best,
>>>>>> Jian Cheng
>>>>>> _______________________________________________
>>>>>> Powered by www.kitware.com
>>>>>>
>>>>>> Visit other Kitware open-source projects at
>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>
>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>> http://kitware.com/products/protraining.php
>>>>>>
>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>
>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>> http://public.kitware.com/mailman/listinfo/insight-developers
>>>> _______________________________________________
>>>> Powered by www.kitware.com
>>>>
>>>> Visit other Kitware open-source projects at
>>>> http://www.kitware.com/opensource/opensource.html
>>>>
>>>> Kitware offers ITK Training Courses, for more information visit:
>>>> http://kitware.com/products/protraining.php
>>>>
>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>
>>>> Follow this link to subscribe/unsubscribe:
>>>> http://public.kitware.com/mailman/listinfo/insight-developers
>>>> _______________________________________________
>>>> Powered by www.kitware.com
>>>>
>>>> Visit other Kitware open-source projects at
>>>> http://www.kitware.com/opensource/opensource.html
>>>>
>>>> Kitware offers ITK Training Courses, for more information visit:
>>>> http://kitware.com/products/protraining.php
>>>>
>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>
>>>> Follow this link to subscribe/unsubscribe:
>>>> http://public.kitware.com/mailman/listinfo/insight-developers
>>

// Short benchmark doing psuedo-inverse of the form:
// AX = B
// X = (A'A)^-1*A'*B
// for various dimension sizes

// With ARMA Atlas
// g++ -DTEST_ARMA test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -larmadillo -lgomp -fopenmp -L/usr/lib64/atlas-sse3/ -lcblas -march=native -O3 -DARMA_NO_DEBUG -DNDEBUG -DHAVE_INLINE

// With ARMA OpenBLAS
// g++ -DTEST_ARMA test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -larmadillo -lgomp -fopenmp -lopenblas -O3 -DNDEBUG -DHAVE_INLINE

// with vnl
// g++ -DTEST_VNL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK

// with vnl blas
// g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lopenblas  -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG  -O3 -DUTL_USE_FASTLAPACK
// g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm  -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK

// with utl
// g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lopenblas  -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG  -O3 -DUTL_USE_FASTLAPACK
// g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm  -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG -O3 -DUTL_USE_FASTLAPACK

#include <cstdio>
#include <iostream>
#include <sys/time.h>
#include <algorithm>
#include <vector>

// GSL can't coexist with Armadillo due to enum conflicts

// Turn each one on one at a time to make the spreadsheet
// #define TEST_OPENCV
//#define TEST_EIGEN
// #define TEST_ARMA
//#define TEST_GSL
// #define TEST_VNL
// #define TEST_VNL_BLAS

#include <opencv2/core/core.hpp>

#ifdef TEST_EIGEN
#include <eigen3/Eigen/Dense>
#endif

#ifdef TEST_ARMA
#include <armadillo>
#endif

#ifdef TEST_GSL
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>
#endif

#ifdef TEST_VNL
#include <vnl/vnl_matrix.h>
#include "utlVNLLapack.h"
#endif

#ifdef TEST_VNL_BLAS
#include <vnl/vnl_matrix.h>
#include "utlVNLLapack.h"
#endif

#ifdef TEST_UTL
#include "utlNDArray.h"
#endif

#include <ctime>

using namespace std;

double TimeDiff(timeval t1, timeval t2);
double RunOpenCV(int N, int M, vector <double> &A, vector <double> &B, int iterations);
double RunEigen(int N, int M, vector <double> &A, vector <double> &B, int iterations);
double RunArmadillo(int N, int M, vector <double> &A, vector <double> &B, int iterations);
double RunUTL(int N, int M, vector <double> &A, vector <double> &B, int iterations);
double RunGSL(int N, int M, vector <double> &A, vector <double> &B, int iterations);
double RunVNL(int N, int M, vector <double> &A, vector <double> &B, int iterations);
double RunVNLBlas(int N, int M, vector <double> &A, vector <double> &B, int iterations);

int main()
{
  std::srand(std::time(NULL)); 
  const int N = 1000000;
  const int iterations = 10;

  for(int M=2; M <= 16; M++) { // dimension
    vector <double> A(N*M);
    vector <double> B(N);
    vector <double> X(M);

    // Generate some random data
    for(size_t i=0; i < X.size(); i++) {
      X[i] = rand()/(1.0 + RAND_MAX);
    }

    for(size_t i=0; i < A.size(); i++) {
      A[i] = rand()/(1.0 + RAND_MAX);
    }

    cv::Mat _A(N, M, CV_64F, &A[0]);
    cv::Mat _X(M, 1, CV_64F, &X[0]);
    cv::Mat _B(N, 1, CV_64F, &B[0]);

    _B = _A*_X;

    // Add a bit of noise to B, probably not necessary for benchmarking
    for(size_t i=0; i < B.size(); i++) {
      B[i] += -1.0 + 2*rand()/(1.0+RAND_MAX);
    }

    //////////////////////////////////////////////////////

#ifdef TEST_OPENCV
    cout <<  RunOpenCV(N, M, A, B, iterations) << " ";
    //cout << "OpenCV: " << RunOpenCV(N, M, A, B, iterations) << endl;
#endif

#ifdef TEST_EIGEN
    //cout << "Eigen: " << RunEigen(N, M, A, B, iterations) << endl;
    cout <<  RunEigen(N, M, A, B, iterations) << " ";
#endif

#ifdef TEST_ARMA
    //cout << "Arma: " << RunArmadillo(N, M, A, B, iterations) << endl;
    cout << RunArmadillo(N, M, A, B, iterations) << " ";
#endif

#ifdef TEST_UTL
    //cout << "Arma: " << RunArmadillo(N, M, A, B, iterations) << endl;
    cout << RunUTL(N, M, A, B, iterations) << " ";
#endif

#ifdef TEST_GSL
    //cout << "GSL: " << RunGSL(N, M, A, B, iterations) << endl;
    cout << RunGSL(N, M, A, B, iterations) << " ";
#endif

#ifdef TEST_VNL
    //cout << "VNL: " << RunVNL(N, M, A, B, iterations) << endl;
    cout << RunVNL(N, M, A, B, iterations) << " ";
#endif

#ifdef TEST_VNL_BLAS
    //cout << "VNL Blas: " << RunVNLBlas(N, M, A, B, iterations) << endl;
    cout << RunVNLBlas(N, M, A, B, iterations) << " ";
#endif
  }

  cout << endl;

  return 0;
}

double TimeDiff(timeval t1, timeval t2)
{
  double t;

  t = (t2.tv_sec - t1.tv_sec) * 1000.0;      // sec to ms
  t += (t2.tv_usec - t1.tv_usec) / 1000.0;   // us to ms

  return t;
}

#ifdef TEST_OPENCV
double RunOpenCV(int N, int M, vector <double> &A, vector <double> &B, int iterations)
{
  // OpenCV has a nice constructor to allow using existing memory
  cv::Mat _A(N, M, CV_64F, &A[0]);
  cv::Mat _B(N, 1, CV_64F, &B[0]);
  cv::Mat X;
  timeval t1, t2;

  gettimeofday(&t1, NULL);
  for(int i=0; i < iterations; i++) {
    X = (_A.t()*_A).inv()*(_A.t()*_B);
  }
  gettimeofday(&t2, NULL);

  double d = TimeDiff(t1, t2);

  //for(int i=0; i < M; i++) {
  //    cout << "OpenCV X: " << X.at<double>(i,0) << endl;
  //}

  return d;
}
#endif

#ifdef TEST_EIGEN
double RunEigen(int N, int M, vector <double> &A, vector <double> &B, int iterations)
{
  Eigen::MatrixXd _A(N,M);
  Eigen::MatrixXd _B(N,1);
  Eigen::MatrixXd X(M,1);
  timeval t1, t2;

  // Copy the data, there's a better way right?
  // Eigen seems to have one but the example only shows using known matrix size at compiled time
  for(int i=0; i < N; i++) {
    _B(i,0) = B[i];
    for(int j=0; j < M; j++) {
      _A(i,j) = A[i*M+j];
    }
  }

  gettimeofday(&t1, NULL);
  for(int i=0; i < iterations; i++) {
    X = (_A.transpose()*_A).inverse()*(_A.transpose()*_B);
  }
  gettimeofday(&t2, NULL);

  double d = TimeDiff(t1, t2);

  //for(int i=0; i < M; i++) {
  //    cout << "Eigen X: " << X(i,0) << endl;
  //}

  return d;
}
#endif

#ifdef TEST_ARMA
double RunArmadillo(int N, int M, vector <double> &A, vector <double> &B, int iterations)
{
  arma::mat _A(N,M);
  arma::mat _B(N,1);
  arma::mat X(M,1);
  timeval t1, t2;

  // Copy the data, there's a better way right?
  for(int i=0; i < N; i++) {
    _B(i,0) = B[i];
    for(int j=0; j < M; j++) {
      _A(i,j) = A[i*M+j];
    }
  }

  gettimeofday(&t1, NULL);
  for(int i=0; i < iterations; i++) {
    X = arma::inv(arma::trans(_A)*_A)*(arma::trans(_A)*_B);
  }
  gettimeofday(&t2, NULL);

  double d = TimeDiff(t1, t2);

  //for(int i=0; i < M; i++) {
  //    cout << "Arma X: " << X(i,0) << endl;
  //}

  return d;
}
#endif

#ifdef TEST_UTL
double RunUTL(int N, int M, vector <double> &A, vector <double> &B, int iterations)
{
  // utl::NDArray<double,2> _A(N,M), _B(N,1), X(M,1), tmp1(M,M),tmp2(M,M),tmp3(M,1);
  utl::NDArray<double,2> _A(N,M), tmp, tmp2;
  utl::NDArray<double,1> _B(N), X(M), tmp1;
  timeval t1, t2;

  // Copy the data, there's a better way right?
  for(int i=0; i < N; i++) {
    _B[i] = B[i];
    for(int j=0; j < M; j++) {
      _A(i,j) = A[i*M+j];
    }
  }

  gettimeofday(&t1, NULL);
  for(int i=0; i < iterations; i++) {
    utl::ProductUtlMtM(_A, _A, tmp);
    // utl::ProductVnlXtX(_A, tmp);
    utl::ProductUtlMtv(_A, _B, tmp1);
    utl::PInverseSymmericMatrix(tmp, tmp2);
    utl::ProductUtlMv(tmp2, tmp1, X);

    // (_A.GetTranspose()*_A).PInverseSymmericMatrix(tmp);
    // X = tmp*(_A.GetTranspose()*_B);
    
    // utl::PInverseMatrix(_A, tmp);
    // utl::ProductUtlMv(tmp, _B, X);
  }
  gettimeofday(&t2, NULL);

  double d = TimeDiff(t1, t2);

  //for(int i=0; i < M; i++) {
  //    cout << "Arma X: " << X(i,0) << endl;
  //}

  return d;
}
#endif

#ifdef TEST_VNL
double RunVNL(int N, int M, vector <double> &A, vector <double> &B, int iterations)
{
  vnl_matrix<double> _A(N,M);
  vnl_vector<double> _B(N), X(M);
  timeval t1, t2;
  
  for(int i=0; i < N; i++) {
    _B[i] = B[i];
    for(int j=0; j < M; j++) {
      _A(i,j) = A[i*M+j];
    }
  }

  gettimeofday(&t1, NULL);
  for(int i=0; i < iterations; i++) {
    X = utl::GetVnlSymmericMatrixPInverse(_A.transpose()*_A)*(_A.transpose()*_B);
  }
  gettimeofday(&t2, NULL);

  double d = TimeDiff(t1, t2);

  return d;
}
#endif

#ifdef TEST_VNL_BLAS
double RunVNLBlas(int N, int M, vector <double> &A, vector <double> &B, int iterations)
{
  vnl_matrix<double> _A(N,M), tmp, tmp2;
  vnl_vector<double> _B(N), X(M), tmp1;
  timeval t1, t2;
  
  for(int i=0; i < N; i++) {
    _B[i] = B[i];
    for(int j=0; j < M; j++) {
      _A(i,j) = A[i*M+j];
    }
  }

  gettimeofday(&t1, NULL);
  for(int i=0; i < iterations; i++) {
    utl::ProductVnlMtM(_A, _A, tmp);
    // utl::ProductVnlXtX(_A, tmp);
    utl::ProductVnlMtv(_A, _B, tmp1);
    utl::PInverseSymmericVnlMatrix(tmp, tmp2);
    utl::ProductVnlMv(tmp2, tmp1, X);
  }
  gettimeofday(&t2, NULL);

  double d = TimeDiff(t1, t2);

  return d;
}
#endif

#ifdef TEST_GSL
double RunGSL(int N, int M, vector <double> &A, vector <double> &B, int iterations)
{
  // Use the cblas wrapper in GSL
  // Need some temporary member to do the calculation
  gsl_matrix *_A = gsl_matrix_alloc(N, M);
  gsl_matrix *_B = gsl_matrix_alloc(N, 1);
  gsl_matrix *At = gsl_matrix_alloc(M, N);
  gsl_matrix *AtA = gsl_matrix_alloc(M, M);
  gsl_matrix *At_B = gsl_matrix_alloc(M, 1);
  gsl_matrix *tmp = gsl_matrix_alloc(M, M);
  gsl_matrix *inv_AtA = gsl_matrix_alloc(M, M);
  gsl_matrix *inv_AtA_At = gsl_matrix_alloc(M, N);
  gsl_matrix *X = gsl_matrix_alloc(M, 1);
  gsl_permutation *p = gsl_permutation_alloc(M);

  timeval t1, t2;

  // Copy the data, there's a better way right?
  for(int i=0; i < N; i++) {
    gsl_matrix_set(_B, i, 0, B[i]);
    for(int j=0; j < M; j++) {
      gsl_matrix_set(_A, i, j, A[i*M+j]);
    }
  }

  gettimeofday(&t1, NULL);
  for(int i=0; i < iterations; i++) {
    gsl_matrix_transpose_memcpy(At, _A);

    gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0, At, _A, 0.0, AtA);

    // Invert AtA
    int s;
    gsl_matrix_memcpy(tmp, AtA);
    gsl_linalg_LU_decomp(tmp, p, &s);
    gsl_linalg_LU_invert(tmp, p, inv_AtA); // doesn't support src and dst being the same

    //gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0, inv_AtA, At, 0.0, inv_AtA_At);
    //gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0, inv_AtA_At, _B, 0.0, X);
    gsl_blas_dgemm(CblasTrans, CblasNoTrans, 1.0, _A, _B, 0.0, At_B);
    gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0, inv_AtA, At_B, 0.0, X);
  }
  gettimeofday(&t2, NULL);

  double d = TimeDiff(t1, t2);

  //for(int i=0; i < M; i++) {
  //    cout << "GSL X: " << gsl_matrix_get(X, i, 0) << endl;
  //}

  gsl_matrix_free(_A);
  gsl_matrix_free(_B);
  gsl_matrix_free(At);
  gsl_matrix_free(AtA);
  gsl_matrix_free(At_B);
  gsl_matrix_free(tmp);
  gsl_matrix_free(inv_AtA);
  gsl_matrix_free(inv_AtA_At);
  gsl_matrix_free(X);
  gsl_permutation_free(p);

  return d;
}
#endif

_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at
http://www.kitware.com/opensource/opensource.html

Kitware offers ITK Training Courses, for more information visit:
http://kitware.com/products/protraining.php

Please keep messages on-topic and check the ITK FAQ at:
http://www.itk.org/Wiki/ITK_FAQ

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/insight-developers

Re: [ITK-dev] efficiency of vnl_matrix

Reply via email to