Re: [Insight-developers] itk performance numbers

Bradley Lowekamp Thu, 26 Jul 2012 08:53:47 -0700

Hello,

Well I did get to it before you:


http://review.source.kitware.com/#/c/6614/

I also upped the size of the image 100x in your test. Here is the current 
performance on my system:

System: victoria.nlm.nih.gov
Processor: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
 Serial #: 
    Cache: 32768
    Clock: 2794.27
    Cores: 12 cpus x 24 Cores = 288
OSName:     Mac OS X
  Release:  10.6.8
  Version:  10K549
  Platform: x86_64
  Operating System is 64 bit
ITK Version: 3.20.1
Virtual Memory: Total: 256 Available: 228
Physical Memory: Total:65536 Available: 58374
           Probe Name:  Count       Min      Mean       Stdev       Max    Total
 MeanSquares_1_threads     20  0.344348  0.347567  0.00244733  0.352629  6.95134
 MeanSquares_2_threads     20  0.251223  0.300869   0.0179305  0.321404  6.01738
 MeanSquares_4_threads     20  0.215516  0.348677    0.173645  0.678274  6.97355
 MeanSquares_8_threads     20  0.138184  0.182681   0.0297812  0.237129  3.65362
System: victoria.nlm.nih.gov
Processor: 
 Serial #: 
    Cache: 32768
    Clock: 2930
    Cores: 12 cpus x 24 Cores = 288
OSName:     Mac OS X
  Release:  10.6.8
  Version:  10K549
  Platform: x86_64
  Operating System is 64 bit
ITK Version: 4.2.0
Virtual Memory: Total: 256 Available: 228
Physical Memory: Total:65536 Available: 58371
           Probe Name:  Count       Min      Mean       Stdev       Max    Total
 MeanSquares_1_threads     20  0.382481  0.383342  0.00186954  0.391027  7.66685
 MeanSquares_2_threads     20  0.211908  0.335328   0.0777408  0.435574  6.70655
 MeanSquares_4_threads     20  0.271531  0.315688   0.0390751  0.385683  6.31377
 MeanSquares_8_threads     20  0.147544  0.192132   0.0299427  0.240976  3.84263


In the patch provided, the allocation is done implicitly on assignment, on a 
per-thread basis. What was most unexpected was that when the allocation of the 
Jacobian was explicitly done outside the threaded part, the time went up by 
50%! I presume that allocating the doubles sequentially in the master thread 
placed them next to each other in memory, which may be a more insidious form 
of false sharing. Below are the numbers from this run; notice the lack of 
speedup with more threads:

System: victoria.nlm.nih.gov
Processor: 
 Serial #: 
    Cache: 32768
    Clock: 2930
    Cores: 12 cpus x 24 Cores = 288
OSName:     Mac OS X
  Release:  10.6.8
  Version:  10K549
  Platform: x86_64
  Operating System is 64 bit
ITK Version: 4.2.0
Virtual Memory: Total: 256 Available: 226
Physical Memory: Total:65536 Available: 57091
           Probe Name:  Count       Min      Mean       Stdev       Max    Total
 MeanSquares_1_threads     20  0.403931   0.40648  0.00213043   0.41389   8.1296
 MeanSquares_2_threads     20  0.243789  0.367603   0.0894637   0.65006  7.35206
 MeanSquares_4_threads     20  0.281336  0.354749   0.0431082  0.440161  7.09497
 MeanSquares_8_threads     20   0.24615  0.301576   0.0552998  0.446528  6.03151
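
For reference, here is a minimal sketch of the per-thread structure idea: one 
aligned record per thread, with the per-thread buffer allocated by the worker 
itself rather than by the master thread. This is not the code in the patch. 
PerThreadData, SumSquaredDifferences, and the 64-byte kCacheLine are 
illustrative assumptions, and the "derivative" is a single stand-in value, not 
ITK's DerivativeType. It also assumes C++17, so that std::vector honors the 
over-alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size: 64 bytes is typical on x86_64, not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread record. alignas rounds sizeof up to a multiple of
// the alignment, so neighbouring array elements never share a cache line.
struct alignas(kCacheLine) PerThreadData
{
  double              mse = 0.0;
  std::vector<double> derivative; // heap buffer, filled in by the worker
};

static_assert(sizeof(PerThreadData) % kCacheLine == 0,
              "each record should fill whole cache lines");

// Sum of squared differences, split into contiguous chunks across threads.
double SumSquaredDifferences(const std::vector<double> & fixed,
                             const std::vector<double> & moving,
                             unsigned int                numThreads)
{
  assert(numThreads > 0);
  assert(fixed.size() == moving.size());

  std::vector<PerThreadData> perThread(numThreads);
  std::vector<std::thread>   workers;
  const std::size_t          n = fixed.size();

  for (unsigned int t = 0; t < numThreads; ++t)
  {
    workers.emplace_back([&, t]() {
      PerThreadData & mine = perThread[t];
      // Allocate inside the worker, not sequentially in the master thread.
      mine.derivative.assign(1, 0.0);
      const std::size_t begin = n * t / numThreads;
      const std::size_t end = n * (t + 1) / numThreads;
      for (std::size_t i = begin; i < end; ++i)
      {
        const double diff = fixed[i] - moving[i];
        mine.mse += diff * diff;          // touches only this thread's record
        mine.derivative[0] += 2.0 * diff; // stand-in for a real derivative
      }
    });
  }
  for (auto & w : workers)
  {
    w.join();
  }

  // Reduce the per-thread partial sums once all workers are done.
  double total = 0.0;
  for (const auto & d : perThread)
  {
    total += d.mse;
  }
  return total;
}
```

Note that the alignment only separates the records themselves. Where the heap 
buffers behind each derivative vector end up is up to the allocator, so this 
may or may not avoid the effect seen in the last run above.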


Brad


On Jul 26, 2012, at 8:56 AM, Rupert Brooks wrote:

> Brad,
> 
> The false sharing issue is a good point - however, I don't think this is the 
> cause of the performance degradation.  This part of the class (m_Threader, 
> etc) has not changed since 3.20.  (I used the optimized metrics in my 3.20 
> builds, so it's in Review/itkOptMeanSquares....) It also does not explain the 
> performance drop in single-threaded mode.
> 
> Testing will tell...  Seems like a Friday afternoon project to me, unless 
> someone else gets there first.
> 
> Rupert
> 
> --------------------------------------------------------------
> Rupert Brooks
> [email protected]
> 
> 
> 
> On Wed, Jul 25, 2012 at 5:18 PM, Bradley Lowekamp <[email protected]> 
> wrote:
> Hello,
> 
> Continuing to glance at the class.... I also see the following member 
> variables for the MeanSquares class:
> 
>   MeasureType *   m_ThreaderMSE;
>   DerivativeType *m_ThreaderMSEDerivatives;
> 
> These are indexed by the thread ID and accessed simultaneously across the 
> threads, which creates the potential for false sharing, which can be a MAJOR 
> problem with threaded algorithms.
> 
> I would think a good solution would be to create a per-thread data structure 
> consisting of the Jacobian, MeasureType, and DerivativeType, plus padding to 
> prevent false sharing, or equivalently assigning max data alignment to the 
> structure.
> 
> Rupert, would you like to take a stab at this fix?
> 
> Brad
> 
> 
> On Jul 25, 2012, at 4:31 PM, Rupert Brooks wrote:
> 
>> Sorry if this repeats - I just got a bounce from Insight Developers, so I'm 
>> trimming the message and resending....
>> --------------------------------------------------------------
>> Rupert Brooks
>> [email protected]
>> 
>> 
>> 
>> On Wed, Jul 25, 2012 at 4:12 PM, Rupert Brooks <[email protected]> 
>> wrote:
>> Aha.  Here's around line 183 of itkTranslationTransform.
>> 
>> // Compute the Jacobian in one position
>> template <class TScalarType, unsigned int NDimensions>
>> void
>> TranslationTransform<TScalarType, NDimensions>::ComputeJacobianWithRespectToParameters(
>>   const InputPointType &,
>>   JacobianType & jacobian) const
>> {
>>   // the Jacobian is constant for this transform, and it has already been
>>   // initialized in the constructor, so we just need to return it here.
>>   jacobian = this->m_IdentityJacobian;
>>   return;
>> }
>> 
>> That's probably the culprit, although the root cause may be the reallocation 
>> of the Jacobian every time through the loop.
>> 
>> Rupert
>> 
>> <snipped>
> 
> 

========================================================
Bradley Lowekamp  
Medical Science and Computing for
Office of High Performance Computing and Communications
National Library of Medicine 
[email protected]

_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at
http://www.kitware.com/opensource/opensource.html

Kitware offers ITK Training Courses, for more information visit:
http://kitware.com/products/protraining.php

Please keep messages on-topic and check the ITK FAQ at:
http://www.itk.org/Wiki/ITK_FAQ

Follow this link to subscribe/unsubscribe:
http://www.itk.org/mailman/listinfo/insight-developers
