Re: [Freesurfer] Parallel vs. Sequential part

2012-07-05 Thread R Edgar
On Tue, Jul 3, 2012 at 5:56 PM, Akio Yamamoto
yamam...@tkl.iis.u-tokyo.ac.jp wrote:

 Yes, as Richard pointed out, I just wanted to know the numbers for input
 to Amdahl's law, if you already have something, to figure out the maximum
 expected speedup using multiple processors/cores.

 As for improvements of em_reg, I'll try to split each transform as well as
 parallelize the energy evaluation.

Parallelising the energy evaluation is the lowest hanging fruit, and
is what happens in the 'slow' GPU version. But for highest
performance, I would convert the nested transform loops into a single
one, and farm those out between OpenMP threads (I wouldn't bother
trying nested parallelism of the energy evaluation, although you might
want to do some SSE tinkering). That is effectively what happens in
the 'fast' GPU version - you can use the same basic structure. But be
aware that the slightly different transforms which result can cause
you to converge to a different solution.
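
A minimal sketch of the restructuring described above, written in Python for brevity rather than in the C/OpenMP of the real code. Everything named here is hypothetical: `energy` is a stand-in cost function, the three parameter grids stand in for the nested transform loops, and none of it is taken from mri_em_reg. The nested loops are flattened into one list of candidate transforms, each candidate becomes one work item, and the final reduction runs serially in the original loop order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def energy(transform):
    """Hypothetical stand-in for the energy of one candidate transform."""
    tx, ty, scale = transform
    return (tx - 1.0) ** 2 + (ty + 2.0) ** 2 + (scale - 1.1) ** 2


def best_transform_nested(txs, tys, scales):
    """Reference version: three nested loops, evaluated serially."""
    best = None
    for tx in txs:
        for ty in tys:
            for s in scales:
                e = energy((tx, ty, s))
                if best is None or e < best[0]:
                    best = (e, (tx, ty, s))
    return best


def best_transform_flat(txs, tys, scales, workers=4):
    """Flattened version: one list of work items, one evaluation per item."""
    # product() enumerates candidates in the same order as the nested loops.
    candidates = list(product(txs, tys, scales))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever thread ran them.
        energies = list(pool.map(energy, candidates))
    # Serial reduction; min() keeps the first minimum, matching the
    # strict "<" comparison used in the nested version.
    i = min(range(len(candidates)), key=energies.__getitem__)
    return energies[i], candidates[i]
```

In C the flattened loop is the natural place for a single OpenMP parallel-for (an assumption about how one would port this, not a description of the Freesurfer source). In CPython the thread pool shows the structure only, since the interpreter lock prevents pure-Python work from running in parallel. This toy version evaluates exactly the same candidates as the nested loops, so both functions return the same answer; the caveat about converging to a different solution applies when the restructuring changes which transforms get evaluated.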

HTH,

Richard
___
Freesurfer mailing list
Freesurfer@nmr.mgh.harvard.edu
https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer


The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.



Re: [Freesurfer] Parallel vs. Sequential part

2012-07-03 Thread Bruce Fischl
Hi Akio

I think that the theoretical ratio is much bigger than what you would be 
able to achieve in practice. It's hard to compute exactly, but for example 
for em_reg it should be the number of samples (which are processed 
independently), so something on the order of 1000. For ca_reg it should be 
bigger, but it's a bit more complicated as things aren't independent.

cheers
Bruce


On Tue, 3 Jul 2012, Akio Yamamoto wrote:

 Freesurfer experts,

 I have been working on OpenMP parallelization of Freesurfer programs.

 I'd like to know what fraction of the total processing is inherently
 sequential (that is, essentially impossible to parallelize), in order to
 calculate the theoretical limit on the speedup.

 Have you ever made an estimate of the ratio between sequential and
 parallel processing segments, especially for ca_reg and em_reg?

 Thanks,
 Akio


Re: [Freesurfer] Parallel vs. Sequential part

2012-07-03 Thread R Edgar
On Tue, Jul 3, 2012 at 3:48 PM, Bruce Fischl fis...@nmr.mgh.harvard.edu wrote:

 I think that the theoretical ratio is much bigger than what you would be
 able to achieve in practice. It's hard to compute exactly, but for example
 for em_reg it should be the number of samples (which are processed
 independently), so something on the order of 1000. For ca_reg it should be
 bigger, but it's a bit more complicated as things aren't independent.

I _hope_ that the OpenMP version of mri_em_reg doesn't just try
parallelising the energy evaluation - it would be better to split each
transform off as a separate work item for handling by the available
threads (similar to what I did on the extra-fast GPU version). This
keeps the individual pieces of work big, which is good for CPUs.

I think that what Akio's after are the numbers to plug into Amdahl's
Law. I don't think that these are easy to work out for anything in
Freesurfer. But you can put timers around the parallel sections, and
see what speedup you get on those.
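
A minimal sketch of the timer approach, in Python for brevity. The class, the section names and the toy workload are all hypothetical and not taken from Freesurfer. Each named section accumulates wall-clock time; the share of time spent in the parallelisable sections of a single-threaded run estimates the parallel fraction that Amdahl's law takes as input, and comparing the same section between a single-threaded and a multi-threaded run gives the speedup of that section.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named section of one run."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the timed block raises.
            self.totals[name] += time.perf_counter() - start

    def fraction(self, names):
        """Share of all recorded time that was spent in the given sections."""
        total = sum(self.totals.values())
        return sum(self.totals.get(n, 0.0) for n in names) / total


def section_speedup(serial_run, threaded_run, name):
    """Speedup of one section: single-thread time over multi-thread time."""
    return serial_run.totals[name] / threaded_run.totals[name]


if __name__ == "__main__":
    timer = SectionTimer()
    with timer.section("setup"):       # hypothetical serial part
        data = list(range(100_000))
    with timer.section("energy"):      # hypothetical parallelisable part
        total = sum(x * x for x in data)
    print("estimated parallel fraction:", timer.fraction(["energy"]))
```

The estimate is only as good as the coverage of the timers: any time spent outside a timed section is invisible to `fraction`, so the serial parts need timers too.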

HTH,

Richard


Re: [Freesurfer] Parallel vs. Sequential part

2012-07-03 Thread Akio Yamamoto
Hi Bruce and Richard,

Yes, as Richard pointed out, I just wanted to know the numbers for input
to Amdahl's law, if you already have something, to figure out the maximum
expected speedup using multiple processors/cores.
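
For reference, Amdahl's law gives the speedup on n workers as 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelised; as n grows the speedup approaches 1 / (1 - p). A small Python sketch of that arithmetic follows. The function names and the value p = 0.9 are illustrative assumptions, not measurements of em_reg or ca_reg.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for a given number of workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as the number of workers grows."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, for illustration only
    for n in (2, 4, 8, 16):
        print(n, "workers:", round(amdahl_speedup(p, n), 2))
    print("limit:", round(amdahl_limit(p), 2))
```

With p = 0.9 the bound is about 10 however many cores are used, which is why the sequential fraction matters more than the core count once the parallel part is fast.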

As for improvements of em_reg, I'll try to split each transform as well as
parallelize the energy evaluation.

Thank you for your comments.

Akio



Re: [Freesurfer] Parallel vs. Sequential part

2012-07-03 Thread Bruce Fischl
thanks Akio

hopefully you will contribute the code back if/when you get it working?
Bruce



[Freesurfer] Parallel vs. Sequential part

2012-07-02 Thread Akio Yamamoto
Freesurfer experts,

I have been working on OpenMP parallelization of Freesurfer programs.

I'd like to know what fraction of the total processing is inherently
sequential (that is, essentially impossible to parallelize), in order to
calculate the theoretical limit on the speedup.

Have you ever made an estimate of the ratio between sequential and
parallel processing segments, especially for ca_reg and em_reg?

Thanks,
Akio
