Re: [Freesurfer] Parallel vs. Sequential part
On Tue, Jul 3, 2012 at 5:56 PM, Akio Yamamoto yamam...@tkl.iis.u-tokyo.ac.jp wrote:

> Yes, as Richard pointed out, I just wanted to know the numbers for input to Amdahl's law, if you already have something, to figure out the maximum expected speedup using multiple processors/cores.
>
> As for improvements of em_reg, I'll try to split each transform as well as parallelize the energy evaluation.

Parallelising the energy evaluation is the lowest-hanging fruit, and is what happens in the 'slow' GPU version. But for highest performance, I would convert the nested transform loops into a single one, and farm those out between OpenMP threads (I wouldn't bother trying nested parallelism of the energy evaluation, although you might want to do some SSE tinkering). That is effectively what happens in the 'fast' GPU version - you can use the same basic structure. But be aware that the slightly different transforms which result can cause you to converge to a different solution.

HTH,

Richard

___
Freesurfer mailing list
Freesurfer@nmr.mgh.harvard.edu
https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer

The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.
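The idea of collapsing the nested transform loops into a single loop and sharing its iterations between OpenMP threads could look roughly like the sketch below. This is not the mri_em_reg code: `Candidate`, `toy_energy`, `better`, and `search_flat` are hypothetical names, the search is over a plain n x n x n grid of translations, and the energy is a made-up quadratic so that the example is self-contained. If it is built without OpenMP support, the pragmas are ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <float.h>

typedef struct {
    double dx, dy, dz;   /* candidate translation */
    double energy;       /* energy of that candidate */
    long   index;        /* flat loop index, used to break ties */
} Candidate;

/* Hypothetical stand-in for the registration energy of one candidate
   transform. Its minimum is at (1, -2, 0.5). */
static double toy_energy(double dx, double dy, double dz)
{
    return (dx - 1.0) * (dx - 1.0)
         + (dy + 2.0) * (dy + 2.0)
         + (dz - 0.5) * (dz - 0.5);
}

/* Nonzero if a is better than b: lower energy, ties go to the lower index. */
static int better(const Candidate *a, const Candidate *b)
{
    return a->energy < b->energy ||
           (a->energy == b->energy && a->index < b->index);
}

/* Search an n x n x n grid of translations in [lo, hi] with ONE flat loop
   instead of three nested ones, so each iteration is a whole transform. */
static Candidate search_flat(int n, double lo, double hi)
{
    assert(n >= 2 && hi > lo);
    const double step  = (hi - lo) / (n - 1);
    const long   total = (long)n * n * n;
    Candidate best = { 0.0, 0.0, 0.0, DBL_MAX, total };

    #pragma omp parallel
    {
        /* Each thread tracks its own best candidate to avoid contention. */
        Candidate local = { 0.0, 0.0, 0.0, DBL_MAX, total };

        #pragma omp for schedule(static) nowait
        for (long k = 0; k < total; ++k) {
            /* Decode the flat index back into the three loop counters. */
            const int ix = (int)(k / ((long)n * n));
            const int iy = (int)((k / n) % n);
            const int iz = (int)(k % n);

            Candidate c;
            c.dx = lo + ix * step;
            c.dy = lo + iy * step;
            c.dz = lo + iz * step;
            c.energy = toy_energy(c.dx, c.dy, c.dz);
            c.index = k;
            if (better(&c, &local)) local = c;
        }

        /* Merge the per-thread results one thread at a time. */
        #pragma omp critical
        {
            if (better(&local, &best)) best = local;
        }
    }
    return best;
}
```

Calling `search_flat(13, -3.0, 3.0)` searches 13 x 13 x 13 = 2197 candidate translations on a 0.5-spaced grid. Breaking ties on the flat index keeps the chosen candidate independent of how iterations are distributed between threads. For perfectly nested loops, OpenMP's `collapse` clause is an alternative to decoding the index by hand.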
Re: [Freesurfer] Parallel vs. Sequential part
Hi Akio,

I think that the theoretical ratio is much bigger than what you would be able to achieve in practice. It's hard to compute exactly, but for example for em_reg it should be the number of samples (which are processed independently), so something on the order of 1000. For ca_reg it should be bigger, but it's a bit more complicated as things aren't independent.

cheers
Bruce

On Tue, 3 Jul 2012, Akio Yamamoto wrote:

> Freesurfer experts,
>
> I have been working on OpenMP parallelization of Freesurfer programs. I'd like to know the ratio of sequential processing parts (which are essentially impossible to parallelize) to the total amount of processing in order to calculate the theoretical limit of performance improvement rate.
>
> Have you ever made an estimate of the ratio between sequential and parallel processing segments, especially for ca_reg and em_reg?
>
> Thanks,
> Akio
Re: [Freesurfer] Parallel vs. Sequential part
On Tue, Jul 3, 2012 at 3:48 PM, Bruce Fischl fis...@nmr.mgh.harvard.edu wrote:

> I think that the theoretical ratio is much bigger than what you would be able to achieve in practice. It's hard to compute exactly, but for example for em_reg it should be the number of samples (which are processed independently), so something on the order of 1000. For ca_reg it should be bigger, but it's a bit more complicated as things aren't independent.

I _hope_ that the OpenMP version of mri_em_reg doesn't just try parallelising the energy evaluation - it would be better to split each transform off as a separate work item for handling by the available threads (similar to what I did on the extra-fast GPU version). This keeps the individual pieces of work big, which is good for CPUs.

I think that what Akio's after are the numbers to plug into Amdahl's Law. I don't think that these are easy to work out for anything in Freesurfer. But you can put timers around the parallel sections, and see what speedup you get on those.

HTH,

Richard
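One way the timer measurements could be turned into Amdahl-style numbers is sketched below. It assumes you have already recorded wall-clock times yourself (for example with `omp_get_wtime()` around the section of interest); the function names `parallel_fraction`, `overall_speedup`, and `speedup_limit` are hypothetical and not part of Freesurfer.

```c
#include <assert.h>

/* Fraction p of a single-threaded run spent inside the section that is
   going to be parallelised. Both times are wall-clock seconds. */
static double parallel_fraction(double t_total, double t_section)
{
    assert(t_total > 0.0 && t_section >= 0.0 && t_section <= t_total);
    return t_section / t_total;
}

/* Overall speedup when a fraction p of the run time gets faster by a
   factor s and the remaining (1 - p) is unchanged (Amdahl's law). */
static double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound on the overall speedup as s grows without limit. */
static double speedup_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

With made-up timings (not measurements of any Freesurfer program), a section taking 90 s of a 100 s single-threaded run gives p = 0.9. If the timers show that section running 6 times faster with threads, `overall_speedup(0.9, 6.0)` is 4, and `speedup_limit(0.9)` is 10 no matter how many cores are added.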
Re: [Freesurfer] Parallel vs. Sequential part
Hi Bruce and Richard,

Yes, as Richard pointed out, I just wanted to know the numbers for input to Amdahl's law, if you already have something, to figure out the maximum expected speedup using multiple processors/cores.

As for improvements of em_reg, I'll try to split each transform as well as parallelize the energy evaluation.

Thank you for your comments.

Akio

(2012/07/04 5:21), R Edgar wrote:

> On Tue, Jul 3, 2012 at 3:48 PM, Bruce Fischl fis...@nmr.mgh.harvard.edu wrote:
>
>> I think that the theoretical ratio is much bigger than what you would be able to achieve in practice. It's hard to compute exactly, but for example for em_reg it should be the number of samples (which are processed independently), so something on the order of 1000. For ca_reg it should be bigger, but it's a bit more complicated as things aren't independent.
>
> I _hope_ that the OpenMP version of mri_em_reg doesn't just try parallelising the energy evaluation - it would be better to split each transform off as a separate work item for handling by the available threads (similar to what I did on the extra-fast GPU version). This keeps the individual pieces of work big, which is good for CPUs.
>
> I think that what Akio's after are the numbers to plug into Amdahl's Law. I don't think that these are easy to work out for anything in Freesurfer. But you can put timers around the parallel sections, and see what speedup you get on those.
>
> HTH,
>
> Richard
Re: [Freesurfer] Parallel vs. Sequential part
Thanks, Akio. Hopefully you will contribute the code back if/when you get it working?

Bruce

On Wed, 4 Jul 2012, Akio Yamamoto wrote:

> Hi Bruce and Richard,
>
> Yes, as Richard pointed out, I just wanted to know the numbers for input to Amdahl's law, if you already have something, to figure out the maximum expected speedup using multiple processors/cores.
>
> As for improvements of em_reg, I'll try to split each transform as well as parallelize the energy evaluation.
>
> Thank you for your comments.
>
> Akio
>
> (2012/07/04 5:21), R Edgar wrote:
>
>> On Tue, Jul 3, 2012 at 3:48 PM, Bruce Fischl fis...@nmr.mgh.harvard.edu wrote:
>>
>>> I think that the theoretical ratio is much bigger than what you would be able to achieve in practice. It's hard to compute exactly, but for example for em_reg it should be the number of samples (which are processed independently), so something on the order of 1000. For ca_reg it should be bigger, but it's a bit more complicated as things aren't independent.
>>
>> I _hope_ that the OpenMP version of mri_em_reg doesn't just try parallelising the energy evaluation - it would be better to split each transform off as a separate work item for handling by the available threads (similar to what I did on the extra-fast GPU version). This keeps the individual pieces of work big, which is good for CPUs.
>>
>> I think that what Akio's after are the numbers to plug into Amdahl's Law. I don't think that these are easy to work out for anything in Freesurfer. But you can put timers around the parallel sections, and see what speedup you get on those.
>>
>> HTH,
>>
>> Richard
[Freesurfer] Parallel vs. Sequential part
Freesurfer experts,

I have been working on OpenMP parallelization of Freesurfer programs. I'd like to know the ratio of sequential processing parts (which are essentially impossible to parallelize) to the total amount of processing in order to calculate the theoretical limit of performance improvement rate.

Have you ever made an estimate of the ratio between sequential and parallel processing segments, especially for ca_reg and em_reg?

Thanks,
Akio
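For reference, the theoretical limit asked about here is usually written as Amdahl's law. The notation below is an assumption made for this note rather than anything from the thread: p is the fraction of single-threaded run time that can be parallelised (so 1 - p is the sequential ratio in question), N is the number of threads, and S(N) is the resulting speedup.

```latex
\[
  S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
  \qquad
  \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
\]

% Illustrative values only, not measured on ca_reg or em_reg:
\[
  p = 0.95,\; N = 8:
  \quad
  S(8) = \frac{1}{0.05 + 0.11875} = \frac{1}{0.16875} \approx 5.93,
  \qquad
  \lim_{N \to \infty} S(N) = \frac{1}{0.05} = 20
\]
```

Even a small sequential part dominates the limit: in this illustrative case a 5% sequential ratio caps the speedup at 20 regardless of core count.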