On Tue, Jul 3, 2012 at 5:56 PM, Akio Yamamoto
<yamam...@tkl.iis.u-tokyo.ac.jp> wrote:

> Yes, as Richard pointed out, I just wanted to know the numbers for input
> to Amdahl's law, if you have already something, to figure out the maximum
> expected speedup using multiple processors/cores.
>
> As for improvements of em_reg, I'll try to split each transform as well as
> parallelize the energy evaluation.
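
For reference, the Amdahl's law estimate mentioned above can be sketched in a few lines of C. This is only an illustration: `p` (the fraction of the runtime that can be parallelised) and `n` (the number of cores) are placeholder inputs, and the function names are hypothetical, not part of FreeSurfer. No measured value of `p` for em_reg is implied.

```c
#include <stdio.h>

/* Amdahl's law: expected speedup when a fraction p (0 <= p <= 1) of the
 * serial runtime is spread perfectly over n cores and the rest stays serial. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Upper bound on the speedup as n grows without limit (requires p < 1). */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}

/* Print the estimate for a few core counts; p is supplied by the caller. */
void print_speedup_table(double p)
{
    const int cores[] = {1, 2, 4, 8, 16};
    for (int c = 0; c < 5; ++c)
        printf("cores=%2d  speedup=%.2f\n", cores[c], amdahl_speedup(p, cores[c]));
}
```

Calling `print_speedup_table(p)` with a profiled value of `p` gives the expected speedup per core count, and `amdahl_limit(p)` gives the ceiling that no number of cores can exceed.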

Parallelising the energy evaluation is the lowest hanging fruit, and
is what happens in the 'slow' GPU version. But for highest
performance, I would convert the nested transform loops into a single
one, and farm those out between OpenMP threads (I wouldn't bother
trying nested parallelism of the energy evaluation, although you might
want to do some SSE tinkering). That is effectively what happens in
the 'fast' GPU version, and you can use the same basic structure. Be
aware, though, that the restructured loop produces slightly different
transforms, which can cause you to converge to a different solution.
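
A minimal sketch of the single-loop idea, assuming a three-level nested search over candidate transforms. Everything here is hypothetical: `toy_energy` stands in for the real energy evaluation, `best_transform` is an illustrative name, and the 3-D grid is an assumed shape rather than the actual em_reg loop structure. The nested loops are replaced by one loop over a flat index, which is then shared between OpenMP threads. Each thread keeps its own best candidate, and the candidates are merged once at the end.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical stand-in for the energy of the transform at grid point (i, j, k). */
static double toy_energy(int i, int j, int k)
{
    double di = i - 3, dj = j - 1, dk = k - 2;
    return di * di + dj * dj + dk * dk;
}

/* Search an ni x nj x nk grid of candidate transforms with one flat loop.
 * Returns the flat index of the lowest-energy candidate and, if best_energy
 * is not NULL, stores that energy there. */
long best_transform(int ni, int nj, int nk, double *best_energy)
{
    long total = (long)ni * nj * nk;
    long best_idx = -1;
    double best_e = DBL_MAX;

    #pragma omp parallel
    {
        long local_idx = -1;        /* per-thread best, no sharing in the loop */
        double local_e = DBL_MAX;

        #pragma omp for schedule(static) nowait
        for (long n = 0; n < total; ++n) {
            /* Recover the three nested-loop indices from the flat index. */
            int k = (int)(n % nk);
            int j = (int)((n / nk) % nj);
            int i = (int)(n / ((long)nk * nj));

            double e = toy_energy(i, j, k);
            if (e < local_e) {
                local_e = e;
                local_idx = n;
            }
        }

        /* Merge per-thread results; ties go to the lowest flat index so the
         * answer does not depend on how iterations were scheduled. */
        #pragma omp critical
        {
            if (local_idx >= 0 &&
                (best_idx < 0 || local_e < best_e ||
                 (local_e == best_e && local_idx < best_idx))) {
                best_e = local_e;
                best_idx = local_idx;
            }
        }
    }

    if (best_energy)
        *best_energy = best_e;
    return best_idx;
}
```

Built with OpenMP enabled (for example `-fopenmp` with GCC), the flat loop is divided between threads. Without it, the pragmas are ignored and the same code runs serially, which makes it easy to compare the two results. The tie-break only makes this sketch independent of scheduling; it does not remove differences that come from restructuring the transforms themselves.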

HTH,

Richard
_______________________________________________
Freesurfer mailing list
Freesurfer@nmr.mgh.harvard.edu
https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer

