Hi!
> >> +	u64 p0, p1;
> >>  	int ret;
> >>
> >>  	atomic_set(&late_cpus_in, 0);
> >>  	atomic_set(&late_cpus_out, 0);
> >>
> >> +	p0 = rdtsc_ordered();
> >> +
> >>  	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
> >> +
> >> +	p1 = rdtsc_ordered();
> >> +
> >>  	if (ret > 0)
> >>  		microcode_check();
> >>
> >>  	pr_info("Reload completed, microcode revision: 0x%x\n",
> >>  		boot_cpu_data.microcode);
> >>
> >> +	pr_info("p0: %lld, p1: %lld, diff: %lld\n", p0, p1, p1 - p0);
> >> +
> >>  	return ret;
> >>  }
> >>
> >> We have used a machine with a broken microcode in BIOS and no microcode in
> >> initramfs (to bypass early loading).
> >>
> >> Here are the results for parallel loading (we made two measurements):
> >
> >> [ 18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
> >> [ 18.201225] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> >> [ 18.201230] microcode: Reload completed, microcode revision: 0x200005e
> >> [ 18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff: 29889604
> >
> >> Here are the results of serial loading:
> >>
> >> [ 17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
> >> [ 17.898365] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> >> [ 17.898370] microcode: Reload completed, microcode revision: 0x200005e
> >> [ 17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff: 842898034
> >>
> >> One can see that the difference is more than an order of magnitude.
> >
> > Well, that's impressive, but it seems to finish 300 msec later? Where does
> > that difference come from / how much real time do you gain by this?
>
> The difference comes from the large number of cores/threads the machine has:
> 72 in this case, but there are machines with more. As the commit message says,
> initially the microcode was applied serially, one by one, and now the
> microcode is updated in parallel on all cores.
>
> 300ms may seem like nothing, but it is enough to cause disruption in some
> critical services (e.g. storage): 300ms in which we do not execute anything
> on the CPUs. This 300ms also increases when the machine is fully loaded with
> guests.
>
Yes, but if you look at the dmesgs I quoted, the parallel microcode update
actually finished 300 msec _later_.
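
To put numbers on that, here is a rough userspace sketch of the arithmetic
on the values quoted above. The helper names (tsc_delta, dmesg_us,
check_quoted_numbers) are made up for illustration and are not kernel code;
it also assumes the dmesg stamps can be read as plain seconds.microseconds
since boot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: TSC cycles between two rdtsc_ordered() samples. */
static uint64_t tsc_delta(uint64_t p0, uint64_t p1)
{
	return p1 - p0;
}

/* Hypothetical helper: dmesg stamp "[ sec.usec]" as microseconds since boot. */
static uint64_t dmesg_us(uint64_t sec, uint64_t usec)
{
	return sec * 1000000ULL + usec;
}

/* Re-derive the figures from the two quoted logs. */
static void check_quoted_numbers(void)
{
	uint64_t par = tsc_delta(118138123843052ULL, 118138153732656ULL);
	uint64_t ser = tsc_delta(149220216047388ULL, 149221058945422ULL);

	/* The printed "diff:" values match p1 - p0. */
	assert(par == 29889604ULL);
	assert(ser == 842898034ULL);

	/* Serial takes roughly 28x the cycles of parallel. */
	assert(ser / par == 28);

	/* "updated to revision" -> "Reload completed", per log. */
	assert(dmesg_us(18, 201230) - dmesg_us(18, 197760) == 3470);
	assert(dmesg_us(17, 898370) - dmesg_us(17, 542518) == 355852);

	/* "Reload completed" stamp: parallel log minus serial log. */
	assert(dmesg_us(18, 201230) - dmesg_us(17, 898370) == 302860);
}
```

So inside each log the window between the first and last quoted line is
3470 usec (parallel) against 355852 usec (serial), while the "Reload
completed" stamp in the parallel log is 302860 usec later since boot than
the one in the serial log.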
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

