Laszlo,

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of
> Laszlo Ersek
> Sent: Wednesday, October 25, 2017 11:08 PM
> To: Dong, Eric <eric.d...@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu...@intel.com>; Paolo Bonzini <pbonz...@redhat.com>
> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
> AP initialization logic.
> 
> Hi Eric,
> 
> On 10/25/17 07:42, Dong, Eric wrote:
> > Hi Laszlo,
> >
> > I think I have clear about your raised issues for Ovmf platform. In this 
> > case, I
> think your platform not suit for this code change.  How about I do below
> change based on the new code:
> >
> > -      if (CpuMpData->CpuCount == 0) {
> >         TimedWaitForApFinish (
> >           CpuMpData,
> >           PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
> >           PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
> >           );
> > -      }
> >
> > After this change, for Ovmf can still set
> > PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old
> solution.
> > For the real platform, it can set a small value to follow the new
> > solution. For this new change, it just wait one more
> > PcdCpuApInitTimeOutInMicroSeconds timeout, should not have
> performance
> > impact (This time may also been cost in later while
> > (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have
> > litter performance impact (not cover by the later loop).
> The suggested change will work for OVMF, thanks!
> 
> (Just please don't forget to un-indent the TimedWaitForApFinish() call that
> will become unconditional again.)
> 
> --o--
> 
> Do you think Brian's concern should be investigated as well (perhaps in a
> separate Bugzilla entry)?

As Jeff has mentioned, only Ovmf can know the exist Ap numbers before 
send the Init-sipi-sipi.  So we don't know how many bits needed for this bitmap.
In case we can create the bitmap, we still don't know when all Aps have been
 found(If the begin map bit value same as finish map bit value, we still can't 
get 
the conclusion that all Aps have been found, because maybe other Aps not
started yet).

Thanks,
Eric
> 
> Because, I believe, the scheduling pattern that I described earlier could be
> possible on physical platforms as well, at least in theory:
> 
> >> (2) After at least one AP has started running, the logic expects
> >> "NumApsExecuting" to monotonically grow for a while, and then to
> >> monotonically decrease, back to zero. For example, if we have 1 BSP
> >> and
> >> 7 APs, the BSP logic more or less expects the following values in
> >> "NumApsExecuting":
> >>
> >> 1; 2; 3; 4; 5; 6; 7;
> >> 6; 5; 4; 3; 2; 1; 0
> >>
> >>
> >> While this may be a valid expectation for physical processors (which
> >> actually run in parallel, in the physical world), in a virtual machine, it 
> >> is not
> guaranteed.
> >> Dependent on hypervisor scheduling artifacts, it is possible that,
> >> say, three APs start up *and finish* before the remaining four APs
> >> start up *at all*. The INIT-SIPI-SIPI marks all APs for execution /
> >> scheduling in the hypervisor, yes, but the actual code execution may
> >> commence a lot later. For example, the BSP may witness the following
> series of values in "NumApsExecuting":
> >>
> >> 1; 2; 3;
> >> 2; 1; 0;
> >> 1; 2; 3; 4;
> >> 3; 2; 1; 0
> >>
> >> and the BSP could think that there are 3 APs only, when it sees the
> >> first 0 value.
> 
> I don't know if such a pattern is "likely", "unlikely", or "extremely 
> unlikely" on
> physical platforms. I think the pattern is "theoretically possible" at least.
> 
> I suggest that we go ahead with the small patch for the OVMF use case first,
> and then open a new BZ if Brian thinks there's a real safety problem with the
> code.
> 

We also notice this theoretical problem when we implement this change, but 
We think this is rarely to happen on physical platforms. We think the value for 
the change is much bigger than not do this change because of this theoretical 
problem.

Thanks,
Eric
> ... I should note that, with the small patch that's going to fix the OVMF use
> case, the previous behavior will be restored for *all* platforms that continue
> using their previous PCD values.
> 
> The cumulative effect of commit 0594ec417c89 and the upcoming patch will
> be that a platform may significantly decrease
> "PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the
> "NumApsExecuting" loop will take the role of the preexisting
> TimedWaitForApFinish() call. If a platform doesn't want that, then it should
> keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as before, and then
> any implementation details in the "NumApsExecuting" loop will be irrelevant.
> 
> Thanks!
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to