Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-26 Thread Dong, Eric
Brian,

> -Original Message-
> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of
> Brian J. Johnson
> Sent: Friday, October 27, 2017 4:48 AM
> To: Dong, Eric ; Laszlo Ersek ;
> edk2-devel@lists.01.org
> Cc: Ni, Ruiyu ; Paolo Bonzini 
> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
> AP initialization logic.
> 
> On 10/25/2017 08:13 PM, Dong, Eric wrote:
> > Laszlo,
> >
> >
> >> -Original Message-
> >> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf
> >> Of Laszlo Ersek
> >> Sent: Wednesday, October 25, 2017 11:08 PM
> >> To: Dong, Eric ; edk2-devel@lists.01.org
> >> Cc: Ni, Ruiyu ; Paolo Bonzini
> >> 
> >> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting
> >> for AP initialization logic.
> >>
> >> Hi Eric,
> >>
> >> On 10/25/17 07:42, Dong, Eric wrote:
> >>> Hi Laszlo,
> >>>
> >>> I think I have clear about your raised issues for Ovmf platform. In
> >>> this case, I
> >> think your platform not suit for this code change.  How about I do
> >> below change based on the new code:
> >>>
> >>> -  if (CpuMpData->CpuCount == 0) {
> >>>  TimedWaitForApFinish (
> >>>CpuMpData,
> >>>PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
> >>>PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
> >>>);
> >>> -  }
> >>>
> >>> After this change, for Ovmf can still set
> >>> PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old
> >> solution.
> >>> For the real platform, it can set a small value to follow the new
> >>> solution. For this new change, it just wait one more
> >>> PcdCpuApInitTimeOutInMicroSeconds timeout, should not have
> >> performance
> >>> impact (This time may also been cost in later while
> >>> (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or
> have
> >>> litter performance impact (not cover by the later loop).
> >> The suggested change will work for OVMF, thanks!
> >>
> >> (Just please don't forget to un-indent the TimedWaitForApFinish()
> >> call that will become unconditional again.)
> >>
> >> --o--
> >>
> >> Do you think Brian's concern should be investigated as well (perhaps
> >> in a separate Bugzilla entry)?
> >
> > As Jeff has mentioned, only Ovmf can know the exist Ap numbers before
> > send the Init-sipi-sipi.  So we don't know how many bits needed for this
> bitmap.
> > In case we can create the bitmap, we still don't know when all Aps have
> been
> >   found(If the begin map bit value same as finish map bit value, we
> > still can't get the conclusion that all Aps have been found, because
> > maybe other Aps not started yet).
> >
> 
> Right, physical platforms don't usually know the number of CPUs up front, so
> they still need a timeout.  That's unavoidable.  But we've seen cases where
> interrupts aren't getting delivered reliably, so not all the expected CPUs 
> start
> up (based on the core counts in the physical sockets, as known by the
> developing engineer, not the firmware.)  To debug this failure, it's very
> useful to have a list of exactly which CPUs did start.
> 
> Similarly, we've seen cases where a CPU starts, but crashes due to h/w or
> s/w bugs before signaling the BSP that it has finished.  With only a counter 
> to
> indicate how many CPUs are still running, it's impossible to tell which CPUs
> failed.  That makes debugging much more difficult.
> 
> The bitmaps would need to be sized according to the maximum number of
> CPUs supported by the platform (or the maximum APIC ID, depending on
> how they are indexed), like other data structures in the MpCpu code.
> 
> Bitmaps aren't the only possible implementation of course  My point is
> that it's useful to have a list of which CPUs started executing, and which
> finished.
> 

Seems this is not a must have item for this patch related issue. I think you 
can submit
A bugz for this debug feature.

Thanks,
Eric

> > Thanks,
> > Eric
> >>
> >> Because, I believe, the scheduling pattern that I described earlier
> >> could be possible on physical platforms as well, at least in theory:
> >>
> >>>> (2) After at least one AP has started running, the logic expects
&

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-26 Thread Brian J. Johnson

On 10/25/2017 08:13 PM, Dong, Eric wrote:

Laszlo,



-Original Message-
From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of
Laszlo Ersek
Sent: Wednesday, October 25, 2017 11:08 PM
To: Dong, Eric ; edk2-devel@lists.01.org
Cc: Ni, Ruiyu ; Paolo Bonzini 
Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
AP initialization logic.

Hi Eric,

On 10/25/17 07:42, Dong, Eric wrote:

Hi Laszlo,

I think I have clear about your raised issues for Ovmf platform. In this case, I

think your platform not suit for this code change.  How about I do below
change based on the new code:


-  if (CpuMpData->CpuCount == 0) {
 TimedWaitForApFinish (
   CpuMpData,
   PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
   PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
   );
-  }

After this change, for Ovmf can still set
PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old

solution.

For the real platform, it can set a small value to follow the new
solution. For this new change, it just wait one more
PcdCpuApInitTimeOutInMicroSeconds timeout, should not have

performance

impact (This time may also been cost in later while
(CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have
litter performance impact (not cover by the later loop).

The suggested change will work for OVMF, thanks!

(Just please don't forget to un-indent the TimedWaitForApFinish() call that
will become unconditional again.)

--o--

Do you think Brian's concern should be investigated as well (perhaps in a
separate Bugzilla entry)?


As Jeff has mentioned, only Ovmf can know the exist Ap numbers before
send the Init-sipi-sipi.  So we don't know how many bits needed for this bitmap.
In case we can create the bitmap, we still don't know when all Aps have been
  found(If the begin map bit value same as finish map bit value, we still can't 
get
the conclusion that all Aps have been found, because maybe other Aps not
started yet).



Right, physical platforms don't usually know the number of CPUs up 
front, so they still need a timeout.  That's unavoidable.  But we've 
seen cases where interrupts aren't getting delivered reliably, so not 
all the expected CPUs start up (based on the core counts in the physical 
sockets, as known by the developing engineer, not the firmware.)  To 
debug this failure, it's very useful to have a list of exactly which 
CPUs did start.


Similarly, we've seen cases where a CPU starts, but crashes due to h/w 
or s/w bugs before signaling the BSP that it has finished.  With only a 
counter to indicate how many CPUs are still running, it's impossible to 
tell which CPUs failed.  That makes debugging much more difficult.


The bitmaps would need to be sized according to the maximum number of 
CPUs supported by the platform (or the maximum APIC ID, depending on how 
they are indexed), like other data structures in the MpCpu code.


Bitmaps aren't the only possible implementation of course  My point 
is that it's useful to have a list of which CPUs started executing, and 
which finished.



Thanks,
Eric


Because, I believe, the scheduling pattern that I described earlier could be
possible on physical platforms as well, at least in theory:


(2) After at least one AP has started running, the logic expects
"NumApsExecuting" to monotonically grow for a while, and then to
monotonically decrease, back to zero. For example, if we have 1 BSP
and
7 APs, the BSP logic more or less expects the following values in
"NumApsExecuting":

1; 2; 3; 4; 5; 6; 7;
6; 5; 4; 3; 2; 1; 0


While this may be a valid expectation for physical processors (which
actually run in parallel, in the physical world), in a virtual machine, it is 
not

guaranteed.

Dependent on hypervisor scheduling artifacts, it is possible that,
say, three APs start up *and finish* before the remaining four APs
start up *at all*. The INIT-SIPI-SIPI marks all APs for execution /
scheduling in the hypervisor, yes, but the actual code execution may
commence a lot later. For example, the BSP may witness the following

series of values in "NumApsExecuting":


1; 2; 3;
2; 1; 0;
1; 2; 3; 4;
3; 2; 1; 0

and the BSP could think that there are 3 APs only, when it sees the
first 0 value.


I don't know if such a pattern is "likely", "unlikely", or "extremely unlikely" 
on
physical platforms. I think the pattern is "theoretically possible" at least.

I suggest that we go ahead with the small patch for the OVMF use case first,
and then open a new BZ if Brian thinks there's a real safety problem with the
code.



We also notice this theoretical problem when we implement this change, but
We think this is rarely to happen on physical platforms. We think the value for
the change is much bigger than not do this change because of this theo

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-25 Thread Dong, Eric
Laszlo,


> -Original Message-
> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of
> Laszlo Ersek
> Sent: Wednesday, October 25, 2017 11:08 PM
> To: Dong, Eric ; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu ; Paolo Bonzini 
> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
> AP initialization logic.
> 
> Hi Eric,
> 
> On 10/25/17 07:42, Dong, Eric wrote:
> > Hi Laszlo,
> >
> > I think I have clear about your raised issues for Ovmf platform. In this 
> > case, I
> think your platform not suit for this code change.  How about I do below
> change based on the new code:
> >
> > -  if (CpuMpData->CpuCount == 0) {
> > TimedWaitForApFinish (
> >   CpuMpData,
> >   PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
> >   PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
> >   );
> > -  }
> >
> > After this change, for Ovmf can still set
> > PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old
> solution.
> > For the real platform, it can set a small value to follow the new
> > solution. For this new change, it just wait one more
> > PcdCpuApInitTimeOutInMicroSeconds timeout, should not have
> performance
> > impact (This time may also been cost in later while
> > (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have
> > litter performance impact (not cover by the later loop).
> The suggested change will work for OVMF, thanks!
> 
> (Just please don't forget to un-indent the TimedWaitForApFinish() call that
> will become unconditional again.)
> 
> --o--
> 
> Do you think Brian's concern should be investigated as well (perhaps in a
> separate Bugzilla entry)?

As Jeff has mentioned, only Ovmf can know the exist Ap numbers before 
send the Init-sipi-sipi.  So we don't know how many bits needed for this bitmap.
In case we can create the bitmap, we still don't know when all Aps have been
 found(If the begin map bit value same as finish map bit value, we still can't 
get 
the conclusion that all Aps have been found, because maybe other Aps not
started yet).

Thanks,
Eric
> 
> Because, I believe, the scheduling pattern that I described earlier could be
> possible on physical platforms as well, at least in theory:
> 
> >> (2) After at least one AP has started running, the logic expects
> >> "NumApsExecuting" to monotonically grow for a while, and then to
> >> monotonically decrease, back to zero. For example, if we have 1 BSP
> >> and
> >> 7 APs, the BSP logic more or less expects the following values in
> >> "NumApsExecuting":
> >>
> >> 1; 2; 3; 4; 5; 6; 7;
> >> 6; 5; 4; 3; 2; 1; 0
> >>
> >>
> >> While this may be a valid expectation for physical processors (which
> >> actually run in parallel, in the physical world), in a virtual machine, it 
> >> is not
> guaranteed.
> >> Dependent on hypervisor scheduling artifacts, it is possible that,
> >> say, three APs start up *and finish* before the remaining four APs
> >> start up *at all*. The INIT-SIPI-SIPI marks all APs for execution /
> >> scheduling in the hypervisor, yes, but the actual code execution may
> >> commence a lot later. For example, the BSP may witness the following
> series of values in "NumApsExecuting":
> >>
> >> 1; 2; 3;
> >> 2; 1; 0;
> >> 1; 2; 3; 4;
> >> 3; 2; 1; 0
> >>
> >> and the BSP could think that there are 3 APs only, when it sees the
> >> first 0 value.
> 
> I don't know if such a pattern is "likely", "unlikely", or "extremely 
> unlikely" on
> physical platforms. I think the pattern is "theoretically possible" at least.
> 
> I suggest that we go ahead with the small patch for the OVMF use case first,
> and then open a new BZ if Brian thinks there's a real safety problem with the
> code.
> 

We also notice this theoretical problem when we implement this change, but 
We think this is rarely to happen on physical platforms. We think the value for 
the change is much bigger than not do this change because of this theoretical 
problem.

Thanks,
Eric
> ... I should note that, with the small patch that's going to fix the OVMF use
> case, the previous behavior will be restored for *all* platforms that continue
> using their previous PCD values.
> 
> The cumulative effect of commit 0594ec417c89 and the upcoming patch will
> be that a platform may significantly decrease
> "PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the
> "NumApsExecuting" loop will take the role of the preexisting
> TimedWaitForApFinish() call. If a platform doesn't want that, then it should
> keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as before, and then
> any implementation details in the "NumApsExecuting" loop will be irrelevant.
> 
> Thanks!
> Laszlo
> ___
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
___
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-25 Thread Laszlo Ersek
Hi Eric,

On 10/25/17 07:42, Dong, Eric wrote:
> Hi Laszlo,
> 
> I think I have clear about your raised issues for Ovmf platform. In this 
> case, I think your platform not suit for this code change.  How about I do 
> below change based on the new code:
> 
> -  if (CpuMpData->CpuCount == 0) {
> TimedWaitForApFinish (
>   CpuMpData,
>   PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
>   PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
>   );
> -  }
> 
> After this change, for Ovmf can still set
> PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old solution.
> For the real platform, it can set a small value to follow the new
> solution. For this new change, it just wait one more
> PcdCpuApInitTimeOutInMicroSeconds timeout, should not have
> performance impact (This time may also been cost in later while
> (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have
> litter performance impact (not cover by the later loop).
The suggested change will work for OVMF, thanks!

(Just please don't forget to un-indent the TimedWaitForApFinish() call
that will become unconditional again.)

--o--

Do you think Brian's concern should be investigated as well (perhaps in
a separate Bugzilla entry)?

Because, I believe, the scheduling pattern that I described earlier
could be possible on physical platforms as well, at least in theory:

>> (2) After at least one AP has started running, the logic expects
>> "NumApsExecuting" to monotonically grow for a while, and then to
>> monotonically decrease, back to zero. For example, if we have 1 BSP and
>> 7 APs, the BSP logic more or less expects the following values in
>> "NumApsExecuting":
>>
>> 1; 2; 3; 4; 5; 6; 7;
>> 6; 5; 4; 3; 2; 1; 0
>>
>>
>> While this may be a valid expectation for physical processors (which actually
>> run in parallel, in the physical world), in a virtual machine, it is not 
>> guaranteed.
>> Dependent on hypervisor scheduling artifacts, it is possible that, say, three
>> APs start up *and finish* before the remaining four APs start up *at all*. 
>> The
>> INIT-SIPI-SIPI marks all APs for execution / scheduling in the hypervisor, 
>> yes,
>> but the actual code execution may commence a lot later. For example, the
>> BSP may witness the following series of values in "NumApsExecuting":
>>
>> 1; 2; 3;
>> 2; 1; 0;
>> 1; 2; 3; 4;
>> 3; 2; 1; 0
>>
>> and the BSP could think that there are 3 APs only, when it sees the first 0
>> value.

I don't know if such a pattern is "likely", "unlikely", or "extremely
unlikely" on physical platforms. I think the pattern is "theoretically
possible" at least.

I suggest that we go ahead with the small patch for the OVMF use case
first, and then open a new BZ if Brian thinks there's a real safety
problem with the code.

... I should note that, with the small patch that's going to fix the
OVMF use case, the previous behavior will be restored for *all*
platforms that continue using their previous PCD values.

The cumulative effect of commit 0594ec417c89 and the upcoming patch will
be that a platform may significantly decrease
"PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the
"NumApsExecuting" loop will take the role of the preexisting
TimedWaitForApFinish() call. If a platform doesn't want that, then it
should keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as before,
and then any implementation details in the "NumApsExecuting" loop will
be irrelevant.

Thanks!
Laszlo
___
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-24 Thread Dong, Eric
Hi Laszlo,

I think I have clear about your raised issues for Ovmf platform. In this case, 
I think your platform not suit for this code change.  How about I do below 
change based on the new code:

-  if (CpuMpData->CpuCount == 0) {
TimedWaitForApFinish (
  CpuMpData,
  PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
  PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
  );
-  }

After this change, for Ovmf can still set PcdCpuApInitTimeOutInMicroSeconds to 
MAX_UINT32 to keep old solution. For the real platform, it can set a small 
value to follow the new solution. For this new change, it just wait one more 
PcdCpuApInitTimeOutInMicroSeconds timeout, should not have performance impact 
(This time may also been cost in later while 
(CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have litter 
performance impact (not cover by the later loop).

Thanks,
Eric

> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: Wednesday, October 25, 2017 1:40 AM
> To: Dong, Eric ; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu ; Paolo Bonzini ;
> Jeff Fan 
> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
> AP initialization logic.
> 
> Hi Eric,
> 
> On 10/24/17 17:23, Dong, Eric wrote:
> > Laszlo,
> >
> >> -Original Message-
> >> From: Laszlo Ersek [mailto:ler...@redhat.com]
> >> Sent: Tuesday, October 24, 2017 6:16 PM
> >> To: Dong, Eric ; edk2-devel@lists.01.org
> >> Cc: Ni, Ruiyu ; Paolo Bonzini
> >> 
> >> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting
> >> for AP initialization logic.
> >>
> >> CC Paolo
> >>
> >> On 10/23/17 09:22, Eric Dong wrote:
> 
> >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> >>> index 976af1f..bdfe0d3 100644
> >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> >>> @@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equ
> LockLocation
> >> + 30h
> >>>  Cr3Location   equLockLocation + 34h
> >>>  InitFlagLocation  equLockLocation + 38h
> >>>  CpuInfoLocation   equLockLocation + 3Ch
> >>> +NumApsExecutingLocation   equLockLocation + 40h
> >>>
> >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> >>> index 1b9c6a6..2b6c27d 100644
> >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> >>> @@ -86,6 +86,12 @@ Flat32Start:   ; 
> >>> protected mode
> >> entry point
> >>>
> >>>  movesi, ebx
> >>>
> >>> +; Increment the number of APs executing here as early as possible
> >>> +; This is decremented in C code when AP is finished executing
> >>> +movedi, esi
> >>> +addedi, NumApsExecutingLocation
> >>> +lock inc   dword [edi]
> >>> +
> >>>  mov edi, esi
> >>>  add edi, EnableExecuteDisableLocation
> >>>  cmp byte [edi], 0
> >>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >>> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >>> index db923c9..48f930b 100644
> >>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >>> @@ -662,6 +662,7 @@ ApWakeupFunction (
> >>>  // AP finished executing C code
> >>>  //
> >>>  InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount);
> >>> +InterlockedDecrement ((UINT32 *)
> >>> + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting);
> >>>
> >>>  //
> >>>  // Place AP is specified loop mode @@ -765,6 +766,7 @@
> >>> FillExchangeInfoData (
> >>>
> >>>ExchangeInfo->CFunction   = (UINTN) ApWakeupFunction;
> >>>ExchangeInfo->ApIndex = 0;
> >>> +  ExchangeInfo->NumApsExecuting = 0;
> >>>ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag;
> >>>ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN)
> >> CpuMpData->CpuInfoInHob;
> >>>ExchangeIn

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-24 Thread Brian J. Johnson

On 10/24/2017 12:40 PM, Laszlo Ersek wrote:

Hi Eric,

On 10/24/17 17:23, Dong, Eric wrote:

Laszlo,


-Original Message-
From: Laszlo Ersek [mailto:ler...@redhat.com]
Sent: Tuesday, October 24, 2017 6:16 PM
To: Dong, Eric ; edk2-devel@lists.01.org
Cc: Ni, Ruiyu ; Paolo Bonzini 
Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
AP initialization logic.

CC Paolo

On 10/23/17 09:22, Eric Dong wrote:



diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
index 976af1f..bdfe0d3 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
@@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equLockLocation

+ 30h

  Cr3Location   equLockLocation + 34h
  InitFlagLocation  equLockLocation + 38h
  CpuInfoLocation   equLockLocation + 3Ch
+NumApsExecutingLocation   equLockLocation + 40h

diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
index 1b9c6a6..2b6c27d 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
@@ -86,6 +86,12 @@ Flat32Start:   ; protected 
mode

entry point


  movesi, ebx

+; Increment the number of APs executing here as early as possible
+; This is decremented in C code when AP is finished executing
+movedi, esi
+addedi, NumApsExecutingLocation
+lock inc   dword [edi]
+
  mov edi, esi
  add edi, EnableExecuteDisableLocation
  cmp byte [edi], 0
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index db923c9..48f930b 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -662,6 +662,7 @@ ApWakeupFunction (
  // AP finished executing C code
  //
  InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount);
+InterlockedDecrement ((UINT32 *)
+ &CpuMpData->MpCpuExchangeInfo->NumApsExecuting);

  //
  // Place AP is specified loop mode @@ -765,6 +766,7 @@
FillExchangeInfoData (

ExchangeInfo->CFunction   = (UINTN) ApWakeupFunction;
ExchangeInfo->ApIndex = 0;
+  ExchangeInfo->NumApsExecuting = 0;
ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag;
ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN)

CpuMpData->CpuInfoInHob;

ExchangeInfo->CpuMpData   = CpuMpData;
@@ -934,13 +936,19 @@ WakeUpAP (
  }
  if (CpuMpData->InitFlag == ApInitConfig) {
//
-  // Wait for all potential APs waken up in one specified period
+  // Wait for one potential AP waken up in one specified period
//
-  TimedWaitForApFinish (
-CpuMpData,
-PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
-PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
-);
+  if (CpuMpData->CpuCount == 0) {
+TimedWaitForApFinish (
+  CpuMpData,
+  PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
+  PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
+  );
+  }


I don't understand this change. The new comment says,

   Wait for *one* potential AP waken up in one specified period

However, the second parameter of TimedWaitForApFinish(), namely
"FinishedApLimit", gets the same value as before:

   PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1

It means that all of the (possible) APs are waited-for, just the same
as before.


[[Eric]] This patch changes the collect AP count logic, original
solution always waits for a specific time to let all APs start up. If
the input CPU number (PcdGet32(PcdCpuMaxLogicalProcessorNumber) - 1)
have been found or after a specific time(PcdGet32
(PcdCpuApInitTimeOutInMicroSeconds)). BSP will not wait anymore and use
CpuMpData->CpuCount as the found AP count.

New logic also wait for a specific time, but this time is smaller than
the original one. It just wait for the first AP(any AP) begin to do the
initialization( do CpuMpData->MpCpuExchangeInfo->NumApsExecuting++ means
it begin to do the initialization). When Ap finishes initialization, it
will do CpuMpData->MpCpuExchangeInfo->NumApsExecuting--. So after BSP
waits for a specific time at first, it just needs to check whether
CpuMpData->MpCpuExchangeInfo->NumApsExecuting == 0 to know whether all
Aps have finished initialization. Here we still use the original PCD
(PcdCpuApInitTimeOutInMicroSeconds) for the new time value.

When one AP do the initialization, it will also do
CpuMpData->CpuCount++. So here I check whether CpuMpData->CpuCount != 0
to know whether APs already begin to do the initialization. If yes, I
not need to do the time out waiting anymore, just check the
CpuMpData->MpCpuExchange

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-24 Thread Laszlo Ersek
Hi Eric,

On 10/24/17 17:23, Dong, Eric wrote:
> Laszlo,
>
>> -Original Message-
>> From: Laszlo Ersek [mailto:ler...@redhat.com]
>> Sent: Tuesday, October 24, 2017 6:16 PM
>> To: Dong, Eric ; edk2-devel@lists.01.org
>> Cc: Ni, Ruiyu ; Paolo Bonzini 
>> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
>> AP initialization logic.
>>
>> CC Paolo
>>
>> On 10/23/17 09:22, Eric Dong wrote:

>>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
>>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
>>> index 976af1f..bdfe0d3 100644
>>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
>>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
>>> @@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equLockLocation
>> + 30h
>>>  Cr3Location   equLockLocation + 34h
>>>  InitFlagLocation  equLockLocation + 38h
>>>  CpuInfoLocation   equLockLocation + 3Ch
>>> +NumApsExecutingLocation   equLockLocation + 40h
>>>
>>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
>>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
>>> index 1b9c6a6..2b6c27d 100644
>>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
>>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
>>> @@ -86,6 +86,12 @@ Flat32Start:   ; 
>>> protected mode
>> entry point
>>>
>>>  movesi, ebx
>>>
>>> +; Increment the number of APs executing here as early as possible
>>> +; This is decremented in C code when AP is finished executing
>>> +movedi, esi
>>> +addedi, NumApsExecutingLocation
>>> +lock inc   dword [edi]
>>> +
>>>  mov edi, esi
>>>  add edi, EnableExecuteDisableLocation
>>>  cmp byte [edi], 0
>>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>>> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>>> index db923c9..48f930b 100644
>>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>>> @@ -662,6 +662,7 @@ ApWakeupFunction (
>>>  // AP finished executing C code
>>>  //
>>>  InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount);
>>> +InterlockedDecrement ((UINT32 *)
>>> + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting);
>>>
>>>  //
>>>  // Place AP is specified loop mode @@ -765,6 +766,7 @@
>>> FillExchangeInfoData (
>>>
>>>ExchangeInfo->CFunction   = (UINTN) ApWakeupFunction;
>>>ExchangeInfo->ApIndex = 0;
>>> +  ExchangeInfo->NumApsExecuting = 0;
>>>ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag;
>>>ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN)
>> CpuMpData->CpuInfoInHob;
>>>ExchangeInfo->CpuMpData   = CpuMpData;
>>> @@ -934,13 +936,19 @@ WakeUpAP (
>>>  }
>>>  if (CpuMpData->InitFlag == ApInitConfig) {
>>>//
>>> -  // Wait for all potential APs waken up in one specified period
>>> +  // Wait for one potential AP waken up in one specified period
>>>//
>>> -  TimedWaitForApFinish (
>>> -CpuMpData,
>>> -PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
>>> -PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
>>> -);
>>> +  if (CpuMpData->CpuCount == 0) {
>>> +TimedWaitForApFinish (
>>> +  CpuMpData,
>>> +  PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
>>> +  PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
>>> +  );
>>> +  }
>>
>> I don't understand this change. The new comment says,
>>
>>   Wait for *one* potential AP waken up in one specified period
>>
>> However, the second parameter of TimedWaitForApFinish(), namely
>> "FinishedApLimit", gets the same value as before:
>>
>>   PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1
>>
>> It means that all of the (possible) APs are waited-for, just the same
>> as before.
>
> [[Eric]] This patch changes the collect AP count logic, original
> solution always waits for a specific time to let all APs start up. If
> the input CPU number (PcdGet32(PcdCpuMaxLogicalProcessorNumber) - 1)
> ha

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-24 Thread Dong, Eric
Laszlo,

Add more comments for TimedWaitForApFinish function in mail.

> -Original Message-
> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of
> Dong, Eric
> Sent: Tuesday, October 24, 2017 11:24 PM
> To: Laszlo Ersek ; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu ; Paolo Bonzini 
> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
> AP initialization logic.
> 
> Laszlo,
> 
> > -Original Message-
> > From: Laszlo Ersek [mailto:ler...@redhat.com]
> > Sent: Tuesday, October 24, 2017 6:16 PM
> > To: Dong, Eric ; edk2-devel@lists.01.org
> > Cc: Ni, Ruiyu ; Paolo Bonzini
> > 
> > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting
> > for AP initialization logic.
> >
> > CC Paolo
> >
> > On 10/23/17 09:22, Eric Dong wrote:
> > > Current logic always waiting for a specific value to collect all APs
> > > count. This logic may caused some platforms cost too much time to
> > > wait for time out.
> > > This patch add new logic to collect APs count. It adds new variable
> > > NumApsExecuting to detect whether all APs have finished initialization.
> > > Each AP let NumApsExecuting++ when begin to initialize itself and
> > > let
> > > NumApsExecuting-- when it finish the initialization. BSP base on
> > > whether NumApsExecuting == 0  to finished the collect AP process.
> > >
> > > Cc: Ruiyu Ni 
> > > Contributed-under: TianoCore Contribution Agreement 1.1
> > > Signed-off-by: Eric Dong 
> > > ---
> > >  UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc|  1 +
> > >  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  6 ++
> > >  UefiCpuPkg/Library/MpInitLib/MpLib.c   | 20 ++--
> > >  UefiCpuPkg/Library/MpInitLib/MpLib.h   |  1 +
> > >  UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc |  3 ++-
> > >  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  6 ++
> > >  6 files changed, 30 insertions(+), 7 deletions(-)
> >
> > I was out of office yesterday, and did not get a chance to comment on
> > this patch.
> >
> > In a virtualization guest, I see the following problem with the patch:
> >
> > VCPUs (virtual CPUs) may not be scheduled by the hypervisor similarly
> > to how physical CPUs behave on a physical board. It is possible that a
> > VCPU starts up and even finishes its initialization routine before
> > another VCPU starts running at all.
> >
> > Therefore the locked NumApsExecuting counter may hit zero, even
> > multiple times, before all APs have finished initializing.
> >
> > In OVMF, we query QEMU about the exact number of virtual processors,
> > in PlatformPei. So OVMF configures the logical processor count in
> > advance that MpInitLib has to wait for. Correspondingly, we also set
> > the timeout to "infinity".
> >
> > Please see the MaxCpuCountInitialization() function in following commit:
> >
> >   https://github.com/tianocore/edk2/commit/45a70db3c3a59
> >
> > In the past, we used to have AP initialization problems in OVMF due to
> > the VCPU scheduling artifacts I mention above. After commit
> > 45a70db3c3a59, things have been stable; it would be nice to keep that
> working.
> >
> > Please note that simply testing this patch on my end is not sufficient.
> > The AP init problems we used to face were sporadic and also specific
> > to the virtualization host systems (i.e., dependent on the physical
> > hardware and the host kernel).
> >
> > Furthermore:
> 
> [[Eric]] Will comments later if needed, need some investigation first.
> 
> >
> > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > > index 976af1f..bdfe0d3 100644
> > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equ
> LockLocation
> > + 30h
> > >  Cr3Location   equLockLocation + 34h
> > >  InitFlagLocation  equLockLocation + 38h
> > >  CpuInfoLocation   equLockLocation + 3Ch
> > > +NumApsExecutingLocation   equLockLocation + 40h
> > >
> > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > > index 1b9c6a6..2b6c27d 100644
> > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.na

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-24 Thread Dong, Eric
Laszlo,

> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: Tuesday, October 24, 2017 6:16 PM
> To: Dong, Eric ; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu ; Paolo Bonzini 
> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
> AP initialization logic.
> 
> CC Paolo
> 
> On 10/23/17 09:22, Eric Dong wrote:
> > Current logic always waiting for a specific value to collect all APs
> > count. This logic may caused some platforms cost too much time to wait
> > for time out.
> > This patch add new logic to collect APs count. It adds new variable
> > NumApsExecuting to detect whether all APs have finished initialization.
> > Each AP let NumApsExecuting++ when begin to initialize itself and let
> > NumApsExecuting-- when it finish the initialization. BSP base on
> > whether NumApsExecuting == 0  to finished the collect AP process.
> >
> > Cc: Ruiyu Ni 
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong 
> > ---
> >  UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc|  1 +
> >  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  6 ++
> >  UefiCpuPkg/Library/MpInitLib/MpLib.c   | 20 ++--
> >  UefiCpuPkg/Library/MpInitLib/MpLib.h   |  1 +
> >  UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc |  3 ++-
> >  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  6 ++
> >  6 files changed, 30 insertions(+), 7 deletions(-)
> 
> I was out of office yesterday, and did not get a chance to comment on this
> patch.
> 
> In a virtualization guest, I see the following problem with the patch:
> 
> VCPUs (virtual CPUs) may not be scheduled by the hypervisor similarly to how
> physical CPUs behave on a physical board. It is possible that a VCPU starts up
> and even finishes its initialization routine before another VCPU starts 
> running
> at all.
> 
> Therefore the locked NumApsExecuting counter may hit zero, even multiple
> times, before all APs have finished initializing.
> 
> In OVMF, we query QEMU about the exact number of virtual processors, in
> PlatformPei. So OVMF configures the logical processor count in advance that
> MpInitLib has to wait for. Correspondingly, we also set the timeout to
> "infinity".
> 
> Please see the MaxCpuCountInitialization() function in following commit:
> 
>   https://github.com/tianocore/edk2/commit/45a70db3c3a59
> 
> In the past, we used to have AP initialization problems in OVMF due to the
> VCPU scheduling artifacts I mention above. After commit 45a70db3c3a59,
> things have been stable; it would be nice to keep that working.
> 
> Please note that simply testing this patch on my end is not sufficient.
> The AP init problems we used to face were sporadic and also specific to the
> virtualization host systems (i.e., dependent on the physical hardware and the
> host kernel).
> 
> Furthermore:

[[Eric]] Will comments later if needed, need some investigation first.

> 
> > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > index 976af1f..bdfe0d3 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equLockLocation
> + 30h
> >  Cr3Location   equLockLocation + 34h
> >  InitFlagLocation  equLockLocation + 38h
> >  CpuInfoLocation   equLockLocation + 3Ch
> > +NumApsExecutingLocation   equLockLocation + 40h
> >
> > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > index 1b9c6a6..2b6c27d 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > @@ -86,6 +86,12 @@ Flat32Start:   ; 
> > protected mode
> entry point
> >
> >  movesi, ebx
> >
> > +; Increment the number of APs executing here as early as possible
> > +; This is decremented in C code when AP is finished executing
> > +movedi, esi
> > +addedi, NumApsExecutingLocation
> > +lock inc   dword [edi]
> > +
> >  mov edi, esi
> >  add edi, EnableExecuteDisableLocation
> >  cmp byte [edi], 0
> > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > index db923c9..48f930b 100644
> > --- a/UefiCpuPkg/Library/MpInit

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-24 Thread Laszlo Ersek
CC Paolo

On 10/23/17 09:22, Eric Dong wrote:
> Current logic always waiting for a specific value to collect all APs
> count. This logic may caused some platforms cost too much time to
> wait for time out.
> This patch add new logic to collect APs count. It adds new variable
> NumApsExecuting to detect whether all APs have finished initialization.
> Each AP let NumApsExecuting++ when begin to initialize itself and let
> NumApsExecuting-- when it finish the initialization. BSP base on whether
> NumApsExecuting == 0  to finished the collect AP process.
> 
> Cc: Ruiyu Ni 
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong 
> ---
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc|  1 +
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  6 ++
>  UefiCpuPkg/Library/MpInitLib/MpLib.c   | 20 ++--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h   |  1 +
>  UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc |  3 ++-
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  6 ++
>  6 files changed, 30 insertions(+), 7 deletions(-)

I was out of office yesterday, and did not get a chance to comment on
this patch.

In a virtualization guest, I see the following problem with the patch:

VCPUs (virtual CPUs) may not be scheduled by the hypervisor similarly to
how physical CPUs behave on a physical board. It is possible that a VCPU
starts up and even finishes its initialization routine before another
VCPU starts running at all.

Therefore the locked NumApsExecuting counter may hit zero, even multiple
times, before all APs have finished initializing.

In OVMF, we query QEMU about the exact number of virtual processors, in
PlatformPei. So OVMF configures the logical processor count in advance
that MpInitLib has to wait for. Correspondingly, we also set the timeout
to "infinity".

Please see the MaxCpuCountInitialization() function in following commit:

  https://github.com/tianocore/edk2/commit/45a70db3c3a59

In the past, we used to have AP initialization problems in OVMF due to
the VCPU scheduling artifacts I mention above. After commit
45a70db3c3a59, things have been stable; it would be nice to keep that
working.

Please note that simply testing this patch on my end is not sufficient.
The AP init problems we used to face were sporadic and also specific to
the virtualization host systems (i.e., dependent on the physical
hardware and the host kernel).

Furthermore:

> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc 
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> index 976af1f..bdfe0d3 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> @@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equLockLocation + 30h
>  Cr3Location   equLockLocation + 34h
>  InitFlagLocation  equLockLocation + 38h
>  CpuInfoLocation   equLockLocation + 3Ch
> +NumApsExecutingLocation   equLockLocation + 40h
>  
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 1b9c6a6..2b6c27d 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -86,6 +86,12 @@ Flat32Start:   ; protected 
> mode entry point
>  
>  movesi, ebx
>  
> +; Increment the number of APs executing here as early as possible
> +; This is decremented in C code when AP is finished executing
> +movedi, esi
> +addedi, NumApsExecutingLocation
> +lock inc   dword [edi]
> +
>  mov edi, esi
>  add edi, EnableExecuteDisableLocation
>  cmp byte [edi], 0
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c 
> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index db923c9..48f930b 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -662,6 +662,7 @@ ApWakeupFunction (
>  // AP finished executing C code
>  //
>  InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount);
> +InterlockedDecrement ((UINT32 *) 
> &CpuMpData->MpCpuExchangeInfo->NumApsExecuting);
>  
>  //
>  // Place AP is specified loop mode
> @@ -765,6 +766,7 @@ FillExchangeInfoData (
>  
>ExchangeInfo->CFunction   = (UINTN) ApWakeupFunction;
>ExchangeInfo->ApIndex = 0;
> +  ExchangeInfo->NumApsExecuting = 0;
>ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag;
>ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) 
> CpuMpData->CpuInfoInHob;
>ExchangeInfo->CpuMpData   = CpuMpData;
> @@ -934,13 +936,19 @@ WakeUpAP (
>  }
>  if (CpuMpData->InitFlag == ApInitConfig) {
>//
> -  // Wait for all potential APs waken up in one specified period
> +  // Wait for one potential AP waken up in one specified period
>//
> -  TimedWaitForApFinish (
>

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-24 Thread Ni, Ruiyu
Jeff,

I see. Thanks for mentioning that😊
Reviewed-by: Ruiyu Ni 

Do you mind to also give a r-b?


Thanks/Ray

> -Original Message-
> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of
> Fan Jeff
> Sent: Tuesday, October 24, 2017 2:27 PM
> To: Ni, Ruiyu ; Dong, Eric ; edk2-
> de...@lists.01.org
> Subject: [edk2] 答复: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
> AP initialization logic.
> 
> Ray,
> 
> MpCpuExchangeInfo was declared as volatile type. It may not be necessary
> to add volatile for NumApsExecuting.
> 
> Jeff
> 
> 发件人: Ni, Ruiyu<mailto:ruiyu...@intel.com>
> 发送时间: 2017年10月24日 14:03
> 收件人: Dong, Eric<mailto:eric.d...@intel.com>; edk2-
> de...@lists.01.org<mailto:edk2-devel@lists.01.org>
> 主题: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP
> initialization logic.
> 
> You need to have "volatile" for "UINTN  NumApsExecuting;".
> Otherwise, compiler may optimize the code to cause below code wait
> forever:
>   while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) {
> CpuPause();
>   }
> 
> 
> Thanks/Ray
> 
> > -Original Message-
> > From: Dong, Eric
> > Sent: Monday, October 23, 2017 3:23 PM
> > To: edk2-devel@lists.01.org
> > Cc: Ni, Ruiyu 
> > Subject: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP
> > initialization logic.
> >
> > Current logic always waiting for a specific value to collect all APs
> > count. This logic may caused some platforms cost too much time to wait for
> time out.
> > This patch add new logic to collect APs count. It adds new variable
> > NumApsExecuting to detect whether all APs have finished initialization.
> > Each AP let NumApsExecuting++ when begin to initialize itself and let
> > NumApsExecuting-- when it finish the initialization. BSP base on
> > whether NumApsExecuting == 0  to finished the collect AP process.
> >
> > Cc: Ruiyu Ni 
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong 
> > ---
> >  UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc|  1 +
> >  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  6 ++
> >  UefiCpuPkg/Library/MpInitLib/MpLib.c   | 20 ++--
> >  UefiCpuPkg/Library/MpInitLib/MpLib.h   |  1 +
> >  UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc |  3 ++-
> >  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  6 ++
> >  6 files changed, 30 insertions(+), 7 deletions(-)
> >
> > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > index 976af1f..bdfe0d3 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equLockLocation +
> > 30h
> >  Cr3Location   equLockLocation + 34h
> >  InitFlagLocation  equLockLocation + 38h
> >  CpuInfoLocation   equLockLocation + 3Ch
> > +NumApsExecutingLocation   equLockLocation + 40h
> >
> > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > index 1b9c6a6..2b6c27d 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> > @@ -86,6 +86,12 @@ Flat32Start:   ; 
> > protected mode entry
> > point
> >
> >  movesi, ebx
> >
> > +; Increment the number of APs executing here as early as possible
> > +; This is decremented in C code when AP is finished executing
> > +movedi, esi
> > +addedi, NumApsExecutingLocation
> > +lock inc   dword [edi]
> > +
> >  mov edi, esi
> >  add edi, EnableExecuteDisableLocation
> >  cmp byte [edi], 0
> > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > index db923c9..48f930b 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > @@ -662,6 +662,7 @@ ApWakeupFunction (
> >  // AP finished executing C code
> >  //
> >  InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount);
> > +InterlockedDecrement ((UINT32 *)
> > + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting);
> >
> >  //
> >  // Place AP is specified loop mo

Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-23 Thread Ni, Ruiyu
You need to have "volatile" for "UINTN  NumApsExecuting;".
Otherwise, compiler may optimize the code to cause below code wait forever:
  while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) {
CpuPause();
  }


Thanks/Ray

> -Original Message-
> From: Dong, Eric
> Sent: Monday, October 23, 2017 3:23 PM
> To: edk2-devel@lists.01.org
> Cc: Ni, Ruiyu 
> Subject: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP
> initialization logic.
> 
> Current logic always waiting for a specific value to collect all APs count. 
> This
> logic may caused some platforms cost too much time to wait for time out.
> This patch add new logic to collect APs count. It adds new variable
> NumApsExecuting to detect whether all APs have finished initialization.
> Each AP let NumApsExecuting++ when begin to initialize itself and let
> NumApsExecuting-- when it finish the initialization. BSP base on whether
> NumApsExecuting == 0  to finished the collect AP process.
> 
> Cc: Ruiyu Ni 
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong 
> ---
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc|  1 +
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  6 ++
>  UefiCpuPkg/Library/MpInitLib/MpLib.c   | 20 ++--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h   |  1 +
>  UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc |  3 ++-
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  6 ++
>  6 files changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> index 976af1f..bdfe0d3 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> @@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equLockLocation +
> 30h
>  Cr3Location   equLockLocation + 34h
>  InitFlagLocation  equLockLocation + 38h
>  CpuInfoLocation   equLockLocation + 3Ch
> +NumApsExecutingLocation   equLockLocation + 40h
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 1b9c6a6..2b6c27d 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -86,6 +86,12 @@ Flat32Start:   ; protected 
> mode entry
> point
> 
>  movesi, ebx
> 
> +; Increment the number of APs executing here as early as possible
> +; This is decremented in C code when AP is finished executing
> +movedi, esi
> +addedi, NumApsExecutingLocation
> +lock inc   dword [edi]
> +
>  mov edi, esi
>  add edi, EnableExecuteDisableLocation
>  cmp byte [edi], 0
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index db923c9..48f930b 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -662,6 +662,7 @@ ApWakeupFunction (
>  // AP finished executing C code
>  //
>  InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount);
> +InterlockedDecrement ((UINT32 *)
> + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting);
> 
>  //
>  // Place AP is specified loop mode
> @@ -765,6 +766,7 @@ FillExchangeInfoData (
> 
>ExchangeInfo->CFunction   = (UINTN) ApWakeupFunction;
>ExchangeInfo->ApIndex = 0;
> +  ExchangeInfo->NumApsExecuting = 0;
>ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag;
>ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) CpuMpData-
> >CpuInfoInHob;
>ExchangeInfo->CpuMpData   = CpuMpData;
> @@ -934,13 +936,19 @@ WakeUpAP (
>  }
>  if (CpuMpData->InitFlag == ApInitConfig) {
>//
> -  // Wait for all potential APs waken up in one specified period
> +  // Wait for one potential AP waken up in one specified period
>//
> -  TimedWaitForApFinish (
> -CpuMpData,
> -PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
> -PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
> -);
> +  if (CpuMpData->CpuCount == 0) {
> +TimedWaitForApFinish (
> +  CpuMpData,
> +  PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
> +  PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
> +  );
> +  }
> +
> +  while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) {
> +CpuPause();
> +  }
>  } else {
>//
>// Wait all APs waken up if this is not the 1st broadcast of SIPI diff 
> --git
> a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> index e41d2db..d13d5c0 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> @@ -176,6 +176,7 @@ typedef struct {
>UINTN Cr3;
>UINTN Ini

[edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.

2017-10-23 Thread Eric Dong
Current logic always waiting for a specific value to collect all APs
count. This logic may caused some platforms cost too much time to
wait for time out.
This patch add new logic to collect APs count. It adds new variable
NumApsExecuting to detect whether all APs have finished initialization.
Each AP let NumApsExecuting++ when begin to initialize itself and let
NumApsExecuting-- when it finish the initialization. BSP base on whether
NumApsExecuting == 0  to finished the collect AP process.

Cc: Ruiyu Ni 
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong 
---
 UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc|  1 +
 UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  6 ++
 UefiCpuPkg/Library/MpInitLib/MpLib.c   | 20 ++--
 UefiCpuPkg/Library/MpInitLib/MpLib.h   |  1 +
 UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc |  3 ++-
 UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  6 ++
 6 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc 
b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
index 976af1f..bdfe0d3 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
@@ -40,4 +40,5 @@ EnableExecuteDisableLocation  equLockLocation + 30h
 Cr3Location   equLockLocation + 34h
 InitFlagLocation  equLockLocation + 38h
 CpuInfoLocation   equLockLocation + 3Ch
+NumApsExecutingLocation   equLockLocation + 40h
 
diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
index 1b9c6a6..2b6c27d 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
@@ -86,6 +86,12 @@ Flat32Start:   ; protected 
mode entry point
 
 movesi, ebx
 
+; Increment the number of APs executing here as early as possible
+; This is decremented in C code when AP is finished executing
+movedi, esi
+addedi, NumApsExecutingLocation
+lock inc   dword [edi]
+
 mov edi, esi
 add edi, EnableExecuteDisableLocation
 cmp byte [edi], 0
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c 
b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index db923c9..48f930b 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -662,6 +662,7 @@ ApWakeupFunction (
 // AP finished executing C code
 //
 InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount);
+InterlockedDecrement ((UINT32 *) 
&CpuMpData->MpCpuExchangeInfo->NumApsExecuting);
 
 //
 // Place AP is specified loop mode
@@ -765,6 +766,7 @@ FillExchangeInfoData (
 
   ExchangeInfo->CFunction   = (UINTN) ApWakeupFunction;
   ExchangeInfo->ApIndex = 0;
+  ExchangeInfo->NumApsExecuting = 0;
   ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag;
   ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) 
CpuMpData->CpuInfoInHob;
   ExchangeInfo->CpuMpData   = CpuMpData;
@@ -934,13 +936,19 @@ WakeUpAP (
 }
 if (CpuMpData->InitFlag == ApInitConfig) {
   //
-  // Wait for all potential APs waken up in one specified period
+  // Wait for one potential AP waken up in one specified period
   //
-  TimedWaitForApFinish (
-CpuMpData,
-PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
-PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
-);
+  if (CpuMpData->CpuCount == 0) {
+TimedWaitForApFinish (
+  CpuMpData,
+  PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
+  PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
+  );
+  }
+
+  while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) {
+CpuPause();
+  }
 } else {
   //
   // Wait all APs waken up if this is not the 1st broadcast of SIPI
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h 
b/UefiCpuPkg/Library/MpInitLib/MpLib.h
index e41d2db..d13d5c0 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
@@ -176,6 +176,7 @@ typedef struct {
   UINTN Cr3;
   UINTN InitFlag;
   CPU_INFO_IN_HOB   *CpuInfo;
+  UINTN NumApsExecuting;
   CPU_MP_DATA   *CpuMpData;
   UINTN InitializeFloatingPointUnitsAddress;
 } MP_CPU_EXCHANGE_INFO;
diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc 
b/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc
index 114f4e0..d255ca5 100644
--- a/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc
+++ b/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc
@@ -40,5 +40,6 @@ EnableExecuteDisableLocation  equLockLocation + 5Ch
 Cr3Location   equLockLocation + 64h
 InitFlagLocation  equLockLocation + 6Ch
 CpuInfoLocation   equLockLocatio