Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Brian, > -Original Message- > From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of > Brian J. Johnson > Sent: Friday, October 27, 2017 4:48 AM > To: Dong, Eric ; Laszlo Ersek ; > edk2-devel@lists.01.org > Cc: Ni, Ruiyu ; Paolo Bonzini > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for > AP initialization logic. > > On 10/25/2017 08:13 PM, Dong, Eric wrote: > > Laszlo, > > > > > >> -Original Message- > >> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf > >> Of Laszlo Ersek > >> Sent: Wednesday, October 25, 2017 11:08 PM > >> To: Dong, Eric ; edk2-devel@lists.01.org > >> Cc: Ni, Ruiyu ; Paolo Bonzini > >> > >> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting > >> for AP initialization logic. > >> > >> Hi Eric, > >> > >> On 10/25/17 07:42, Dong, Eric wrote: > >>> Hi Laszlo, > >>> > >>> I think I have clear about your raised issues for Ovmf platform. In > >>> this case, I > >> think your platform not suit for this code change. How about I do > >> below change based on the new code: > >>> > >>> - if (CpuMpData->CpuCount == 0) { > >>> TimedWaitForApFinish ( > >>>CpuMpData, > >>>PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, > >>>PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) > >>>); > >>> - } > >>> > >>> After this change, for Ovmf can still set > >>> PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old > >> solution. > >>> For the real platform, it can set a small value to follow the new > >>> solution. For this new change, it just wait one more > >>> PcdCpuApInitTimeOutInMicroSeconds timeout, should not have > >> performance > >>> impact (This time may also been cost in later while > >>> (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or > have > >>> litter performance impact (not cover by the later loop). > >> The suggested change will work for OVMF, thanks! > >> > >> (Just please don't forget to un-indent the TimedWaitForApFinish() > >> call that will become unconditional again.) > >> > >> --o-- > >> > >> Do you think Brian's concern should be investigated as well (perhaps > >> in a separate Bugzilla entry)? > > > > As Jeff has mentioned, only Ovmf can know the exist Ap numbers before > > send the Init-sipi-sipi. So we don't know how many bits needed for this > bitmap. > > In case we can create the bitmap, we still don't know when all Aps have > been > > found(If the begin map bit value same as finish map bit value, we > > still can't get the conclusion that all Aps have been found, because > > maybe other Aps not started yet). > > > > Right, physical platforms don't usually know the number of CPUs up front, so > they still need a timeout. That's unavoidable. But we've seen cases where > interrupts aren't getting delivered reliably, so not all the expected CPUs > start > up (based on the core counts in the physical sockets, as known by the > developing engineer, not the firmware.) To debug this failure, it's very > useful to have a list of exactly which CPUs did start. > > Similarly, we've seen cases where a CPU starts, but crashes due to h/w or > s/w bugs before signaling the BSP that it has finished. With only a counter > to > indicate how many CPUs are still running, it's impossible to tell which CPUs > failed. That makes debugging much more difficult. > > The bitmaps would need to be sized according to the maximum number of > CPUs supported by the platform (or the maximum APIC ID, depending on > how they are indexed), like other data structures in the MpCpu code. > > Bitmaps aren't the only possible implementation of course My point is > that it's useful to have a list of which CPUs started executing, and which > finished. > Seems this is not a must have item for this patch related issue. I think you can submit A bugz for this debug feature. Thanks, Eric > > Thanks, > > Eric > >> > >> Because, I believe, the scheduling pattern that I described earlier > >> could be possible on physical platforms as well, at least in theory: > >> > >>>> (2) After at least one AP has started running, the logic expects &
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
On 10/25/2017 08:13 PM, Dong, Eric wrote: Laszlo, -Original Message- From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Laszlo Ersek Sent: Wednesday, October 25, 2017 11:08 PM To: Dong, Eric ; edk2-devel@lists.01.org Cc: Ni, Ruiyu ; Paolo Bonzini Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic. Hi Eric, On 10/25/17 07:42, Dong, Eric wrote: Hi Laszlo, I think I have clear about your raised issues for Ovmf platform. In this case, I think your platform not suit for this code change. How about I do below change based on the new code: - if (CpuMpData->CpuCount == 0) { TimedWaitForApFinish ( CpuMpData, PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) ); - } After this change, for Ovmf can still set PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old solution. For the real platform, it can set a small value to follow the new solution. For this new change, it just wait one more PcdCpuApInitTimeOutInMicroSeconds timeout, should not have performance impact (This time may also been cost in later while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have litter performance impact (not cover by the later loop). The suggested change will work for OVMF, thanks! (Just please don't forget to un-indent the TimedWaitForApFinish() call that will become unconditional again.) --o-- Do you think Brian's concern should be investigated as well (perhaps in a separate Bugzilla entry)? As Jeff has mentioned, only Ovmf can know the exist Ap numbers before send the Init-sipi-sipi. So we don't know how many bits needed for this bitmap. In case we can create the bitmap, we still don't know when all Aps have been found(If the begin map bit value same as finish map bit value, we still can't get the conclusion that all Aps have been found, because maybe other Aps not started yet). Right, physical platforms don't usually know the number of CPUs up front, so they still need a timeout. That's unavoidable. But we've seen cases where interrupts aren't getting delivered reliably, so not all the expected CPUs start up (based on the core counts in the physical sockets, as known by the developing engineer, not the firmware.) To debug this failure, it's very useful to have a list of exactly which CPUs did start. Similarly, we've seen cases where a CPU starts, but crashes due to h/w or s/w bugs before signaling the BSP that it has finished. With only a counter to indicate how many CPUs are still running, it's impossible to tell which CPUs failed. That makes debugging much more difficult. The bitmaps would need to be sized according to the maximum number of CPUs supported by the platform (or the maximum APIC ID, depending on how they are indexed), like other data structures in the MpCpu code. Bitmaps aren't the only possible implementation of course My point is that it's useful to have a list of which CPUs started executing, and which finished. Thanks, Eric Because, I believe, the scheduling pattern that I described earlier could be possible on physical platforms as well, at least in theory: (2) After at least one AP has started running, the logic expects "NumApsExecuting" to monotonically grow for a while, and then to monotonically decrease, back to zero. For example, if we have 1 BSP and 7 APs, the BSP logic more or less expects the following values in "NumApsExecuting": 1; 2; 3; 4; 5; 6; 7; 6; 5; 4; 3; 2; 1; 0 While this may be a valid expectation for physical processors (which actually run in parallel, in the physical world), in a virtual machine, it is not guaranteed. Dependent on hypervisor scheduling artifacts, it is possible that, say, three APs start up *and finish* before the remaining four APs start up *at all*. The INIT-SIPI-SIPI marks all APs for execution / scheduling in the hypervisor, yes, but the actual code execution may commence a lot later. For example, the BSP may witness the following series of values in "NumApsExecuting": 1; 2; 3; 2; 1; 0; 1; 2; 3; 4; 3; 2; 1; 0 and the BSP could think that there are 3 APs only, when it sees the first 0 value. I don't know if such a pattern is "likely", "unlikely", or "extremely unlikely" on physical platforms. I think the pattern is "theoretically possible" at least. I suggest that we go ahead with the small patch for the OVMF use case first, and then open a new BZ if Brian thinks there's a real safety problem with the code. We also notice this theoretical problem when we implement this change, but We think this is rarely to happen on physical platforms. We think the value for the change is much bigger than not do this change because of this theo
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Laszlo, > -Original Message- > From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of > Laszlo Ersek > Sent: Wednesday, October 25, 2017 11:08 PM > To: Dong, Eric ; edk2-devel@lists.01.org > Cc: Ni, Ruiyu ; Paolo Bonzini > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for > AP initialization logic. > > Hi Eric, > > On 10/25/17 07:42, Dong, Eric wrote: > > Hi Laszlo, > > > > I think I have clear about your raised issues for Ovmf platform. In this > > case, I > think your platform not suit for this code change. How about I do below > change based on the new code: > > > > - if (CpuMpData->CpuCount == 0) { > > TimedWaitForApFinish ( > > CpuMpData, > > PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, > > PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) > > ); > > - } > > > > After this change, for Ovmf can still set > > PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old > solution. > > For the real platform, it can set a small value to follow the new > > solution. For this new change, it just wait one more > > PcdCpuApInitTimeOutInMicroSeconds timeout, should not have > performance > > impact (This time may also been cost in later while > > (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have > > litter performance impact (not cover by the later loop). > The suggested change will work for OVMF, thanks! > > (Just please don't forget to un-indent the TimedWaitForApFinish() call that > will become unconditional again.) > > --o-- > > Do you think Brian's concern should be investigated as well (perhaps in a > separate Bugzilla entry)? As Jeff has mentioned, only Ovmf can know the exist Ap numbers before send the Init-sipi-sipi. So we don't know how many bits needed for this bitmap. In case we can create the bitmap, we still don't know when all Aps have been found(If the begin map bit value same as finish map bit value, we still can't get the conclusion that all Aps have been found, because maybe other Aps not started yet). Thanks, Eric > > Because, I believe, the scheduling pattern that I described earlier could be > possible on physical platforms as well, at least in theory: > > >> (2) After at least one AP has started running, the logic expects > >> "NumApsExecuting" to monotonically grow for a while, and then to > >> monotonically decrease, back to zero. For example, if we have 1 BSP > >> and > >> 7 APs, the BSP logic more or less expects the following values in > >> "NumApsExecuting": > >> > >> 1; 2; 3; 4; 5; 6; 7; > >> 6; 5; 4; 3; 2; 1; 0 > >> > >> > >> While this may be a valid expectation for physical processors (which > >> actually run in parallel, in the physical world), in a virtual machine, it > >> is not > guaranteed. > >> Dependent on hypervisor scheduling artifacts, it is possible that, > >> say, three APs start up *and finish* before the remaining four APs > >> start up *at all*. The INIT-SIPI-SIPI marks all APs for execution / > >> scheduling in the hypervisor, yes, but the actual code execution may > >> commence a lot later. For example, the BSP may witness the following > series of values in "NumApsExecuting": > >> > >> 1; 2; 3; > >> 2; 1; 0; > >> 1; 2; 3; 4; > >> 3; 2; 1; 0 > >> > >> and the BSP could think that there are 3 APs only, when it sees the > >> first 0 value. > > I don't know if such a pattern is "likely", "unlikely", or "extremely > unlikely" on > physical platforms. I think the pattern is "theoretically possible" at least. > > I suggest that we go ahead with the small patch for the OVMF use case first, > and then open a new BZ if Brian thinks there's a real safety problem with the > code. > We also notice this theoretical problem when we implement this change, but We think this is rarely to happen on physical platforms. We think the value for the change is much bigger than not do this change because of this theoretical problem. Thanks, Eric > ... I should note that, with the small patch that's going to fix the OVMF use > case, the previous behavior will be restored for *all* platforms that continue > using their previous PCD values. > > The cumulative effect of commit 0594ec417c89 and the upcoming patch will > be that a platform may significantly decrease > "PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the > "NumApsExecuting" loop will take the role of the preexisting > TimedWaitForApFinish() call. If a platform doesn't want that, then it should > keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as before, and then > any implementation details in the "NumApsExecuting" loop will be irrelevant. > > Thanks! > Laszlo > ___ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Hi Eric, On 10/25/17 07:42, Dong, Eric wrote: > Hi Laszlo, > > I think I have clear about your raised issues for Ovmf platform. In this > case, I think your platform not suit for this code change. How about I do > below change based on the new code: > > - if (CpuMpData->CpuCount == 0) { > TimedWaitForApFinish ( > CpuMpData, > PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, > PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) > ); > - } > > After this change, for Ovmf can still set > PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old solution. > For the real platform, it can set a small value to follow the new > solution. For this new change, it just wait one more > PcdCpuApInitTimeOutInMicroSeconds timeout, should not have > performance impact (This time may also been cost in later while > (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have > litter performance impact (not cover by the later loop). The suggested change will work for OVMF, thanks! (Just please don't forget to un-indent the TimedWaitForApFinish() call that will become unconditional again.) --o-- Do you think Brian's concern should be investigated as well (perhaps in a separate Bugzilla entry)? Because, I believe, the scheduling pattern that I described earlier could be possible on physical platforms as well, at least in theory: >> (2) After at least one AP has started running, the logic expects >> "NumApsExecuting" to monotonically grow for a while, and then to >> monotonically decrease, back to zero. For example, if we have 1 BSP and >> 7 APs, the BSP logic more or less expects the following values in >> "NumApsExecuting": >> >> 1; 2; 3; 4; 5; 6; 7; >> 6; 5; 4; 3; 2; 1; 0 >> >> >> While this may be a valid expectation for physical processors (which actually >> run in parallel, in the physical world), in a virtual machine, it is not >> guaranteed. >> Dependent on hypervisor scheduling artifacts, it is possible that, say, three >> APs start up *and finish* before the remaining four APs start up *at all*. >> The >> INIT-SIPI-SIPI marks all APs for execution / scheduling in the hypervisor, >> yes, >> but the actual code execution may commence a lot later. For example, the >> BSP may witness the following series of values in "NumApsExecuting": >> >> 1; 2; 3; >> 2; 1; 0; >> 1; 2; 3; 4; >> 3; 2; 1; 0 >> >> and the BSP could think that there are 3 APs only, when it sees the first 0 >> value. I don't know if such a pattern is "likely", "unlikely", or "extremely unlikely" on physical platforms. I think the pattern is "theoretically possible" at least. I suggest that we go ahead with the small patch for the OVMF use case first, and then open a new BZ if Brian thinks there's a real safety problem with the code. ... I should note that, with the small patch that's going to fix the OVMF use case, the previous behavior will be restored for *all* platforms that continue using their previous PCD values. The cumulative effect of commit 0594ec417c89 and the upcoming patch will be that a platform may significantly decrease "PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the "NumApsExecuting" loop will take the role of the preexisting TimedWaitForApFinish() call. If a platform doesn't want that, then it should keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as before, and then any implementation details in the "NumApsExecuting" loop will be irrelevant. Thanks! Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Hi Laszlo, I think I have clear about your raised issues for Ovmf platform. In this case, I think your platform not suit for this code change. How about I do below change based on the new code: - if (CpuMpData->CpuCount == 0) { TimedWaitForApFinish ( CpuMpData, PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) ); - } After this change, for Ovmf can still set PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old solution. For the real platform, it can set a small value to follow the new solution. For this new change, it just wait one more PcdCpuApInitTimeOutInMicroSeconds timeout, should not have performance impact (This time may also been cost in later while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have litter performance impact (not cover by the later loop). Thanks, Eric > -Original Message- > From: Laszlo Ersek [mailto:ler...@redhat.com] > Sent: Wednesday, October 25, 2017 1:40 AM > To: Dong, Eric ; edk2-devel@lists.01.org > Cc: Ni, Ruiyu ; Paolo Bonzini ; > Jeff Fan > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for > AP initialization logic. > > Hi Eric, > > On 10/24/17 17:23, Dong, Eric wrote: > > Laszlo, > > > >> -Original Message- > >> From: Laszlo Ersek [mailto:ler...@redhat.com] > >> Sent: Tuesday, October 24, 2017 6:16 PM > >> To: Dong, Eric ; edk2-devel@lists.01.org > >> Cc: Ni, Ruiyu ; Paolo Bonzini > >> > >> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting > >> for AP initialization logic. > >> > >> CC Paolo > >> > >> On 10/23/17 09:22, Eric Dong wrote: > > >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > >>> index 976af1f..bdfe0d3 100644 > >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > >>> @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equ > LockLocation > >> + 30h > >>> Cr3Location equLockLocation + 34h > >>> InitFlagLocation equLockLocation + 38h > >>> CpuInfoLocation equLockLocation + 3Ch > >>> +NumApsExecutingLocation equLockLocation + 40h > >>> > >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > >>> index 1b9c6a6..2b6c27d 100644 > >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > >>> @@ -86,6 +86,12 @@ Flat32Start: ; > >>> protected mode > >> entry point > >>> > >>> movesi, ebx > >>> > >>> +; Increment the number of APs executing here as early as possible > >>> +; This is decremented in C code when AP is finished executing > >>> +movedi, esi > >>> +addedi, NumApsExecutingLocation > >>> +lock inc dword [edi] > >>> + > >>> mov edi, esi > >>> add edi, EnableExecuteDisableLocation > >>> cmp byte [edi], 0 > >>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > >>> b/UefiCpuPkg/Library/MpInitLib/MpLib.c > >>> index db923c9..48f930b 100644 > >>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > >>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > >>> @@ -662,6 +662,7 @@ ApWakeupFunction ( > >>> // AP finished executing C code > >>> // > >>> InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); > >>> +InterlockedDecrement ((UINT32 *) > >>> + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); > >>> > >>> // > >>> // Place AP is specified loop mode @@ -765,6 +766,7 @@ > >>> FillExchangeInfoData ( > >>> > >>>ExchangeInfo->CFunction = (UINTN) ApWakeupFunction; > >>>ExchangeInfo->ApIndex = 0; > >>> + ExchangeInfo->NumApsExecuting = 0; > >>>ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag; > >>>ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) > >> CpuMpData->CpuInfoInHob; > >>>ExchangeIn
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
On 10/24/2017 12:40 PM, Laszlo Ersek wrote: Hi Eric, On 10/24/17 17:23, Dong, Eric wrote: Laszlo, -Original Message- From: Laszlo Ersek [mailto:ler...@redhat.com] Sent: Tuesday, October 24, 2017 6:16 PM To: Dong, Eric ; edk2-devel@lists.01.org Cc: Ni, Ruiyu ; Paolo Bonzini Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic. CC Paolo On 10/23/17 09:22, Eric Dong wrote: diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc index 976af1f..bdfe0d3 100644 --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equLockLocation + 30h Cr3Location equLockLocation + 34h InitFlagLocation equLockLocation + 38h CpuInfoLocation equLockLocation + 3Ch +NumApsExecutingLocation equLockLocation + 40h diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm index 1b9c6a6..2b6c27d 100644 --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm @@ -86,6 +86,12 @@ Flat32Start: ; protected mode entry point movesi, ebx +; Increment the number of APs executing here as early as possible +; This is decremented in C code when AP is finished executing +movedi, esi +addedi, NumApsExecutingLocation +lock inc dword [edi] + mov edi, esi add edi, EnableExecuteDisableLocation cmp byte [edi], 0 diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c index db923c9..48f930b 100644 --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c @@ -662,6 +662,7 @@ ApWakeupFunction ( // AP finished executing C code // InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); +InterlockedDecrement ((UINT32 *) + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); // // Place AP is specified loop mode @@ -765,6 +766,7 @@ FillExchangeInfoData ( ExchangeInfo->CFunction = (UINTN) ApWakeupFunction; ExchangeInfo->ApIndex = 0; + ExchangeInfo->NumApsExecuting = 0; ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag; ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) CpuMpData->CpuInfoInHob; ExchangeInfo->CpuMpData = CpuMpData; @@ -934,13 +936,19 @@ WakeUpAP ( } if (CpuMpData->InitFlag == ApInitConfig) { // - // Wait for all potential APs waken up in one specified period + // Wait for one potential AP waken up in one specified period // - TimedWaitForApFinish ( -CpuMpData, -PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, -PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) -); + if (CpuMpData->CpuCount == 0) { +TimedWaitForApFinish ( + CpuMpData, + PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, + PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) + ); + } I don't understand this change. The new comment says, Wait for *one* potential AP waken up in one specified period However, the second parameter of TimedWaitForApFinish(), namely "FinishedApLimit", gets the same value as before: PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1 It means that all of the (possible) APs are waited-for, just the same as before. [[Eric]] This patch changes the collect AP count logic, original solution always waits for a specific time to let all APs start up. If the input CPU number (PcdGet32(PcdCpuMaxLogicalProcessorNumber) - 1) have been found or after a specific time(PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)). BSP will not wait anymore and use CpuMpData->CpuCount as the found AP count. New logic also wait for a specific time, but this time is smaller than the original one. It just wait for the first AP(any AP) begin to do the initialization( do CpuMpData->MpCpuExchangeInfo->NumApsExecuting++ means it begin to do the initialization). When Ap finishes initialization, it will do CpuMpData->MpCpuExchangeInfo->NumApsExecuting--. So after BSP waits for a specific time at first, it just needs to check whether CpuMpData->MpCpuExchangeInfo->NumApsExecuting == 0 to know whether all Aps have finished initialization. Here we still use the original PCD (PcdCpuApInitTimeOutInMicroSeconds) for the new time value. When one AP do the initialization, it will also do CpuMpData->CpuCount++. So here I check whether CpuMpData->CpuCount != 0 to know whether APs already begin to do the initialization. If yes, I not need to do the time out waiting anymore, just check the CpuMpData->MpCpuExchange
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Hi Eric, On 10/24/17 17:23, Dong, Eric wrote: > Laszlo, > >> -Original Message- >> From: Laszlo Ersek [mailto:ler...@redhat.com] >> Sent: Tuesday, October 24, 2017 6:16 PM >> To: Dong, Eric ; edk2-devel@lists.01.org >> Cc: Ni, Ruiyu ; Paolo Bonzini >> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for >> AP initialization logic. >> >> CC Paolo >> >> On 10/23/17 09:22, Eric Dong wrote: >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> index 976af1f..bdfe0d3 100644 >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equLockLocation >> + 30h >>> Cr3Location equLockLocation + 34h >>> InitFlagLocation equLockLocation + 38h >>> CpuInfoLocation equLockLocation + 3Ch >>> +NumApsExecutingLocation equLockLocation + 40h >>> >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> index 1b9c6a6..2b6c27d 100644 >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> @@ -86,6 +86,12 @@ Flat32Start: ; >>> protected mode >> entry point >>> >>> movesi, ebx >>> >>> +; Increment the number of APs executing here as early as possible >>> +; This is decremented in C code when AP is finished executing >>> +movedi, esi >>> +addedi, NumApsExecutingLocation >>> +lock inc dword [edi] >>> + >>> mov edi, esi >>> add edi, EnableExecuteDisableLocation >>> cmp byte [edi], 0 >>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> b/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> index db923c9..48f930b 100644 >>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> @@ -662,6 +662,7 @@ ApWakeupFunction ( >>> // AP finished executing C code >>> // >>> InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); >>> +InterlockedDecrement ((UINT32 *) >>> + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); >>> >>> // >>> // Place AP is specified loop mode @@ -765,6 +766,7 @@ >>> FillExchangeInfoData ( >>> >>>ExchangeInfo->CFunction = (UINTN) ApWakeupFunction; >>>ExchangeInfo->ApIndex = 0; >>> + ExchangeInfo->NumApsExecuting = 0; >>>ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag; >>>ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) >> CpuMpData->CpuInfoInHob; >>>ExchangeInfo->CpuMpData = CpuMpData; >>> @@ -934,13 +936,19 @@ WakeUpAP ( >>> } >>> if (CpuMpData->InitFlag == ApInitConfig) { >>>// >>> - // Wait for all potential APs waken up in one specified period >>> + // Wait for one potential AP waken up in one specified period >>>// >>> - TimedWaitForApFinish ( >>> -CpuMpData, >>> -PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, >>> -PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) >>> -); >>> + if (CpuMpData->CpuCount == 0) { >>> +TimedWaitForApFinish ( >>> + CpuMpData, >>> + PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, >>> + PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) >>> + ); >>> + } >> >> I don't understand this change. The new comment says, >> >> Wait for *one* potential AP waken up in one specified period >> >> However, the second parameter of TimedWaitForApFinish(), namely >> "FinishedApLimit", gets the same value as before: >> >> PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1 >> >> It means that all of the (possible) APs are waited-for, just the same >> as before. > > [[Eric]] This patch changes the collect AP count logic, original > solution always waits for a specific time to let all APs start up. If > the input CPU number (PcdGet32(PcdCpuMaxLogicalProcessorNumber) - 1) > ha
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Laszlo, Add more comments for TimedWaitForApFinish function in mail. > -Original Message- > From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of > Dong, Eric > Sent: Tuesday, October 24, 2017 11:24 PM > To: Laszlo Ersek ; edk2-devel@lists.01.org > Cc: Ni, Ruiyu ; Paolo Bonzini > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for > AP initialization logic. > > Laszlo, > > > -Original Message- > > From: Laszlo Ersek [mailto:ler...@redhat.com] > > Sent: Tuesday, October 24, 2017 6:16 PM > > To: Dong, Eric ; edk2-devel@lists.01.org > > Cc: Ni, Ruiyu ; Paolo Bonzini > > > > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting > > for AP initialization logic. > > > > CC Paolo > > > > On 10/23/17 09:22, Eric Dong wrote: > > > Current logic always waiting for a specific value to collect all APs > > > count. This logic may caused some platforms cost too much time to > > > wait for time out. > > > This patch add new logic to collect APs count. It adds new variable > > > NumApsExecuting to detect whether all APs have finished initialization. > > > Each AP let NumApsExecuting++ when begin to initialize itself and > > > let > > > NumApsExecuting-- when it finish the initialization. BSP base on > > > whether NumApsExecuting == 0 to finished the collect AP process. > > > > > > Cc: Ruiyu Ni > > > Contributed-under: TianoCore Contribution Agreement 1.1 > > > Signed-off-by: Eric Dong > > > --- > > > UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc| 1 + > > > UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 6 ++ > > > UefiCpuPkg/Library/MpInitLib/MpLib.c | 20 ++-- > > > UefiCpuPkg/Library/MpInitLib/MpLib.h | 1 + > > > UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc | 3 ++- > > > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 6 ++ > > > 6 files changed, 30 insertions(+), 7 deletions(-) > > > > I was out of office yesterday, and did not get a chance to comment on > > this patch. > > > > In a virtualization guest, I see the following problem with the patch: > > > > VCPUs (virtual CPUs) may not be scheduled by the hypervisor similarly > > to how physical CPUs behave on a physical board. It is possible that a > > VCPU starts up and even finishes its initialization routine before > > another VCPU starts running at all. > > > > Therefore the locked NumApsExecuting counter may hit zero, even > > multiple times, before all APs have finished initializing. > > > > In OVMF, we query QEMU about the exact number of virtual processors, > > in PlatformPei. So OVMF configures the logical processor count in > > advance that MpInitLib has to wait for. Correspondingly, we also set > > the timeout to "infinity". > > > > Please see the MaxCpuCountInitialization() function in following commit: > > > > https://github.com/tianocore/edk2/commit/45a70db3c3a59 > > > > In the past, we used to have AP initialization problems in OVMF due to > > the VCPU scheduling artifacts I mention above. After commit > > 45a70db3c3a59, things have been stable; it would be nice to keep that > working. > > > > Please note that simply testing this patch on my end is not sufficient. > > The AP init problems we used to face were sporadic and also specific > > to the virtualization host systems (i.e., dependent on the physical > > hardware and the host kernel). > > > > Furthermore: > > [[Eric]] Will comments later if needed, need some investigation first. > > > > > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > > index 976af1f..bdfe0d3 100644 > > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equ > LockLocation > > + 30h > > > Cr3Location equLockLocation + 34h > > > InitFlagLocation equLockLocation + 38h > > > CpuInfoLocation equLockLocation + 3Ch > > > +NumApsExecutingLocation equLockLocation + 40h > > > > > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > > index 1b9c6a6..2b6c27d 100644 > > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.na
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Laszlo, > -Original Message- > From: Laszlo Ersek [mailto:ler...@redhat.com] > Sent: Tuesday, October 24, 2017 6:16 PM > To: Dong, Eric ; edk2-devel@lists.01.org > Cc: Ni, Ruiyu ; Paolo Bonzini > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for > AP initialization logic. > > CC Paolo > > On 10/23/17 09:22, Eric Dong wrote: > > Current logic always waiting for a specific value to collect all APs > > count. This logic may caused some platforms cost too much time to wait > > for time out. > > This patch add new logic to collect APs count. It adds new variable > > NumApsExecuting to detect whether all APs have finished initialization. > > Each AP let NumApsExecuting++ when begin to initialize itself and let > > NumApsExecuting-- when it finish the initialization. BSP base on > > whether NumApsExecuting == 0 to finished the collect AP process. > > > > Cc: Ruiyu Ni > > Contributed-under: TianoCore Contribution Agreement 1.1 > > Signed-off-by: Eric Dong > > --- > > UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc| 1 + > > UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 6 ++ > > UefiCpuPkg/Library/MpInitLib/MpLib.c | 20 ++-- > > UefiCpuPkg/Library/MpInitLib/MpLib.h | 1 + > > UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc | 3 ++- > > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 6 ++ > > 6 files changed, 30 insertions(+), 7 deletions(-) > > I was out of office yesterday, and did not get a chance to comment on this > patch. > > In a virtualization guest, I see the following problem with the patch: > > VCPUs (virtual CPUs) may not be scheduled by the hypervisor similarly to how > physical CPUs behave on a physical board. It is possible that a VCPU starts up > and even finishes its initialization routine before another VCPU starts > running > at all. > > Therefore the locked NumApsExecuting counter may hit zero, even multiple > times, before all APs have finished initializing. > > In OVMF, we query QEMU about the exact number of virtual processors, in > PlatformPei. So OVMF configures the logical processor count in advance that > MpInitLib has to wait for. Correspondingly, we also set the timeout to > "infinity". > > Please see the MaxCpuCountInitialization() function in following commit: > > https://github.com/tianocore/edk2/commit/45a70db3c3a59 > > In the past, we used to have AP initialization problems in OVMF due to the > VCPU scheduling artifacts I mention above. After commit 45a70db3c3a59, > things have been stable; it would be nice to keep that working. > > Please note that simply testing this patch on my end is not sufficient. > The AP init problems we used to face were sporadic and also specific to the > virtualization host systems (i.e., dependent on the physical hardware and the > host kernel). > > Furthermore: [[Eric]] Will comments later if needed, need some investigation first. > > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > index 976af1f..bdfe0d3 100644 > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equLockLocation > + 30h > > Cr3Location equLockLocation + 34h > > InitFlagLocation equLockLocation + 38h > > CpuInfoLocation equLockLocation + 3Ch > > +NumApsExecutingLocation equLockLocation + 40h > > > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > index 1b9c6a6..2b6c27d 100644 > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > @@ -86,6 +86,12 @@ Flat32Start: ; > > protected mode > entry point > > > > movesi, ebx > > > > +; Increment the number of APs executing here as early as possible > > +; This is decremented in C code when AP is finished executing > > +movedi, esi > > +addedi, NumApsExecutingLocation > > +lock inc dword [edi] > > + > > mov edi, esi > > add edi, EnableExecuteDisableLocation > > cmp byte [edi], 0 > > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > > b/UefiCpuPkg/Library/MpInitLib/MpLib.c > > index db923c9..48f930b 100644 > > --- a/UefiCpuPkg/Library/MpInit
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
CC Paolo On 10/23/17 09:22, Eric Dong wrote: > Current logic always waiting for a specific value to collect all APs > count. This logic may caused some platforms cost too much time to > wait for time out. > This patch add new logic to collect APs count. It adds new variable > NumApsExecuting to detect whether all APs have finished initialization. > Each AP let NumApsExecuting++ when begin to initialize itself and let > NumApsExecuting-- when it finish the initialization. BSP base on whether > NumApsExecuting == 0 to finished the collect AP process. > > Cc: Ruiyu Ni > Contributed-under: TianoCore Contribution Agreement 1.1 > Signed-off-by: Eric Dong > --- > UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc| 1 + > UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 6 ++ > UefiCpuPkg/Library/MpInitLib/MpLib.c | 20 ++-- > UefiCpuPkg/Library/MpInitLib/MpLib.h | 1 + > UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc | 3 ++- > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 6 ++ > 6 files changed, 30 insertions(+), 7 deletions(-) I was out of office yesterday, and did not get a chance to comment on this patch. In a virtualization guest, I see the following problem with the patch: VCPUs (virtual CPUs) may not be scheduled by the hypervisor similarly to how physical CPUs behave on a physical board. It is possible that a VCPU starts up and even finishes its initialization routine before another VCPU starts running at all. Therefore the locked NumApsExecuting counter may hit zero, even multiple times, before all APs have finished initializing. In OVMF, we query QEMU about the exact number of virtual processors, in PlatformPei. So OVMF configures the logical processor count in advance that MpInitLib has to wait for. Correspondingly, we also set the timeout to "infinity". Please see the MaxCpuCountInitialization() function in following commit: https://github.com/tianocore/edk2/commit/45a70db3c3a59 In the past, we used to have AP initialization problems in OVMF due to the VCPU scheduling artifacts I mention above. After commit 45a70db3c3a59, things have been stable; it would be nice to keep that working. Please note that simply testing this patch on my end is not sufficient. The AP init problems we used to face were sporadic and also specific to the virtualization host systems (i.e., dependent on the physical hardware and the host kernel). Furthermore: > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > index 976af1f..bdfe0d3 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equLockLocation + 30h > Cr3Location equLockLocation + 34h > InitFlagLocation equLockLocation + 38h > CpuInfoLocation equLockLocation + 3Ch > +NumApsExecutingLocation equLockLocation + 40h > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > index 1b9c6a6..2b6c27d 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > @@ -86,6 +86,12 @@ Flat32Start: ; protected > mode entry point > > movesi, ebx > > +; Increment the number of APs executing here as early as possible > +; This is decremented in C code when AP is finished executing > +movedi, esi > +addedi, NumApsExecutingLocation > +lock inc dword [edi] > + > mov edi, esi > add edi, EnableExecuteDisableLocation > cmp byte [edi], 0 > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > b/UefiCpuPkg/Library/MpInitLib/MpLib.c > index db923c9..48f930b 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > @@ -662,6 +662,7 @@ ApWakeupFunction ( > // AP finished executing C code > // > InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); > +InterlockedDecrement ((UINT32 *) > &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); > > // > // Place AP is specified loop mode > @@ -765,6 +766,7 @@ FillExchangeInfoData ( > >ExchangeInfo->CFunction = (UINTN) ApWakeupFunction; >ExchangeInfo->ApIndex = 0; > + ExchangeInfo->NumApsExecuting = 0; >ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag; >ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) > CpuMpData->CpuInfoInHob; >ExchangeInfo->CpuMpData = CpuMpData; > @@ -934,13 +936,19 @@ WakeUpAP ( > } > if (CpuMpData->InitFlag == ApInitConfig) { >// > - // Wait for all potential APs waken up in one specified period > + // Wait for one potential AP waken up in one specified period >// > - TimedWaitForApFinish ( >
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Jeff, I see. Thanks for mentioning that😊 Reviewed-by: Ruiyu Ni Do you mind to also give a r-b? Thanks/Ray > -Original Message- > From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of > Fan Jeff > Sent: Tuesday, October 24, 2017 2:27 PM > To: Ni, Ruiyu ; Dong, Eric ; edk2- > de...@lists.01.org > Subject: [edk2] 答复: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for > AP initialization logic. > > Ray, > > MpCpuExchangeInfo was declared as volatile type. It may not be necessary > to add volatile for NumApsExecuting. > > Jeff > > 发件人: Ni, Ruiyu<mailto:ruiyu...@intel.com> > 发送时间: 2017年10月24日 14:03 > 收件人: Dong, Eric<mailto:eric.d...@intel.com>; edk2- > de...@lists.01.org<mailto:edk2-devel@lists.01.org> > 主题: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP > initialization logic. > > You need to have "volatile" for "UINTN NumApsExecuting;". > Otherwise, compiler may optimize the code to cause below code wait > forever: > while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) { > CpuPause(); > } > > > Thanks/Ray > > > -Original Message- > > From: Dong, Eric > > Sent: Monday, October 23, 2017 3:23 PM > > To: edk2-devel@lists.01.org > > Cc: Ni, Ruiyu > > Subject: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP > > initialization logic. > > > > Current logic always waiting for a specific value to collect all APs > > count. This logic may caused some platforms cost too much time to wait for > time out. > > This patch add new logic to collect APs count. It adds new variable > > NumApsExecuting to detect whether all APs have finished initialization. > > Each AP let NumApsExecuting++ when begin to initialize itself and let > > NumApsExecuting-- when it finish the initialization. BSP base on > > whether NumApsExecuting == 0 to finished the collect AP process. > > > > Cc: Ruiyu Ni > > Contributed-under: TianoCore Contribution Agreement 1.1 > > Signed-off-by: Eric Dong > > --- > > UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc| 1 + > > UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 6 ++ > > UefiCpuPkg/Library/MpInitLib/MpLib.c | 20 ++-- > > UefiCpuPkg/Library/MpInitLib/MpLib.h | 1 + > > UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc | 3 ++- > > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 6 ++ > > 6 files changed, 30 insertions(+), 7 deletions(-) > > > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > index 976af1f..bdfe0d3 100644 > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equLockLocation + > > 30h > > Cr3Location equLockLocation + 34h > > InitFlagLocation equLockLocation + 38h > > CpuInfoLocation equLockLocation + 3Ch > > +NumApsExecutingLocation equLockLocation + 40h > > > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > index 1b9c6a6..2b6c27d 100644 > > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > > @@ -86,6 +86,12 @@ Flat32Start: ; > > protected mode entry > > point > > > > movesi, ebx > > > > +; Increment the number of APs executing here as early as possible > > +; This is decremented in C code when AP is finished executing > > +movedi, esi > > +addedi, NumApsExecutingLocation > > +lock inc dword [edi] > > + > > mov edi, esi > > add edi, EnableExecuteDisableLocation > > cmp byte [edi], 0 > > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > > b/UefiCpuPkg/Library/MpInitLib/MpLib.c > > index db923c9..48f930b 100644 > > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > > @@ -662,6 +662,7 @@ ApWakeupFunction ( > > // AP finished executing C code > > // > > InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); > > +InterlockedDecrement ((UINT32 *) > > + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); > > > > // > > // Place AP is specified loop mo
Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
You need to have "volatile" for "UINTN NumApsExecuting;". Otherwise, compiler may optimize the code to cause below code wait forever: while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) { CpuPause(); } Thanks/Ray > -Original Message- > From: Dong, Eric > Sent: Monday, October 23, 2017 3:23 PM > To: edk2-devel@lists.01.org > Cc: Ni, Ruiyu > Subject: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP > initialization logic. > > Current logic always waiting for a specific value to collect all APs count. > This > logic may caused some platforms cost too much time to wait for time out. > This patch add new logic to collect APs count. It adds new variable > NumApsExecuting to detect whether all APs have finished initialization. > Each AP let NumApsExecuting++ when begin to initialize itself and let > NumApsExecuting-- when it finish the initialization. BSP base on whether > NumApsExecuting == 0 to finished the collect AP process. > > Cc: Ruiyu Ni > Contributed-under: TianoCore Contribution Agreement 1.1 > Signed-off-by: Eric Dong > --- > UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc| 1 + > UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 6 ++ > UefiCpuPkg/Library/MpInitLib/MpLib.c | 20 ++-- > UefiCpuPkg/Library/MpInitLib/MpLib.h | 1 + > UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc | 3 ++- > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 6 ++ > 6 files changed, 30 insertions(+), 7 deletions(-) > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > index 976af1f..bdfe0d3 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equLockLocation + > 30h > Cr3Location equLockLocation + 34h > InitFlagLocation equLockLocation + 38h > CpuInfoLocation equLockLocation + 3Ch > +NumApsExecutingLocation equLockLocation + 40h > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > index 1b9c6a6..2b6c27d 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > @@ -86,6 +86,12 @@ Flat32Start: ; protected > mode entry > point > > movesi, ebx > > +; Increment the number of APs executing here as early as possible > +; This is decremented in C code when AP is finished executing > +movedi, esi > +addedi, NumApsExecutingLocation > +lock inc dword [edi] > + > mov edi, esi > add edi, EnableExecuteDisableLocation > cmp byte [edi], 0 > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > b/UefiCpuPkg/Library/MpInitLib/MpLib.c > index db923c9..48f930b 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > @@ -662,6 +662,7 @@ ApWakeupFunction ( > // AP finished executing C code > // > InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); > +InterlockedDecrement ((UINT32 *) > + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); > > // > // Place AP is specified loop mode > @@ -765,6 +766,7 @@ FillExchangeInfoData ( > >ExchangeInfo->CFunction = (UINTN) ApWakeupFunction; >ExchangeInfo->ApIndex = 0; > + ExchangeInfo->NumApsExecuting = 0; >ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag; >ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) CpuMpData- > >CpuInfoInHob; >ExchangeInfo->CpuMpData = CpuMpData; > @@ -934,13 +936,19 @@ WakeUpAP ( > } > if (CpuMpData->InitFlag == ApInitConfig) { >// > - // Wait for all potential APs waken up in one specified period > + // Wait for one potential AP waken up in one specified period >// > - TimedWaitForApFinish ( > -CpuMpData, > -PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, > -PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) > -); > + if (CpuMpData->CpuCount == 0) { > +TimedWaitForApFinish ( > + CpuMpData, > + PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, > + PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) > + ); > + } > + > + while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) { > +CpuPause(); > + } > } else { >// >// Wait all APs waken up if this is not the 1st broadcast of SIPI diff > --git > a/UefiCpuPkg/Library/MpInitLib/MpLib.h > b/UefiCpuPkg/Library/MpInitLib/MpLib.h > index e41d2db..d13d5c0 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h > @@ -176,6 +176,7 @@ typedef struct { >UINTN Cr3; >UINTN Ini
[edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Current logic always waiting for a specific value to collect all APs count. This logic may caused some platforms cost too much time to wait for time out. This patch add new logic to collect APs count. It adds new variable NumApsExecuting to detect whether all APs have finished initialization. Each AP let NumApsExecuting++ when begin to initialize itself and let NumApsExecuting-- when it finish the initialization. BSP base on whether NumApsExecuting == 0 to finished the collect AP process. Cc: Ruiyu Ni Contributed-under: TianoCore Contribution Agreement 1.1 Signed-off-by: Eric Dong --- UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc| 1 + UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 6 ++ UefiCpuPkg/Library/MpInitLib/MpLib.c | 20 ++-- UefiCpuPkg/Library/MpInitLib/MpLib.h | 1 + UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc | 3 ++- UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 6 ++ 6 files changed, 30 insertions(+), 7 deletions(-) diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc index 976af1f..bdfe0d3 100644 --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equLockLocation + 30h Cr3Location equLockLocation + 34h InitFlagLocation equLockLocation + 38h CpuInfoLocation equLockLocation + 3Ch +NumApsExecutingLocation equLockLocation + 40h diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm index 1b9c6a6..2b6c27d 100644 --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm @@ -86,6 +86,12 @@ Flat32Start: ; protected mode entry point movesi, ebx +; Increment the number of APs executing here as early as possible +; This is decremented in C code when AP is finished executing +movedi, esi +addedi, NumApsExecutingLocation +lock inc dword [edi] + mov edi, esi add edi, EnableExecuteDisableLocation cmp byte [edi], 0 diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c index db923c9..48f930b 100644 --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c @@ -662,6 +662,7 @@ ApWakeupFunction ( // AP finished executing C code // InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); +InterlockedDecrement ((UINT32 *) &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); // // Place AP is specified loop mode @@ -765,6 +766,7 @@ FillExchangeInfoData ( ExchangeInfo->CFunction = (UINTN) ApWakeupFunction; ExchangeInfo->ApIndex = 0; + ExchangeInfo->NumApsExecuting = 0; ExchangeInfo->InitFlag= (UINTN) CpuMpData->InitFlag; ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) CpuMpData->CpuInfoInHob; ExchangeInfo->CpuMpData = CpuMpData; @@ -934,13 +936,19 @@ WakeUpAP ( } if (CpuMpData->InitFlag == ApInitConfig) { // - // Wait for all potential APs waken up in one specified period + // Wait for one potential AP waken up in one specified period // - TimedWaitForApFinish ( -CpuMpData, -PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, -PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) -); + if (CpuMpData->CpuCount == 0) { +TimedWaitForApFinish ( + CpuMpData, + PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, + PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) + ); + } + + while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) { +CpuPause(); + } } else { // // Wait all APs waken up if this is not the 1st broadcast of SIPI diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h index e41d2db..d13d5c0 100644 --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h @@ -176,6 +176,7 @@ typedef struct { UINTN Cr3; UINTN InitFlag; CPU_INFO_IN_HOB *CpuInfo; + UINTN NumApsExecuting; CPU_MP_DATA *CpuMpData; UINTN InitializeFloatingPointUnitsAddress; } MP_CPU_EXCHANGE_INFO; diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc b/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc index 114f4e0..d255ca5 100644 --- a/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc +++ b/UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc @@ -40,5 +40,6 @@ EnableExecuteDisableLocation equLockLocation + 5Ch Cr3Location equLockLocation + 64h InitFlagLocation equLockLocation + 6Ch CpuInfoLocation equLockLocatio