Hi Ruiyu,

> -----Original Message-----
> From: Ni, Ruiyu
> Sent: Tuesday, October 16, 2018 11:05 AM
> To: Dong, Eric <eric.d...@intel.com>; edk2-devel@lists.01.org
> Cc: Laszlo Ersek <ler...@redhat.com>
> Subject: Re: [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to
> support semaphore type.
> 
> On 10/15/2018 10:49 AM, Eric Dong wrote:
> > In a system which has multiple cores, current set register value task costs
> huge times.
> > After investigation, current set MSR task costs most of the times. Current
> logic uses
> > SpinLock to let set MSR task as an single thread task for all cores. Because
> MSR has
> > scope attribute which may cause GP fault if multiple APs set MSR at the
> same time,
> > current logic use an easiest solution (use SpinLock) to avoid this issue, 
> > but it
> will
> > cost huge times.
> >
> > In order to fix this performance issue, new solution will set MSRs base on
> their scope
> > attribute. After this, the SpinLock will not needed. Without SpinLock, new
> issue raised
> > which is caused by MSR dependence. For example, MSR A depends on
> MSR B which means MSR A
> > must been set after MSR B has been set. Also MSR B is package scope level
> and MSR A is
> > thread scope level. If system has multiple threads, Thread 1 needs to set
> the thread level
> > MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs
> task for thread 1
> > and thread 2 like below:
> >
> >              Thread 1                 Thread 2
> > MSR B          N                        Y
> > MSR A          Y                        Y
> >
> > If driver don't control execute MSR order, for thread 1, it will execute MSR
> A first, but
> > at this time, MSR B not been executed yet by thread 2. system may trig
> exception at this
> > time.
> >
> > In order to fix the above issue, driver introduces semaphore logic to 
> > control
> the MSR
> > execute sequence. For the above case, a semaphore will be add between
> MSR A and B for
> > all threads. Semaphore has scope info for it. The possible scope value is
> core or package.
> > For each thread, when it meets a semaphore during it set registers, it will 
> > 1)
> release
> > semaphore (+1) for each threads in this core or package(based on the
> scope info for this
> > semaphore) 2) acquire semaphore (-1) for all the threads in this core or
> package(based
> > on the scope info for this semaphore). With these two steps, driver can
> control MSR
> > sequence. Sample code logic like below:
> >
> >    //
> >    // First increase semaphore count by 1 for processors in this package.
> >    //
> >    for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> >      LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> ProcessorIndex]);
> >    }
> >    //
> >    // Second, check whether the count has reach the check number.
> >    //
> >    for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex
> ++) {
> >      LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> >    }
> >
> > Platform Requirement:
> > 1. This change requires register MSR setting base on MSR scope info. If 
> > still
> register MSR
> >     for all threads, exception may raised.
> >
> > Known limitation:
> > 1. Current CpuFeatures driver supports DXE instance and PEI instance. But
> semaphore logic
> >     requires Aps execute in async mode which is not supported by PEI driver.
> So CpuFeature
> >     PEI instance not works after this change. We plan to support async mode
> for PEI in phase
> >     2 for this task.
> >
> > Cc: Ruiyu Ni <ruiyu...@intel.com>
> > Cc: Laszlo Ersek <ler...@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.d...@intel.com>
> > ---
> >   .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++--
> -
> >   .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
> >   .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
> >   .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
> >   .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
> >   .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
> >   .../RegisterCpuFeaturesLib.c                       | 452 
> > ++++++++++++++++++---
> >   7 files changed, 840 insertions(+), 117 deletions(-)
> >
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > index ba3fb3250f..f820b4fed7 100644
> > --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > @@ -145,6 +145,20 @@ CpuInitDataInitialize (
> >     CPU_FEATURES_INIT_ORDER              *InitOrder;
> >     CPU_FEATURES_DATA                    *CpuFeaturesData;
> >     LIST_ENTRY                           *Entry;
> > +  UINT32                               Core;
> > +  UINT32                               Package;
> > +  UINT32                               Thread;
> > +  EFI_CPU_PHYSICAL_LOCATION            *Location;
> > +  UINT32                               *CoreArray;
> > +  UINTN                                Index;
> > +  UINT32                               ValidCount;
> > +  UINTN                                CoreIndex;
> > +  ACPI_CPU_DATA                        *AcpiCpuData;
> > +  CPU_STATUS_INFORMATION               *CpuStatus;
> > +
> > +  Core    = 0;
> > +  Package = 0;
> > +  Thread  = 0;
> >
> >     CpuFeaturesData = GetCpuFeaturesData ();
> >     CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof
> (CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
> > @@ -163,6 +177,16 @@ CpuInitDataInitialize (
> >       Entry = Entry->ForwardLink;
> >     }
> >
> > +  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
> > +
> > +  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64
> (PcdCpuS3DataAddress);
> > +  ASSERT (AcpiCpuData != NULL);
> > +  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
> > +
> > +  CpuStatus = &AcpiCpuData->CpuStatus;
> > +  AcpiCpuData->ApLocation = AllocateZeroPool (sizeof
> (EFI_CPU_PHYSICAL_LOCATION) * NumberOfCpus);
> > +  ASSERT (AcpiCpuData->ApLocation != NULL);
> > +
> >     for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus;
> ProcessorNumber++) {
> >       InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
> >       InitOrder->FeaturesSupportedMask = AllocateZeroPool
> (CpuFeaturesData->BitMaskSize);
> > @@ -175,7 +199,59 @@ CpuInitDataInitialize (
> >         &ProcessorInfoBuffer,
> >         sizeof (EFI_PROCESSOR_INFORMATION)
> >         );
> > +    CopyMem (
> > +      AcpiCpuData->ApLocation + ProcessorNumber,
> > +      &ProcessorInfoBuffer.Location,
> > +      sizeof (EFI_CPU_PHYSICAL_LOCATION)
> > +      );
> > +
> 
> Please add more comments here to describe what the below code tries to
> do and why.

Yes, will add comments in next version change.

> 
> > +    if (Package < ProcessorInfoBuffer.Location.Package) {
> > +      Package = ProcessorInfoBuffer.Location.Package;
> > +    }
> > +    if (Core < ProcessorInfoBuffer.Location.Core) {
> > +      Core = ProcessorInfoBuffer.Location.Core;
> > +    }
> > +    if (Thread < ProcessorInfoBuffer.Location.Thread) {
> > +      Thread = ProcessorInfoBuffer.Location.Thread;
> > +    }
> > +  }
> > +  CpuStatus->PackageCount = Package + 1;
> > +  CpuStatus->CoreCount    = Core + 1;
> > +  CpuStatus->ThreadCount  = Thread + 1;
> 
> 
> > +  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, Core : %d,
> Thread: %d\n",
> > +         CpuStatus->PackageCount,
> > +         CpuStatus->CoreCount,
> > +         CpuStatus->ThreadCount));
> 
> Please use MaxCore and MaxThread in debug message. Otherwise it's
> confusing.

Yes, will update in next version change.

> 
> > +
> > +  //
> > +  // Collect valid core count in each package because not all cores are 
> > valid.
> > +  //
> > +  CpuStatus->ValidCoresInPackages = AllocateZeroPool (sizeof (UINT32) *
> CpuStatus->PackageCount);
> > +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
> 
> Please add comments to describe the purpose of CoreArray.
> CoreArray is not a good name IMO. How about:
> CoreVisited - AllocatePool (sizeof (BOOLEAN) * CpuStatus->MaxCoreCount);
> 
> > +  CoreArray = AllocatePool (sizeof (UINT32) * CpuStatus->CoreCount);
> > +  ASSERT (CoreArray != NULL);
> > +
> > +  for (Index = 0; Index <= Package; Index ++ ) {
> 
> Please stop using Package/Core/Thread. Use the field in CpuStatus
> structure instead. It makes the code more readable.

Ok, will update in next version.

> 
> > +    ZeroMem (CoreArray, sizeof (UINT32) * (Core + 1));
> > +    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus;
> ProcessorNumber++) {
> > +      Location = &CpuFeaturesData-
> >InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
> > +      if (Location->Package == Index) {
> > +        CoreArray[Location->Core] = 1;
> > +      }
> 
> The above if-clause can be:
>           if ((Location->Package == Index) &&
>               !CoreVisited[Location->Core])) {
>             CpuStatus->ValidCoreCountPerPackage[Index]++;
>             CoreVisited[Location->Core] = TRUE;
>           }
> 
> The for-loop below can be removed.

Thanks for the enhancement, will update the code logic in next version.

> 
> > +    }
> > +    for (CoreIndex = 0, ValidCount = 0; CoreIndex <= Core; CoreIndex ++) {
> > +      ValidCount += CoreArray[CoreIndex];
> > +    }
> > +    CpuStatus->ValidCoresInPackages[Index] = ValidCount;
> >     }
> > +  FreePool (CoreArray);
> > +  for (Index = 0; Index <= Package; Index++) {
> > +    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index,
> CpuStatus->ValidCoresInPackages[Index]));
> > +  }
> > +
> > +  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool
> (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount*
> CpuStatus->ThreadCount);
> > +  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
> > +
> >     //
> >     // Get support and configuration PCDs
> >     //
> > @@ -310,7 +386,7 @@ CollectProcessorData (
> >     LIST_ENTRY                           *Entry;
> >     CPU_FEATURES_DATA                    *CpuFeaturesData;
> >
> > -  CpuFeaturesData = GetCpuFeaturesData ();
> > +  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;
> 
> Is the above change more proper in a separate patch?
> 
> >     ProcessorNumber = GetProcessorIndex ();
> >     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
> >     //
> > @@ -416,6 +492,15 @@ DumpRegisterTableOnProcessor (
> >           RegisterTableEntry->Value
> >           ));
> >         break;
> > +    case Semaphore:
> > +      DEBUG ((
> > +        DebugPrintErrorLevel,
> > +        "Processor: %d: Semaphore: Scope Value: %d\r\n",
> 
> How about print the Scope value in string? This makes the debug message
> more meaningful.

Ok, will do it in next version change

> 
> > +        ProcessorNumber,
> > +        RegisterTableEntry->Value
> > +        ));
> > +      break;
> > +
> >       default:
> >         break;
> >       }
> > @@ -441,6 +526,11 @@ AnalysisProcessorFeatures (
> >     REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
> >     LIST_ENTRY                           *Entry;
> >     CPU_FEATURES_DATA                    *CpuFeaturesData;
> > +  LIST_ENTRY                           *NextEntry;
> > +  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
> > +  BOOLEAN                              Success;
> > +  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
> > +  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
> >
> >     CpuFeaturesData = GetCpuFeaturesData ();
> >     CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData-
> >BitMaskSize);
> > @@ -517,8 +607,14 @@ AnalysisProcessorFeatures (
> >       //
> >       CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
> >       Entry = GetFirstNode (&CpuInitOrder->OrderList);
> > +    NextEntry = Entry->ForwardLink;
> >       while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
> >         CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
> > +      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
> > +        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK
> (NextEntry);
> > +      } else {
> > +        NextCpuFeatureInOrder = NULL;
> > +      }
> >         if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask,
> CpuFeaturesData->SettingPcd)) {
> >           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber,
> CpuInfo, CpuFeatureInOrder->ConfigData, TRUE);
> >           if (EFI_ERROR (Status)) {
> > @@ -532,6 +628,8 @@ AnalysisProcessorFeatures (
> >               DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask
> = "));
> >               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
> >             }
> > +        } else {
> > +          Success = TRUE;
> >           }
> >         } else {
> >           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber,
> CpuInfo, CpuFeatureInOrder->ConfigData, FALSE);
> > @@ -542,9 +640,36 @@ AnalysisProcessorFeatures (
> >               DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: 
> > Mask
> = "));
> >               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
> >             }
> > +        } else {
> > +          Success = TRUE;
> >           }
> >         }
> > -      Entry = Entry->ForwardLink;
> > +
> > +      if (Success) {
> > +        //
> > +        // If feature has dependence with the next feature (ONLY care
> core/package dependency).
> > +        // and feature initialize succeed, add sync semaphere here.
> > +        //
> > +        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
> > +        if (NextCpuFeatureInOrder != NULL) {
> > +          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
> > +        } else {
> > +          AfterDep = NoneDepType;
> > +        }
> > +        //
> > +        // Assume only one of the depend is valid.
> > +        //
> > +        ASSERT (!(BeforeDep > ThreadDepType && AfterDep >
> ThreadDepType));
> > +        if (BeforeDep > ThreadDepType) {
> > +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0,
> BeforeDep);
> > +        }
> > +        if (AfterDep > ThreadDepType) {
> > +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0,
> AfterDep);
> > +        }
> > +      }
> > +
> > +      Entry     = Entry->ForwardLink;
> > +      NextEntry = Entry->ForwardLink;
> >       }
> >
> >       //
> > @@ -561,27 +686,79 @@ AnalysisProcessorFeatures (
> >     }
> >   }
> >
> > +/**
> > +  Increment semaphore by 1.
> > +
> > +  @param      Sem            IN:  32-bit unsigned integer
> > +
> > +**/
> > +VOID
> > +LibReleaseSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  InterlockedIncrement (Sem);
> > +}
> > +
> > +/**
> > +  Decrement the semaphore by 1 if it is not zero.
> > +
> > +  Performs an atomic decrement operation for semaphore.
> > +  The compare exchange operation must be performed using
> > +  MP safe mechanisms.
> > +
> > +  @param      Sem            IN:  32-bit unsigned integer
> > +
> > +**/
> > +VOID
> > +LibWaitForSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  UINT32  Value;
> > +
> > +  do {
> > +    Value = *Sem;
> > +  } while (Value == 0);
> > +
> > +  InterlockedDecrement (Sem);
> > +}
> > +
> >   /**
> >     Initialize the CPU registers from a register table.
> >
> > -  @param[in]  ProcessorNumber  The index of the CPU executing this
> function.
> > +  @param[in]  RegisterTable         The register table for this AP.
> > +  @param[in]  ApLocation            AP location info for this ap.
> > +  @param[in]  CpuStatus             CPU status info for this CPU.
> > +  @param[in]  CpuFlags              Flags data structure used when program 
> > the
> register.
> >
> >     @note This service could be called by BSP/APs.
> >   **/
> >   VOID
> > +EFIAPI
> >   ProgramProcessorRegister (
> > -  IN UINTN  ProcessorNumber
> > +  IN CPU_REGISTER_TABLE           *RegisterTable,
> > +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> > +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> > +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
> >     )
> >   {
> > -  CPU_FEATURES_DATA         *CpuFeaturesData;
> > -  CPU_REGISTER_TABLE        *RegisterTable;
> >     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
> >     UINTN                     Index;
> >     UINTN                     Value;
> >     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> > -
> > -  CpuFeaturesData = GetCpuFeaturesData ();
> > -  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
> > +  volatile UINT32           *SemaphorePtr;
> > +  UINT32                    CoreOffset;
> > +  UINT32                    PackageOffset;
> > +  UINT32                    PackageThreadsCount;
> > +  UINT32                    ApOffset;
> > +  UINTN                     ProcessorIndex;
> > +  UINTN                     ApIndex;
> > +  UINTN                     ValidApCount;
> > +
> > +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus-
> >ThreadCount \
> > +            + ApLocation->Core * CpuStatus->ThreadCount \
> > +            + ApLocation->Thread;
> >
> >     //
> >     // Traverse Register Table of this logical processor
> > @@ -591,6 +768,7 @@ ProgramProcessorRegister (
> >     for (Index = 0; Index < RegisterTable->TableLength; Index++) {
> >
> >       RegisterTableEntry = &RegisterTableEntryHead[Index];
> > +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type
> = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
> How about print the register type in string?

Yes, will do it in the next version.

> 
> >
> >       //
> >       // Check the type of specified register
> > @@ -654,10 +832,6 @@ ProgramProcessorRegister (
> >       // The specified register is Model Specific Register
> >       //
> >       case Msr:
> > -      //
> > -      // Get lock to avoid Package/Core scope MSRs programming issue in
> parallel execution mode
> > -      //
> > -      AcquireSpinLock (&CpuFeaturesData->MsrLock);
> >         if (RegisterTableEntry->ValidBitLength >= 64) {
> >           //
> >           // If length is not less than 64 bits, then directly write 
> > without reading
> > @@ -677,20 +851,19 @@ ProgramProcessorRegister (
> >             RegisterTableEntry->Value
> >             );
> >         }
> > -      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
> >         break;
> >       //
> >       // MemoryMapped operations
> >       //
> >       case MemoryMapped:
> > -      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
> > +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
> >         MmioBitFieldWrite32 (
> >           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry-
> >HighIndex, 32)),
> >           RegisterTableEntry->ValidBitStart,
> >           RegisterTableEntry->ValidBitStart + RegisterTableEntry-
> >ValidBitLength - 1,
> >           (UINT32)RegisterTableEntry->Value
> >           );
> > -      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
> > +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
> >         break;
> >       //
> >       // Enable or disable cache
> > @@ -706,6 +879,50 @@ ProgramProcessorRegister (
> >         }
> >         break;
> >
> > +    case Semaphore:
> > +      SemaphorePtr = CpuFlags->SemaphoreCount;
> > +      switch (RegisterTableEntry->Value) {
> > +      case CoreDepType:
> > +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount +
> ApLocation->Core) * CpuStatus->ThreadCount > +        ApOffset = CoreOffset
> + ApLocation->Thread;
> 
> How about FirstThread and CurrentThread?

Ok, will use this new names.

> 
> > +        //
> > +        // First increase semaphore count by 1 for processors in this core.
> This comment might not be helpful for reviewer to understand.
> How about "Notify all threads in current Core"?
> 
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset +
> ProcessorIndex]);
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> How about "Wait for all threads in current Core"?

Ok, will do this change in next version changes.

> 
> Below diagram is also helpful
> //
> //  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
> //  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
> //
> //  All threads (T0...Tn) waits in P() line and continues running
> //  together.
> //
> //
> //  T0             T1            ...           Tn
> //
> //  V(0...n)       V(0...n)      ...           V(0...n)
> //  n * P(0)       n * P(1)      ...           n * P(n)
> //
> 

Ok, will do this change in next version changes.

> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      case PackageDepType:
> > +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount *
> CpuStatus->ThreadCount;
> 
> FirstThread?

Ok, will do this change in next version changes.

> 
> > +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus-
> >CoreCount;
> ThreadCount?
> 
> > +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation-
> >Core + ApLocation->Thread;
> CurrentThread?

Ok, will do this change in next version changes.

> 
> > +        ValidApCount = CpuStatus->ThreadCount * CpuStatus-
> >ValidCoresInPackages[ApLocation->Package];
> ValidThreadCount?

Ok, will do this change in next version changes.

> 
> > +        //
> > +        // First increase semaphore count by 1 for processors in this 
> > package.
> How about "Notify all threads in current Package"?

Ok, will do this change in next version changes.

> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> > +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> ProcessorIndex]);
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> How about "Wait for all threads in current Package"?

Ok, will do this change in next version changes.

> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount;
> ProcessorIndex ++) {
> > +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      default:
> > +        break;
> > +      }
> > +      break;
> > +
> >       default:
> >         break;
> >       }
> > @@ -724,10 +941,36 @@ SetProcessorRegister (
> >     IN OUT VOID            *Buffer
> >     )
> >   {
> > -  UINTN                  ProcessorNumber;
> > +  CPU_FEATURES_DATA         *CpuFeaturesData;
> > +  CPU_REGISTER_TABLE        *RegisterTable;
> > +  CPU_REGISTER_TABLE        *RegisterTables;
> > +  UINT32                    InitApicId;
> > +  UINTN                     ProcIndex;
> > +  UINTN                     Index;
> > +  ACPI_CPU_DATA             *AcpiCpuData;
> >
> > -  ProcessorNumber = GetProcessorIndex ();
> > -  ProgramProcessorRegister (ProcessorNumber);
> > +  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
> > +  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
> > +
> > +  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData-
> >RegisterTable;
> > +
> > +  InitApicId = GetInitialApicId ();
> > +  RegisterTable = NULL;
> > +  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
> > +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> > +      RegisterTable =  &RegisterTables[Index];
> > +      ProcIndex = Index;
> > +      break;
> > +    }
> > +  }
> > +  ASSERT (RegisterTable != NULL);
> > +
> > +  ProgramProcessorRegister (
> > +    RegisterTable,
> > +    AcpiCpuData->ApLocation + ProcIndex,
> > +    &AcpiCpuData->CpuStatus,
> > +    &CpuFeaturesData->CpuFlags
> > +    );
> >   }
> >
> >   /**
> > @@ -746,6 +989,9 @@ CpuFeaturesDetect (
> >   {
> >     UINTN                  NumberOfCpus;
> >     UINTN                  NumberOfEnabledProcessors;
> > +  CPU_FEATURES_DATA      *CpuFeaturesData;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData();
> >
> >     GetNumberOfProcessor (&NumberOfCpus,
> &NumberOfEnabledProcessors);
> >
> > @@ -754,49 +1000,13 @@ CpuFeaturesDetect (
> >     //
> >     // Wakeup all APs for data collection.
> >     //
> > -  StartupAPsWorker (CollectProcessorData);
> > +  StartupAPsWorker (CollectProcessorData, NULL);
> >
> >     //
> >     // Collect data on BSP
> >     //
> > -  CollectProcessorData (NULL);
> > +  CollectProcessorData (CpuFeaturesData);
> >
> >     AnalysisProcessorFeatures (NumberOfCpus);
> >   }
> >
> > -/**
> > -  Performs CPU features Initialization.
> > -
> > -  This service will invoke MP service to perform CPU features
> > -  initialization on BSP/APs per user configuration.
> > -
> > -  @note This service could be called by BSP only.
> > -**/
> > -VOID
> > -EFIAPI
> > -CpuFeaturesInitialize (
> > -  VOID
> > -  )
> > -{
> > -  CPU_FEATURES_DATA      *CpuFeaturesData;
> > -  UINTN                  OldBspNumber;
> > -
> > -  CpuFeaturesData = GetCpuFeaturesData ();
> > -
> > -  OldBspNumber = GetProcessorIndex();
> > -  CpuFeaturesData->BspNumber = OldBspNumber;
> > -  //
> > -  // Wakeup all APs for programming.
> > -  //
> > -  StartupAPsWorker (SetProcessorRegister);
> > -  //
> > -  // Programming BSP
> > -  //
> > -  SetProcessorRegister (NULL);
> > -  //
> > -  // Switch to new BSP if required
> > -  //
> > -  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> > -    SwitchNewBsp (CpuFeaturesData->BspNumber);
> > -  }
> > -}
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> > index 1f34a3f489..8346f7004f 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> > @@ -15,6 +15,7 @@
> >   #include <PiDxe.h>
> >
> >   #include <Library/UefiBootServicesTableLib.h>
> > +#include <Library/UefiLib.h>
> >
> >   #include "RegisterCpuFeatures.h"
> >
> > @@ -115,14 +116,20 @@ GetProcessorInformation (
> >
> >     @param[in]  Procedure               A pointer to the function to be run 
> > on
> >                                         enabled APs of the system.
> > +  @param[in]  MpEvent                 A pointer to the event to be used 
> > later
> > +                                      to check whether procedure has done.
> >   **/
> >   VOID
> >   StartupAPsWorker (
> > -  IN  EFI_AP_PROCEDURE                 Procedure
> > +  IN  EFI_AP_PROCEDURE                 Procedure,
> > +  IN  VOID                             *MpEvent
> >     )
> >   {
> >     EFI_STATUS                           Status;
> >     EFI_MP_SERVICES_PROTOCOL             *MpServices;
> > +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> >
> >     MpServices = GetMpProtocol ();
> >     //
> > @@ -132,9 +139,9 @@ StartupAPsWorker (
> >                    MpServices,
> >                    Procedure,
> >                    FALSE,
> > -                 NULL,
> > +                 (EFI_EVENT)MpEvent,
> >                    0,
> > -                 NULL,
> > +                 CpuFeaturesData,
> >                    NULL
> >                    );
> >     ASSERT_EFI_ERROR (Status);
> > @@ -197,3 +204,61 @@ GetNumberOfProcessor (
> >     ASSERT_EFI_ERROR (Status);
> >   }
> >
> > +/**
> > +  Performs CPU features Initialization.
> > +
> > +  This service will invoke MP service to perform CPU features
> > +  initialization on BSP/APs per user configuration.
> > +
> > +  @note This service could be called by BSP only.
> > +**/
> > +VOID
> > +EFIAPI
> > +CpuFeaturesInitialize (
> > +  VOID
> > +  )
> > +{
> > +  CPU_FEATURES_DATA          *CpuFeaturesData;
> > +  UINTN                      OldBspNumber;
> > +  EFI_EVENT                  MpEvent;
> > +  EFI_STATUS                 Status;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> > +
> > +  OldBspNumber = GetProcessorIndex();
> > +  CpuFeaturesData->BspNumber = OldBspNumber;
> > +
> > +  Status = gBS->CreateEvent (
> > +                  EVT_NOTIFY_WAIT,
> > +                  TPL_CALLBACK,
> > +                  EfiEventEmptyFunction,
> > +                  NULL,
> > +                  &MpEvent
> > +                  );
> > +  ASSERT_EFI_ERROR (Status);
> > +
> > +  //
> > +  // Wakeup all APs for programming.
> > +  //
> > +  StartupAPsWorker (SetProcessorRegister, MpEvent);
> > +  //
> > +  // Programming BSP
> > +  //
> > +  SetProcessorRegister (CpuFeaturesData);
> > +
> > +  //
> > +  // Wait all processors to finish the task.
> > +  //
> > +  do {
> > +    Status = gBS->CheckEvent (MpEvent);
> > +  } while (Status == EFI_NOT_READY);
> > +  ASSERT_EFI_ERROR (Status);
> > +
> > +  //
> > +  // Switch to new BSP if required
> > +  //
> > +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> > +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> > +  }
> > +}
> > +
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> > index f0f317c945..6693bae575 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> > @@ -47,6 +47,9 @@
> >     SynchronizationLib
> >     UefiBootServicesTableLib
> >     IoLib
> > +  UefiBootServicesTableLib
> > +  UefiLib
> > +  LocalApicLib
> >
> >   [Protocols]
> >     gEfiMpServiceProtocolGuid                                            ## 
> > CONSUMES
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> > index 82fe268812..799864a136 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> > @@ -149,11 +149,15 @@ GetProcessorInformation (
> >   **/
> >   VOID
> >   StartupAPsWorker (
> > -  IN  EFI_AP_PROCEDURE                 Procedure
> > +  IN  EFI_AP_PROCEDURE                 Procedure,
> > +  IN  VOID                             *MpEvent
> >     )
> >   {
> >     EFI_STATUS                           Status;
> >     EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
> > +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> >
> >     //
> >     // Get MP Services Protocol
> > @@ -175,7 +179,7 @@ StartupAPsWorker (
> >                    Procedure,
> >                    FALSE,
> >                    0,
> > -                 NULL
> > +                 CpuFeaturesData
> >                    );
> >     ASSERT_EFI_ERROR (Status);
> >   }
> > @@ -257,3 +261,50 @@ GetNumberOfProcessor (
> >                            );
> >     ASSERT_EFI_ERROR (Status);
> >   }
> > +
> > +/**
> > +  Performs CPU features Initialization.
> > +
> > +  This service will invoke MP service to perform CPU features
> > +  initialization on BSP/APs per user configuration.
> > +
> > +  @note This service could be called by BSP only.
> > +**/
> > +VOID
> > +EFIAPI
> > +CpuFeaturesInitialize (
> > +  VOID
> > +  )
> > +{
> > +  CPU_FEATURES_DATA          *CpuFeaturesData;
> > +  UINTN                      OldBspNumber;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> > +
> > +  OldBspNumber = GetProcessorIndex();
> > +  CpuFeaturesData->BspNumber = OldBspNumber;
> > +
> > +  //
> > +  // Known limitation: In PEI phase, CpuFeatures driver not
> > +  // support async mode execute tasks. So semaphore type
> > +  // register can't been used for this instance, must use
> > +  // DXE type instance.
> > +  //
> > +
> > +  //
> > +  // Wakeup all APs for programming.
> > +  //
> > +  StartupAPsWorker (SetProcessorRegister, NULL);
> > +  //
> > +  // Programming BSP
> > +  //
> > +  SetProcessorRegister (CpuFeaturesData);
> > +
> > +  //
> > +  // Switch to new BSP if required
> > +  //
> > +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> > +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> > +  }
> > +}
> > +
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> > index fdfef98293..e95f01df0b 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> > @@ -49,6 +49,7 @@
> >     PeiServicesLib
> >     PeiServicesTablePointerLib
> >     IoLib
> > +  LocalApicLib
> >
> >   [Ppis]
> >     gEfiPeiMpServicesPpiGuid                                             ## 
> > CONSUMES
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> > index edd266934f..39457e9730 100644
> > --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> > +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> > @@ -23,6 +23,7 @@
> >   #include <Library/MemoryAllocationLib.h>
> >   #include <Library/SynchronizationLib.h>
> >   #include <Library/IoLib.h>
> > +#include <Library/LocalApicLib.h>
> >
> >   #include <AcpiCpuData.h>
> >
> > @@ -46,16 +47,26 @@ typedef struct {
> >     CPU_FEATURE_INITIALIZE       InitializeFunc;
> >     UINT8                        *BeforeFeatureBitMask;
> >     UINT8                        *AfterFeatureBitMask;
> > +  UINT8                        *CoreBeforeFeatureBitMask;
> > +  UINT8                        *CoreAfterFeatureBitMask;
> > +  UINT8                        *PackageBeforeFeatureBitMask;
> > +  UINT8                        *PackageAfterFeatureBitMask;
> >     VOID                         *ConfigData;
> >     BOOLEAN                      BeforeAll;
> >     BOOLEAN                      AfterAll;
> >   } CPU_FEATURES_ENTRY;
> >
> > +//
> > +// Flags used when program the register.
> > +//
> > +typedef struct {
> > +  volatile UINTN           MemoryMappedLock;     // Spinlock used to 
> > program
> mmio
> > +  volatile UINT32          *SemaphoreCount;      // Semaphore used to
> program semaphore.
> > +} PROGRAM_CPU_REGISTER_FLAGS;
> > +
> >   typedef struct {
> >     UINTN                    FeaturesCount;
> >     UINT32                   BitMaskSize;
> > -  SPIN_LOCK                MsrLock;
> > -  SPIN_LOCK                MemoryMappedLock;
> >     LIST_ENTRY               FeatureList;
> >
> >     CPU_FEATURES_INIT_ORDER  *InitOrder;
> > @@ -64,9 +75,14 @@ typedef struct {
> >     UINT8                    *ConfigurationPcd;
> >     UINT8                    *SettingPcd;
> >
> > +  UINT32                   NumberOfCpus;
> > +  ACPI_CPU_DATA            *AcpiCpuData;
> > +
> >     CPU_REGISTER_TABLE       *RegisterTable;
> >     CPU_REGISTER_TABLE       *PreSmmRegisterTable;
> >     UINTN                    BspNumber;
> > +
> > +  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
> >   } CPU_FEATURES_DATA;
> >
> >   #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
> > @@ -118,10 +134,13 @@ GetProcessorInformation (
> >
> >     @param[in]  Procedure               A pointer to the function to be run 
> > on
> >                                         enabled APs of the system.
> > +  @param[in]  MpEvent                 A pointer to the event to be used 
> > later
> > +                                      to check whether procedure has done.
> >   **/
> >   VOID
> >   StartupAPsWorker (
> > -  IN  EFI_AP_PROCEDURE                 Procedure
> > +  IN  EFI_AP_PROCEDURE                 Procedure,
> > +  IN  VOID                             *MpEvent
> >     );
> >
> >   /**
> > @@ -170,4 +189,30 @@ DumpCpuFeature (
> >     IN CPU_FEATURES_ENTRY  *CpuFeature
> >     );
> >
> > +/**
> > +  Return feature dependence result.
> > +
> > +  @param[in]  CpuFeature        Pointer to CPU feature.
> > +  @param[in]  Before            Check before dependence or after.
> > +
> > +  @retval     return the dependence result.
> > +**/
> > +CPU_FEATURE_DEPENDENCE_TYPE
> > +DetectFeatureScope (
> > +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> > +  IN BOOLEAN                    Before
> > +  );
> > +
> > +/**
> > +  Programs registers for the calling processor.
> > +
> > +  @param[in,out] Buffer  The pointer to private data buffer.
> > +
> > +**/
> > +VOID
> > +EFIAPI
> > +SetProcessorRegister (
> > +  IN OUT VOID            *Buffer
> > +  );
> > +
> >   #endif
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> > index fa7e107e39..f9e3178dc1 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> > @@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
> >     return FALSE;
> >   }
> >
> > +/**
> > +  Return feature dependence result.
> > +
> > +  @param[in]  CpuFeature        Pointer to CPU feature.
> > +  @param[in]  Before            Check before dependence or after.
> > +
> > +  @retval     return the dependence result.
> > +**/
> > +CPU_FEATURE_DEPENDENCE_TYPE
> > +DetectFeatureScope (
> > +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> > +  IN BOOLEAN                    Before
> > +  )
> > +{
> > +  if (Before) {
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      return PackageDepType;
> > +    }
> > +
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      return CoreDepType;
> > +    }
> > +
> > +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> > +      return ThreadDepType;
> > +    }
> > +
> > +    return NoneDepType;
> > +  }
> > +
> > +  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +    return PackageDepType;
> > +  }
> > +
> > +  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +    return CoreDepType;
> > +  }
> > +
> > +  if (CpuFeature->AfterFeatureBitMask != NULL) {
> > +    return ThreadDepType;
> > +  }
> > +
> > +  return NoneDepType;
> > +}
> > +
> > +/**
> > +  Clear dependence for the specified type.
> > +
> > +  @param[in]  CurrentFeature     Cpu feature need to clear.
> > +  @param[in]  Before             Before or after dependence relationship.
> > +
> > +**/
> > +VOID
> > +ClearFeatureScope (
> > +  IN CPU_FEATURES_ENTRY           *CpuFeature,
> > +  IN BOOLEAN                      Before
> > +  )
> > +{
> > +  if (Before) {
> > +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->BeforeFeatureBitMask);
> > +      CpuFeature->BeforeFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
> > +      CpuFeature->CoreBeforeFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
> > +      CpuFeature->PackageBeforeFeatureBitMask = NULL;
> > +    }
> > +  } else {
> > +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->PackageAfterFeatureBitMask);
> > +      CpuFeature->PackageAfterFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->CoreAfterFeatureBitMask);
> > +      CpuFeature->CoreAfterFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->AfterFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->AfterFeatureBitMask);
> > +      CpuFeature->AfterFeatureBitMask = NULL;
> > +    }
> > +  }
> > +}
> > +
> > +/**
> > +  Base on dependence relationship to asjust feature dependence.
> > +
> > +  ONLY when the feature before(or after) the find feature also has
> > +  dependence with the find feature. In this case, driver need to base
> > +  on dependce relationship to decide how to insert current feature and
> > +  adjust the feature dependence.
> > +
> > +  @param[in]  PreviousFeature    CPU feature current before the find one.
> > +  @param[in]  CurrentFeature     Cpu feature need to adjust.
> > +  @param[in]  Before             Before or after dependence relationship.
> > +
> > +  @retval   TRUE   means the current feature dependence has been
> adjusted.
> > +
> > +  @retval   FALSE  means the previous feature dependence has been
> adjusted.
> > +                   or previous feature has no dependence with the find one.
> > +
> > +**/
> > +BOOLEAN
> > +AdjustFeaturesDependence (
> > +  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
> > +  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
> > +  IN     BOOLEAN                    Before
> > +  )
> > +{
> > +  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
> > +  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
> > +
> > +  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
> > +  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
> > +
> > +  //
> > +  // If previous feature has no dependence with the find featue.
> > +  // return FALSE.
> > +  //
> > +  if (PreDependType == NoneDepType) {
> > +    return FALSE;
> > +  }
> > +
> > +  //
> > +  // If both feature have dependence, keep the one which needs use
> more
> > +  // processors and clear the dependence for the other one.
> > +  //
> > +  if (PreDependType >= CurrentDependType) {
> > +    ClearFeatureScope (CurrentFeature, Before);
> > +    return TRUE;
> > +  } else {
> > +    ClearFeatureScope (PreviousFeature, Before);
> > +    return FALSE;
> > +  }
> > +}
> > +
> > +/**
> > +  Base on dependence relationship to asjust feature order.
> > +
> > +  @param[in]  FeatureList        Pointer to CPU feature list
> > +  @param[in]  FindEntry          The entry this feature depend on.
> > +  @param[in]  CurrentEntry       The entry for this feature.
> > +  @param[in]  Before             Before or after dependence relationship.
> > +
> > +**/
> > +VOID
> > +AdjustEntry (
> > +  IN      LIST_ENTRY                *FeatureList,
> > +  IN OUT  LIST_ENTRY                *FindEntry,
> > +  IN OUT  LIST_ENTRY                *CurrentEntry,
> > +  IN      BOOLEAN                   Before
> > +  )
> > +{
> > +  LIST_ENTRY                *PreviousEntry;
> > +  CPU_FEATURES_ENTRY        *PreviousFeature;
> > +  CPU_FEATURES_ENTRY        *CurrentFeature;
> > +
> > +  //
> > +  // For CPU feature which has core or package type dependence, later
> code need to insert
> > +  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
> > +  // So if driver finds both feature A and B need to execute before feature
> C, driver will
> > +  // base on dependence type of feature A and B to update the logic here.
> > +  // For example, feature A has package type dependence and feature B
> has core type dependence,
> > +  // because package type dependence need to wait for more processors
> which has strong dependence
> > +  // than core type dependence. So driver will adjust the feature order to
> B -> A -> C. and driver
> > +  // will remove the feature dependence in feature B.
> > +  // Driver just needs to make sure before feature C been executed,
> feature A has finished its task
> > +  // in all all thread. Feature A finished in all threads also means 
> > feature B
> have finshed in all
> > +  // threads.
> > +  //
> > +  if (Before) {
> > +    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
> > +  } else {
> >
> > +    PreviousEntry = GetNextNode (FeatureList, FindEntry);
> > +  }
> > +
> > +  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> > +  RemoveEntryList (CurrentEntry);
> > +
> > +  if (IsNull (FeatureList, PreviousEntry)) {
> > +    //
> > +    // If not exist the previous or next entry, just insert the current 
> > entry.
> > +    //
> > +    if (Before) {
> > +      InsertTailList (FindEntry, CurrentEntry);
> > +    } else {
> > +      InsertHeadList (FindEntry, CurrentEntry);
> > +    }
> > +  } else {
> > +    //
> > +    // If exist the previous or next entry, need to check it before insert
> curent entry.
> > +    //
> > +    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
> > +
> > +    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature,
> Before)) {
> > +      //
> > +      // Return TRUE means current feature dependence has been cleared
> and the previous
> > +      // feature dependence has been kept and used. So insert current
> feature before (or after)
> > +      // the previous feature.
> > +      //
> > +      if (Before) {
> > +        InsertTailList (PreviousEntry, CurrentEntry);
> > +      } else {
> > +        InsertHeadList (PreviousEntry, CurrentEntry);
> > +      }
> > +    } else {
> > +      if (Before) {
> > +        InsertTailList (FindEntry, CurrentEntry);
> > +      } else {
> > +        InsertHeadList (FindEntry, CurrentEntry);
> > +      }
> > +    }
> > +  }
> > +}
> >
> > +
> > +/**
> > +  Checks and adjusts current CPU features per dependency relationship.
> > +
> > +  @param[in]  FeatureList        Pointer to CPU feature list
> > +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> > +  @param[in]  FeatureMask        The feature bit mask.
> > +
> > +  @retval     return Swapped info.
> > +**/
> > +BOOLEAN
> > +InsertToBeforeEntry (
> > +  IN LIST_ENTRY              *FeatureList,
> > +  IN LIST_ENTRY              *CurrentEntry,
> > +  IN UINT8                   *FeatureMask
> > +  )
> > +{
> > +  LIST_ENTRY                 *CheckEntry;
> > +  CPU_FEATURES_ENTRY         *CheckFeature;
> > +  BOOLEAN                    Swapped;
> > +
> > +  Swapped = FALSE;
> > +
> > +  //
> > +  // Check all features dispatched before this entry
> > +  //
> > +  CheckEntry = GetFirstNode (FeatureList);
> > +  while (CheckEntry != CurrentEntry) {
> > +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask))
> {
> > +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
> > +      Swapped = TRUE;
> > +      break;
> > +    }
> > +    CheckEntry = CheckEntry->ForwardLink;
> > +  }
> > +
> > +  return Swapped;
> > +}
> > +
> > +/**
> > +  Checks and adjusts current CPU features per dependency relationship.
> > +
> > +  @param[in]  FeatureList        Pointer to CPU feature list
> > +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> > +  @param[in]  FeatureMask        The feature bit mask.
> > +
> > +  @retval     return Swapped info.
> > +**/
> > +BOOLEAN
> > +InsertToAfterEntry (
> > +  IN LIST_ENTRY              *FeatureList,
> > +  IN LIST_ENTRY              *CurrentEntry,
> > +  IN UINT8                   *FeatureMask
> > +  )
> > +{
> > +  LIST_ENTRY                 *CheckEntry;
> > +  CPU_FEATURES_ENTRY         *CheckFeature;
> > +  BOOLEAN                    Swapped;
> > +
> > +  Swapped = FALSE;
> > +
> > +  //
> > +  // Check all features dispatched after this entry
> > +  //
> > +  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> > +  while (!IsNull (FeatureList, CheckEntry)) {
> > +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask))
> {
> > +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
> > +      Swapped = TRUE;
> > +      break;
> > +    }
> > +    CheckEntry = CheckEntry->ForwardLink;
> > +  }
> > +
> > +  return Swapped;
> > +}
> > +
> >   /**
> >     Checks and adjusts CPU features order per dependency relationship.
> >
> > @@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
> >     CPU_FEATURES_ENTRY         *CheckFeature;
> >     BOOLEAN                    Swapped;
> >     LIST_ENTRY                 *TempEntry;
> > +  LIST_ENTRY                 *NextEntry;
> >
> >     CurrentEntry = GetFirstNode (FeatureList);
> >     while (!IsNull (FeatureList, CurrentEntry)) {
> >       Swapped = FALSE;
> >       CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> > +    NextEntry = CurrentEntry->ForwardLink;
> >       if (CpuFeature->BeforeAll) {
> >         //
> >         // Check all features dispatched before this entry
> > @@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
> >           CheckEntry = CheckEntry->ForwardLink;
> >         }
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> > @@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
> >           CheckEntry = CheckEntry->ForwardLink;
> >         }
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> >
> >       if (CpuFeature->BeforeFeatureBitMask != NULL) {
> > -      //
> > -      // Check all features dispatched before this entry
> > -      //
> > -      CheckEntry = GetFirstNode (FeatureList);
> > -      while (CheckEntry != CurrentEntry) {
> > -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature-
> >BeforeFeatureBitMask)) {
> > -          //
> > -          // If there is dependency, swap them
> > -          //
> > -          RemoveEntryList (CurrentEntry);
> > -          InsertTailList (CheckEntry, CurrentEntry);
> > -          Swapped = TRUE;
> > -          break;
> > -        }
> > -        CheckEntry = CheckEntry->ForwardLink;
> > -      }
> > +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry,
> CpuFeature->BeforeFeatureBitMask);
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> >
> >       if (CpuFeature->AfterFeatureBitMask != NULL) {
> > -      //
> > -      // Check all features dispatched after this entry
> > -      //
> > -      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> > -      while (!IsNull (FeatureList, CheckEntry)) {
> > -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature-
> >AfterFeatureBitMask)) {
> > -          //
> > -          // If there is dependency, swap them
> > -          //
> > -          TempEntry = GetNextNode (FeatureList, CurrentEntry);
> > -          RemoveEntryList (CurrentEntry);
> > -          InsertHeadList (CheckEntry, CurrentEntry);
> > -          CurrentEntry = TempEntry;
> > -          Swapped = TRUE;
> > -          break;
> > -        }
> > -        CheckEntry = CheckEntry->ForwardLink;
> > +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature-
> >AfterFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> >         }
> > +    }
> > +
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry,
> CpuFeature->CoreBeforeFeatureBitMask);
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> > -    //
> > -    // No swap happened, check the next feature
> > -    //
> > +
> > +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature-
> >CoreAfterFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> > +      }
> > +    }
> > +
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry,
> CpuFeature->PackageBeforeFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> > +      }
> > +    }
> > +
> > +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature-
> >PackageAfterFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> > +      }
> > +    }
> > +
> >       CurrentEntry = CurrentEntry->ForwardLink;
> >     }
> >   }
> > @@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
> >     CpuFeaturesData = GetCpuFeaturesData ();
> >     if (CpuFeaturesData->FeaturesCount == 0) {
> >       InitializeListHead (&CpuFeaturesData->FeatureList);
> > -    InitializeSpinLock (&CpuFeaturesData->MsrLock);
> > -    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
> > +    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
> >       CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
> >     }
> >     ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
> > @@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
> >         }
> >         CpuFeatureEntry->AfterFeatureBitMask = CpuFeature-
> >AfterFeatureBitMask;
> >       }
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->CoreBeforeFeatureBitMask = CpuFeature-
> >CoreBeforeFeatureBitMask;
> > +    }
> > +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->CoreAfterFeatureBitMask = CpuFeature-
> >CoreAfterFeatureBitMask;
> > +    }
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->PackageBeforeFeatureBitMask = CpuFeature-
> >PackageBeforeFeatureBitMask;
> > +    }
> > +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->PackageAfterFeatureBitMask = CpuFeature-
> >PackageAfterFeatureBitMask;
> > +    }
> > +
> >       CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
> >       CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
> >
> > @@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
> >     @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is
> not
> >                                       supported due to a circular 
> > dependency between
> >                                       BEFORE and AFTER features.
> > +  @retval  RETURN_NOT_READY         CPU feature PCD
> PcdCpuFeaturesUserConfiguration
> > +                                    not updated by Platform driver yet.
> >
> >     @note This service could be called by BSP only.
> >   **/
> > @@ -431,12 +755,20 @@ RegisterCpuFeature (
> >     UINT8                      *FeatureMask;
> >     UINT8                      *BeforeFeatureBitMask;
> >     UINT8                      *AfterFeatureBitMask;
> > +  UINT8                      *CoreBeforeFeatureBitMask;
> > +  UINT8                      *CoreAfterFeatureBitMask;
> > +  UINT8                      *PackageBeforeFeatureBitMask;
> > +  UINT8                      *PackageAfterFeatureBitMask;
> >     BOOLEAN                    BeforeAll;
> >     BOOLEAN                    AfterAll;
> >
> > -  FeatureMask          = NULL;
> > -  BeforeFeatureBitMask = NULL;
> > -  AfterFeatureBitMask  = NULL;
> > +  FeatureMask                 = NULL;
> > +  BeforeFeatureBitMask        = NULL;
> 
> How about renaming BeforeFeatureBitMask to
> ThreadBeforeFeatureBitMask?
> I think the renaming together with redefining the macro
> CPU_FEATURE_BEFORE as CPU_FEATURE_THREAD_BEFORE can be in a
> separate patch.
> 

Ok, will separate the patch in next version changes.

> > +  AfterFeatureBitMask         = NULL;
> > +  CoreBeforeFeatureBitMask    = NULL;
> > +  CoreAfterFeatureBitMask     = NULL;
> > +  PackageBeforeFeatureBitMask  = NULL;
> > +  PackageAfterFeatureBitMask   = NULL;
> >     BeforeAll            = FALSE;
> >     AfterAll             = FALSE;
> >
> > @@ -449,6 +781,10 @@ RegisterCpuFeature (
> >                       != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
> >       ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL |
> CPU_FEATURE_AFTER_ALL))
> >                       != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));
> 
> Implementation can avoid using CPU_FEATURE_BEFORE and
> CPU_FEATURE_AFTER.
> Use CPU_FEATURE_THREAD_BEFORE and CPU_FEATURE_THREAD_AFTER.

Ok, will do this change in the separate patch in next version changes.


> 
> > +    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE |
> CPU_FEATURE_CORE_AFTER))
> > +                    != (CPU_FEATURE_CORE_BEFORE |
> CPU_FEATURE_CORE_AFTER));
> > +    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE |
> CPU_FEATURE_PACKAGE_AFTER))
> > +                    != (CPU_FEATURE_PACKAGE_BEFORE |
> CPU_FEATURE_PACKAGE_AFTER));
> >       if (Feature < CPU_FEATURE_BEFORE) {
> >         BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE :
> FALSE;
> >         AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
> > @@ -459,6 +795,14 @@ RegisterCpuFeature (
> >         SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature &
> ~CPU_FEATURE_BEFORE, BitMaskSize);
> >       } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
> >         SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature &
> ~CPU_FEATURE_AFTER, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
> > +      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature &
> ~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
> > +      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature &
> ~CPU_FEATURE_CORE_AFTER, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
> > +      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature &
> ~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
> > +      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature &
> ~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
> >       }
> >       Feature = VA_ARG (Marker, UINT32);
> >     }
> > @@ -466,15 +810,19 @@ RegisterCpuFeature (
> >
> >     CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
> >     ASSERT (CpuFeature != NULL);
> > -  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
> > -  CpuFeature->FeatureMask          = FeatureMask;
> > -  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
> > -  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
> > -  CpuFeature->BeforeAll            = BeforeAll;
> > -  CpuFeature->AfterAll             = AfterAll;
> > -  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
> > -  CpuFeature->SupportFunc          = SupportFunc;
> > -  CpuFeature->InitializeFunc       = InitializeFunc;
> > +  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
> > +  CpuFeature->FeatureMask                 = FeatureMask;
> > +  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
> > +  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
> > +  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
> > +  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
> > +  CpuFeature->PackageBeforeFeatureBitMask =
> PackageBeforeFeatureBitMask;
> > +  CpuFeature->PackageAfterFeatureBitMask  =
> PackageAfterFeatureBitMask;
> > +  CpuFeature->BeforeAll                   = BeforeAll;
> > +  CpuFeature->AfterAll                    = AfterAll;
> > +  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
> > +  CpuFeature->SupportFunc                 = SupportFunc;
> > +  CpuFeature->InitializeFunc              = InitializeFunc;
> >     if (FeatureName != NULL) {
> >       CpuFeature->FeatureName          = AllocatePool
> (CPU_FEATURE_NAME_SIZE);
> >       ASSERT (CpuFeature->FeatureName != NULL);
> >
> 
> 
> --
> Thanks,
> Ray
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to