Re: [edk2] [Patch v2 0/6] Fix performance issue caused by Set MSR task.

2018-10-18 Thread Dong, Eric
Hi Laszlo,

Thanks for your notes. I have updated V2 changes based on Ray's comments. 
Please help to check the V3 changes.

Thanks,
Eric

> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: Thursday, October 18, 2018 1:34 AM
> To: Dong, Eric 
> Cc: Ni, Ruiyu ; edk2-devel-01 
> Subject: Re: [edk2] [Patch v2 0/6] Fix performance issue caused by Set MSR
> task.
> 
> Hi Eric,
> 
> On 10/17/18 04:16, Eric Dong wrote:
> > V2 changes include:
> > 1. Include the change for CpuCommonFeaturesLib which used to set MSR
> base on its scope info.
> > 2. Include the change for CpuS3DataDxe driver which also handle the
> AcpiCpuData data.
> > 3. Update code base on feedback for V1 changes.
> 
> reviewing this version of the series is on my TODO list. I was out-of-office
> yesterday (somewhat unexpectedly) and this morning I had
> 130 emails just in my inbox (not counting other list traffic).
> 
> I've more or less managed to get to the bottom of that mail dump by now,
> but as a result, it's too late to start reviewing your v2 still today.
> Hopefully I can cover it tomorrow.
> 
> (I might fetch another email batch this evening, and handle small items.
> Thus, you could see further emails from me on the list tonight.)
> 
> If Ray reviewed your v2 today (while I was busy unburying myself from my
> inbox), and you are ready to post a v3 based on Ray's comments, please do
> so. Then I'll skip v2 and catch up with v3.
> 
> Thanks!
> Laszlo
___
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


Re: [edk2] [Patch v2 0/6] Fix performance issue caused by Set MSR task.

2018-10-17 Thread Dong, Eric
Hi Ruiyu,

Below link with my changes:

https://github.com/ydong10/edk2/tree/MSR

Thanks,
Eric

> -Original Message-
> From: Ni, Ruiyu
> Sent: Thursday, October 18, 2018 10:13 AM
> To: Dong, Eric ; edk2-devel@lists.01.org
> Cc: Laszlo Ersek 
> Subject: RE: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
> 
> Eric,
> Can you post your changes to github yours mirror repo?
> I found #3/6 cannot be applied to my code properly.
> 
> Thanks/Ray
> 
> > -Original Message-
> > From: Dong, Eric
> > Sent: Wednesday, October 17, 2018 10:16 AM
> > To: edk2-devel@lists.01.org
> > Cc: Ni, Ruiyu ; Laszlo Ersek 
> > Subject: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
> >
> > V2 changes include:
> > 1. Include the change for CpuCommonFeaturesLib which used to set MSR
> > base on its scope info.
> > 2. Include the change for CpuS3DataDxe driver which also handle the
> > AcpiCpuData data.
> > 3. Update code base on feedback for V1 changes.
> >
> > V1 changes include:
> > In a system which has multiple cores, current set register value task
> > costs huge times.
> > After investigation, current set MSR task costs most of the times.
> > Current logic uses SpinLock to let set MSR task as an single thread task for
> all cores.
> > Because MSR has scope attribute which may cause GP fault if multiple
> > APs set MSR at the same time, current logic use an easiest solution
> > (use SpinLock) to avoid this issue, but it will cost huge times.
> >
> > In order to fix this performance issue, new solution will set MSRs
> > base on their scope attribute. After this, the SpinLock will not
> > needed. Without SpinLock, new issue raised which is caused by MSR
> > dependence. For example, MSR A depends on MSR B which means MSR A
> must
> > been set after MSR B has been set. Also MSR B is package scope level
> > and MSR A is thread scope level. If system has multiple threads,
> > Thread 1 needs to set the thread level MSRs and thread 2 needs to set
> > thread and package level MSRs. Set MSRs task for thread 1 and thread 2
> like below:
> >
> > Thread 1 Thread 2
> > MSR B  NY
> > MSR A  YY
> >
> > If driver don't control execute MSR order, for thread 1, it will
> > execute MSR A first, but at this time, MSR B not been executed yet by
> > thread 2. system may trig exception at this time.
> >
> > In order to fix the above issue, driver introduces semaphore logic to
> > control the MSR execute sequence. For the above case, a semaphore will
> > be add between MSR A and B for all threads. Semaphore has scope info
> > for it. The possible scope value is core or package.
> > For each thread, when it meets a semaphore during it set registers, it
> > will 1) release semaphore (+1) for each threads in this core or
> > package(based on the scope info for this
> > semaphore) 2) acquire semaphore (-1) for all the threads in this core
> > or package(based on the scope info for this semaphore). With these two
> > steps, driver can control MSR sequence. Sample code logic like below:
> >
> >   //
> >   // First increase semaphore count by 1 for processors in this package.
> >   //
> >   for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> > ProcessorIndex ++) {
> > LibReleaseSemaphore ((UINT32 *) [PackageOffset +
> > ProcessorIndex]);
> >   }
> >   //
> >   // Second, check whether the count has reach the check number.
> >   //
> >   for (ProcessorIndex = 0; ProcessorIndex < ValidApCount;
> > ProcessorIndex ++) {
> > LibWaitForSemaphore ([ApOffset]);
> >   }
> >
> > Platform Requirement:
> > 1. This change requires register MSR setting base on MSR scope info.
> > If still register MSR
> >for all threads, exception may raised.
> >
> > Known limitation:
> > 1. Current CpuFeatures driver supports DXE instance and PEI instance.
> > But semaphore logic
> >requires Aps execute in async mode which is not supported by PEI driver.
> > So CpuFeature
> >PEI instance not works after this change. We plan to support async
> > mode for PEI in phase
> >2 for this task.
> > 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm
> > driver and
> >RegisterCpuFeaturesLib library because the schedule limitation.
> > Will merge the code to
> >RegisterCpuFeaturesLib and export as an API in phase 2 for this task.
> >
> > Cc: Ruiyu Ni 
> > Cc: Laszlo Ersek 
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong 
> >
> >
> > Eric Dong (6):
> >   UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
> >   UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
> >   UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
> > type.
> >   UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
> >   UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
> >   UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on 

Re: [edk2] [Patch v2 0/6] Fix performance issue caused by Set MSR task.

2018-10-17 Thread Ni, Ruiyu
Eric,
Can you post your changes to github yours mirror repo?
I found #3/6 cannot be applied to my code properly.

Thanks/Ray

> -Original Message-
> From: Dong, Eric
> Sent: Wednesday, October 17, 2018 10:16 AM
> To: edk2-devel@lists.01.org
> Cc: Ni, Ruiyu ; Laszlo Ersek 
> Subject: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
> 
> V2 changes include:
> 1. Include the change for CpuCommonFeaturesLib which used to set MSR
> base on its scope info.
> 2. Include the change for CpuS3DataDxe driver which also handle the
> AcpiCpuData data.
> 3. Update code base on feedback for V1 changes.
> 
> V1 changes include:
> In a system which has multiple cores, current set register value task costs
> huge times.
> After investigation, current set MSR task costs most of the times. Current
> logic uses SpinLock to let set MSR task as an single thread task for all 
> cores.
> Because MSR has scope attribute which may cause GP fault if multiple APs
> set MSR at the same time, current logic use an easiest solution (use SpinLock)
> to avoid this issue, but it will cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on
> their scope attribute. After this, the SpinLock will not needed. Without
> SpinLock, new issue raised which is caused by MSR dependence. For
> example, MSR A depends on MSR B which means MSR A must been set after
> MSR B has been set. Also MSR B is package scope level and MSR A is thread
> scope level. If system has multiple threads, Thread 1 needs to set the thread
> level MSRs and thread 2 needs to set thread and package level MSRs. Set
> MSRs task for thread 1 and thread 2 like below:
> 
> Thread 1 Thread 2
> MSR B  NY
> MSR A  YY
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A
> first, but at this time, MSR B not been executed yet by thread 2. system may
> trig exception at this time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control
> the MSR execute sequence. For the above case, a semaphore will be add
> between MSR A and B for all threads. Semaphore has scope info for it. The
> possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1)
> release semaphore (+1) for each threads in this core or package(based on
> the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or
> package(based on the scope info for this semaphore). With these two steps,
> driver can control MSR sequence. Sample code logic like below:
> 
>   //
>   // First increase semaphore count by 1 for processors in this package.
>   //
>   for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> LibReleaseSemaphore ((UINT32 *) [PackageOffset +
> ProcessorIndex]);
>   }
>   //
>   // Second, check whether the count has reach the check number.
>   //
>   for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++)
> {
> LibWaitForSemaphore ([ApOffset]);
>   }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still
> register MSR
>for all threads, exception may raised.
> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But
> semaphore logic
>requires Aps execute in async mode which is not supported by PEI driver.
> So CpuFeature
>PEI instance not works after this change. We plan to support async mode
> for PEI in phase
>2 for this task.
> 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver
> and
>RegisterCpuFeaturesLib library because the schedule limitation. Will merge
> the code to
>RegisterCpuFeaturesLib and export as an API in phase 2 for this task.
> 
> Cc: Ruiyu Ni 
> Cc: Laszlo Ersek 
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong 
> 
> 
> Eric Dong (6):
>   UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
>   UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
>   UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
> type.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
>   UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
>   UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info.
> 
>  UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c|   2 +
>  UefiCpuPkg/Include/AcpiCpuData.h   |  45 +-
>  .../Include/Library/RegisterCpuFeaturesLib.h   |  25 +-
>  UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c  |   8 +
>  UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c |  12 +
>  .../Library/CpuCommonFeaturesLib/ExecuteDisable.c  |  10 +
>  .../Library/CpuCommonFeaturesLib/FastStrings.c |  12 +
>  .../Library/CpuCommonFeaturesLib/FeatureControl.c  |  38 ++
>  

Re: [edk2] [Patch v2 0/6] Fix performance issue caused by Set MSR task.

2018-10-17 Thread Laszlo Ersek
Hi Eric,

On 10/17/18 04:16, Eric Dong wrote:
> V2 changes include:
> 1. Include the change for CpuCommonFeaturesLib which used to set MSR base on 
> its scope info.
> 2. Include the change for CpuS3DataDxe driver which also handle the 
> AcpiCpuData data.
> 3. Update code base on feedback for V1 changes.

reviewing this version of the series is on my TODO list. I was
out-of-office yesterday (somewhat unexpectedly) and this morning I had
130 emails just in my inbox (not counting other list traffic).

I've more or less managed to get to the bottom of that mail dump by now,
but as a result, it's too late to start reviewing your v2 still today.
Hopefully I can cover it tomorrow.

(I might fetch another email batch this evening, and handle small items.
Thus, you could see further emails from me on the list tonight.)

If Ray reviewed your v2 today (while I was busy unburying myself from my
inbox), and you are ready to post a v3 based on Ray's comments, please
do so. Then I'll skip v2 and catch up with v3.

Thanks!
Laszlo
___
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


[edk2] [Patch v2 0/6] Fix performance issue caused by Set MSR task.

2018-10-16 Thread Eric Dong
V2 changes include:
1. Include the change for CpuCommonFeaturesLib which used to set MSR base on 
its scope info.
2. Include the change for CpuS3DataDxe driver which also handle the AcpiCpuData 
data.
3. Update code base on feedback for V1 changes.

V1 changes include:
In a system which has multiple cores, current set register value task costs 
huge times.
After investigation, current set MSR task costs most of the times. Current 
logic uses SpinLock to let set MSR task as an single thread task for all cores. 
Because MSR has scope attribute which may cause GP fault if multiple APs set 
MSR at the same time, current logic use an easiest solution (use SpinLock) to 
avoid this issue, but it will cost huge times.

In order to fix this performance issue, new solution will set MSRs base on 
their scope attribute. After this, the SpinLock will not needed. Without 
SpinLock, new issue raised which is caused by MSR dependence. For example, MSR 
A depends on MSR B which means MSR A must been set after MSR B has been set. 
Also MSR B is package scope level and MSR A is thread scope level. If system 
has multiple threads, Thread 1 needs to set the thread level MSRs and thread 2 
needs to set thread and package level MSRs. Set MSRs task for thread 1 and 
thread 2 like below:

Thread 1 Thread 2
MSR B  NY
MSR A  YY

If driver don't control execute MSR order, for thread 1, it will execute MSR A 
first, but at this time, MSR B not been executed yet by thread 2. system may 
trig exception at this time.

In order to fix the above issue, driver introduces semaphore logic to control 
the MSR execute sequence. For the above case, a semaphore will be add between 
MSR A and B for all threads. Semaphore has scope info for it. The possible 
scope value is core or package.
For each thread, when it meets a semaphore during it set registers, it will 1) 
release semaphore (+1) for each threads in this core or package(based on the 
scope info for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or 
package(based on the scope info for this semaphore). With these two steps, 
driver can control MSR sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; 
ProcessorIndex ++) {
LibReleaseSemaphore ((UINT32 *) [PackageOffset + 
ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
LibWaitForSemaphore ([ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still 
register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But 
semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So 
CpuFeature
   PEI instance not works after this change. We plan to support async mode for 
PEI in phase
   2 for this task.
2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver and 
   RegisterCpuFeaturesLib library because the schedule limitation. Will merge 
the code to 
   RegisterCpuFeaturesLib and export as an API in phase 2 for this task.

Cc: Ruiyu Ni 
Cc: Laszlo Ersek 
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong 


Eric Dong (6):
  UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
  UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
type.
  UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
  UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info.

 UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c|   2 +
 UefiCpuPkg/Include/AcpiCpuData.h   |  45 +-
 .../Include/Library/RegisterCpuFeaturesLib.h   |  25 +-
 UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c  |   8 +
 UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c |  12 +
 .../Library/CpuCommonFeaturesLib/ExecuteDisable.c  |  10 +
 .../Library/CpuCommonFeaturesLib/FastStrings.c |  12 +
 .../Library/CpuCommonFeaturesLib/FeatureControl.c  |  38 ++
 .../CpuCommonFeaturesLib/LimitCpuIdMaxval.c|  14 +
 .../Library/CpuCommonFeaturesLib/MachineCheck.c|  38 ++
 .../Library/CpuCommonFeaturesLib/MonitorMwait.c|  15 +
 .../Library/CpuCommonFeaturesLib/PendingBreak.c|  11 +
 UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c |  11 +
 .../Library/CpuCommonFeaturesLib/ProcTrace.c   |  11 +
 UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c   |  10 +
 .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 438 ---
 .../DxeRegisterCpuFeaturesLib.c