from:"Yang Zhong"

Re: [PULL 00/68] i386, build system, KVM changes for 2023-05-18

2023-05-22 Thread Yang Zhong

On Fri, May 19, 2023 at 10:29:47AM +0200, Paolo Bonzini wrote:
> On 5/19/23 05:06, Yang Zhong wrote:
> > 
> > Paolo, please help add below queued sgx fix into this PULL request, which 
> > was
> > missed from last time, thanks a lot!
> > https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg00841.html
> > https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg00896.html
> 
> Isn't this commit 72497cff896fecf74306ed33626c30e43633cdd6?
> 
> Author: Yang Zhong 
> Date:   Thu Apr 6 02:40:41 2023 -0400
> 
> target/i386: Change wrong XFRM value in SGX CPUID leaf
> The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
> FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):{ECX,EDX}.  As a result,
> SGX enclaves only supported SSE and x87 feature (xfrm=0x3).
> Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based 
> features")
> Signed-off-by: Yang Zhong 
> Reviewed-by: Yang Weijiang 
> Reviewed-by: Kai Huang 
> Message-Id: <20230406064041.420039-1-yang.zh...@linux.intel.com>
> Signed-off-by: Paolo Bonzini 
>

  Oh, sorry about this, it's my fault.

  I missed the PULL email for this patch:
  Apr 21 Paolo Bonzini   (1.4K) [PULL 16/25] target/i386: Change wrong XFRM value in SGX CPUID leaf

  Regards,
  Yang

> Paolo
> 
>

Re: [PULL 00/68] i386, build system, KVM changes for 2023-05-18

2023-05-18 Thread Yang Zhong



Paolo, please add the queued SGX fix below to this PULL request; it was
missed last time. Thanks a lot!
https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg00841.html
https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg00896.html

Regards,
Yang

Re: [PATCH v3] target/i386: Change wrong XFRM value

2023-04-09 Thread Yang Zhong

On Sun, Apr 09, 2023 at 04:40:50PM +0300, Michael Tokarev wrote:
> 06.04.2023 09:40, Yang Zhong wrote:
> > The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
> > FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):{ECX,EDX}, which made
> > SGX enclave only supported SSE and x87 feature(xfrm=0x3).
> > 
> > Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based 
> > features")
> 
> This seems to be -stable material, no?
>
  
  I checked QEMU stable-7.2; commit 301e90675c3f is included in that release.
  So this fix needs to be merged into the stable release as well. Thanks!

  Regards,
  Yang

> /mjt

Re: [PATCH v3] target/i386: Change wrong XFRM value

2023-04-07 Thread Yang Zhong

On Thu, Apr 06, 2023 at 02:05:06PM +0200, Paolo Bonzini wrote:
> Queued, thanks.
>

  Paolo, thanks!

  Yang

> Paolo
> 
>

[PATCH v3] target/i386: Change wrong XFRM value

2023-04-05 Thread Yang Zhong

The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):{ECX,EDX}.  As a result,
SGX enclaves only supported the SSE and x87 features (xfrm=0x3).

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")

Signed-off-by: Yang Zhong 
Reviewed-by: Yang Weijiang 
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6576287e5b..f083ff4335 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5718,8 +5718,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         } else {
             *eax &= env->features[FEAT_SGX_12_1_EAX];
             *ebx &= 0; /* ebx reserve */
-            *ecx &= env->features[FEAT_XSAVE_XSS_LO];
-            *edx &= env->features[FEAT_XSAVE_XSS_HI];
+            *ecx &= env->features[FEAT_XSAVE_XCR0_LO];
+            *edx &= env->features[FEAT_XSAVE_XCR0_HI];
 
             /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
             *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
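
For readers following the thread, here is a minimal standalone sketch of why masking with the XSS feature words leaves xfrm=0x3. It is not QEMU code: sgx_allowed_xfrm() is a hypothetical helper, and the host value used in the note below is made up. It assumes the XSS-derived mask contains only the Arch LBR bit (bit 15), as pointed out in the review of the first version of this patch.

```c
#include <stdint.h>

/* XSAVE state-component bits. */
#define XSTATE_FP_MASK        (1ULL << 0)   /* x87 */
#define XSTATE_SSE_MASK       (1ULL << 1)
#define XSTATE_YMM_MASK       (1ULL << 2)   /* AVX */
#define XSTATE_ARCH_LBR_MASK  (1ULL << 15)  /* supervisor state, enumerated via XSS */

/*
 * Hypothetical helper (not a QEMU function): apply a guest feature mask to
 * the host's CPUID(0x12,1).{ECX,EDX} value the way the hunk above does,
 * and return the resulting 64-bit XFRM.
 */
static uint64_t sgx_allowed_xfrm(uint64_t host_xfrm, uint64_t guest_mask)
{
    uint32_t ecx = (uint32_t)host_xfrm;
    uint32_t edx = (uint32_t)(host_xfrm >> 32);

    ecx &= (uint32_t)guest_mask;          /* plays the role of features[..._LO] */
    edx &= (uint32_t)(guest_mask >> 32);  /* plays the role of features[..._HI] */

    /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
    ecx |= (uint32_t)(XSTATE_FP_MASK | XSTATE_SSE_MASK);

    return ((uint64_t)edx << 32) | ecx;
}
```

With a made-up host value of FP|SSE|YMM (0x7), passing XSTATE_ARCH_LBR_MASK as the guest mask clears everything and only the forced FP|SSE bits remain (0x3), whereas passing an XCR0-style mask of FP|SSE|YMM keeps AVX (0x7).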

Re: [RESEND PATCH v2] target/i386: Switch back XFRM value

2023-04-05 Thread Yang Zhong

On Mon, Mar 27, 2023 at 04:03:54PM +0800, Yang, Weijiang wrote:
> 
> On 3/27/2023 3:33 PM, Christian Ehrhardt wrote:
> > On Thu, Oct 27, 2022 at 2:36 AM Yang, Weijiang  
> > wrote:
> > > 
> > > On 10/26/2022 7:57 PM, Zhong, Yang wrote:
> > > > The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
> > > > FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):{ECX,EDX}, which made
> > > > SGX enclave only supported SSE and x87 feature(xfrm=0x3).
> > > > 
> > > > Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based 
> > > > features")
> > > > 
> > > > Signed-off-by: Yang Zhong 
> > > > ---
> > > >target/i386/cpu.c | 4 ++--
> > > >1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > > index ad623d91e4..19aaed877b 100644
> > > > --- a/target/i386/cpu.c
> > > > +++ b/target/i386/cpu.c
> > > > @@ -5584,8 +5584,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t 
> > > > index, uint32_t count,
> > > >} else {
> > > >*eax &= env->features[FEAT_SGX_12_1_EAX];
> > > >*ebx &= 0; /* ebx reserve */
> > > > -*ecx &= env->features[FEAT_XSAVE_XSS_LO];
> > > > -*edx &= env->features[FEAT_XSAVE_XSS_HI];
> > > > +*ecx &= env->features[FEAT_XSAVE_XCR0_LO];
> > > > +*edx &= env->features[FEAT_XSAVE_XCR0_HI];
> > > Oops, that's my fault to replace with wrong definitions, thanks for the 
> > > fix!
> > > 
> > > Reviewed-by:  Yang Weijiang 
> > Hi,
> > I do not have any background on this but stumbled over this and wondered,
> > is there any particular reason why this wasn't applied yet?
> > 
> > It seemed to fix a former mistake, was acked and then ... silence
> 
> Chris, thanks for the catch!
> 
> I double checked this patch isn't in the latest 8.0.0-rc1 tree.
> 
> 
> Hi, Paolo,
> 
> Could you help merge this fixup patch? Thanks!


  Hello all,

  Let me rebase this patch and resend it, thanks!

  Yang


> 
> > 
> > > >/* FP and SSE are always allowed regardless of 
> > > > XSAVE/XCR0. */
> > > >*ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
> >

Re: About the instance_finalize callback in VFIO PCI

2023-03-22 Thread Yang Zhong

On Wed, Mar 22, 2023 at 12:22:27PM -0600, Alex Williamson wrote:
> On Wed, 22 Mar 2023 09:10:20 -0400
> Yang Zhong  wrote:
> 
> > On Wed, Mar 22, 2023 at 01:56:13PM +0100, Cédric Le Goater wrote:
> > > On 3/22/23 13:28, Yang Zhong wrote:  
> > > > On Tue, Mar 21, 2023 at 06:30:14PM +0100, Cédric Le Goater wrote:  
> > > > > On 3/20/23 10:31, Yang Zhong wrote:  
> > > > > > Hello Alex and Paolo,
> > > > > > 
> > > > > > There is one instance_finalize callback definition in 
> > > > > > hw/vfio/pci.c, but
> > > > > > i find this callback(vfio_instance_finalize()) never be called 
> > > > > > during the
> > > > > > VM shutdown with close VM or "init 0" command in guest.
> > > > > > 
> > > > > > The Qemu related command:
> > > > > >  ..
> > > > > >  -device vfio-pci,host=d9:00.0
> > > > > >  ..  
> > > > > 
> > > > > well, the finalize op is correctly called for hot unplugged devices, 
> > > > > using
> > > > > device_add.
> > > > >   
> > > > Thanks Cédric, i can use device_del command in the monitor to
> > > > trigger this instance_finalize callback function in the VFIO PCI.
> > > > thanks!  
> > > 
> > > yes. I think that in the shutdown case, QEMU simply relies on exit() to
> > > do the cleanup. On the kernel side, unmaps, fds being closed trigger any
> > > allocated resources.
> > > 
> > > Out of curiosity, what were you trying to achieve in the finalize op ?
> > >   
> >  
> >  We are doing one new feature, which need this callback to do some
> >  cleanup work with VFIO/iommufd kernel module. thanks!
> 
> This sounds dangerously like relying on userspace for cleanup.  Kernel
> drivers need to be able to perform all cleanup themselves when file
> descriptors are closed.  They must expect that userspace can be killed
> at any point in time w/o an opportunity to do cleanup work.  Thanks,
> 

  Thanks Alex. Yes, we have moved this cleanup to the kernel driver side.
  I was just curious about the scenario in which this instance_finalize
  callback is used in VFIO PCI. Now it is clear, thanks a lot!

  Regards,
  Yang


> Alex
>

Re: About the instance_finalize callback in VFIO PCI

2023-03-22 Thread Yang Zhong

On Wed, Mar 22, 2023 at 01:56:13PM +0100, Cédric Le Goater wrote:
> On 3/22/23 13:28, Yang Zhong wrote:
> > On Tue, Mar 21, 2023 at 06:30:14PM +0100, Cédric Le Goater wrote:
> > > On 3/20/23 10:31, Yang Zhong wrote:
> > > > Hello Alex and Paolo,
> > > > 
> > > > There is one instance_finalize callback definition in hw/vfio/pci.c, but
> > > > i find this callback(vfio_instance_finalize()) never be called during 
> > > > the
> > > > VM shutdown with close VM or "init 0" command in guest.
> > > > 
> > > > The Qemu related command:
> > > >  ..
> > > >  -device vfio-pci,host=d9:00.0
> > > >  ..
> > > 
> > > well, the finalize op is correctly called for hot unplugged devices, using
> > > device_add.
> > > 
> > Thanks Cédric, i can use device_del command in the monitor to
> > trigger this instance_finalize callback function in the VFIO PCI.
> > thanks!
> 
> yes. I think that in the shutdown case, QEMU simply relies on exit() to
> do the cleanup. On the kernel side, unmaps, fds being closed trigger any
> allocated resources.
> 
> Out of curiosity, what were you trying to achieve in the finalize op ?
> 
 
 We are working on a new feature that needs this callback to do some
 cleanup work with the VFIO/iommufd kernel module. Thanks!

 Yang


> Thanks,
> 
> C.

Re: About the instance_finalize callback in VFIO PCI

2023-03-22 Thread Yang Zhong

On Tue, Mar 21, 2023 at 09:44:18PM +0100, Paolo Bonzini wrote:
> Il mar 21 mar 2023, 18:30 Cédric Le Goater  ha scritto:
> 
> > I would have thought that user_creatable_cleanup would have taken care
> > of it. But it's not. This needs some digging.
> >
> 
> user_creatable_cleanup is only for -object, not for -device.
>

  Paolo, thanks for clarifying this.

  I think it is clear to me now: vfio_instance_finalize() in
  hw/vfio/pci.c only runs when a vfio-pci device is hot-unplugged from the
  monitor, to release its resources. For a static "-device vfio-pci ."
  option, cleanup is the responsibility of the kernel's process-exit path,
  not of QEMU's vfio code. Once the QEMU process is closed, the kernel calls
  do_exit() to release these resources, so the vfio module in the kernel
  handles the cleanup. Thanks!

  Yang

> Paolo
> 
> 
> > C.
> >
> >
> > > By the way, i also debugged other instance_finalize callback functions,
> > > if my understanding is right, all instance_finalize callback should be
> > > called from object_unref(object) from qemu_cleanup(void) in
> > > ./softmmu/runstate.c. But there is no VFIO related object_unref() call in
> > > this cleanup function, So the instance_finalize callback in vfio pci
> > > should be useless? thanks!
> > >
> > > Regards,
> > > Yang
> > >
> > >
> >
> >

Re: About the instance_finalize callback in VFIO PCI

2023-03-22 Thread Yang Zhong

On Tue, Mar 21, 2023 at 06:30:14PM +0100, Cédric Le Goater wrote:
> On 3/20/23 10:31, Yang Zhong wrote:
> > Hello Alex and Paolo,
> > 
> > There is one instance_finalize callback definition in hw/vfio/pci.c, but
> > i find this callback(vfio_instance_finalize()) never be called during the
> > VM shutdown with close VM or "init 0" command in guest.
> > 
> > The Qemu related command:
> > ..
> > -device vfio-pci,host=d9:00.0
> > ..
> 
> well, the finalize op is correctly called for hot unplugged devices, using
> device_add.
> 
   Thanks Cédric, I can use the device_del command in the monitor to
   trigger this instance_finalize callback in VFIO PCI.
   Thanks!

   Yang
>

About the instance_finalize callback in VFIO PCI

2023-03-20 Thread Yang Zhong

Hello Alex and Paolo,

There is an instance_finalize callback defined in hw/vfio/pci.c, but I find
that this callback (vfio_instance_finalize()) is never called during VM
shutdown, whether by closing the VM or by running "init 0" in the guest.

The relevant QEMU command line:
   ..
   -device vfio-pci,host=d9:00.0
   ..

static const TypeInfo vfio_pci_dev_info = {
    .name = TYPE_VFIO_PCI,
    .parent = TYPE_PCI_DEVICE,
    .instance_size = sizeof(VFIOPCIDevice),
    .class_init = vfio_pci_dev_class_init,
    .instance_init = vfio_instance_init,
    .instance_finalize = vfio_instance_finalize,
    .interfaces = (InterfaceInfo[]) {
        { INTERFACE_PCIE_DEVICE },
        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
        { }
    },
};

If my test method is wrong, could you please tell me how to trigger
this callback on VM shutdown? Thanks.

By the way, I also debugged other instance_finalize callbacks. If my
understanding is right, every instance_finalize callback should be reached
through object_unref(object), called from qemu_cleanup(void) in
./softmmu/runstate.c. But there is no VFIO-related object_unref() call in
that cleanup function, so is the instance_finalize callback in vfio pci
effectively unused? Thanks!

Regards,
Yang
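
To make the question concrete, here is a toy sketch of a reference-counted object with a finalize hook. It is only an illustration under stated assumptions: ToyObject, toy_new(), toy_unref() and finalize_calls are hypothetical names, not QEMU's QOM API. The point is that the hook runs only when the last reference is dropped; an object that still holds a reference when the process exits never sees it and is left to the operating system.

```c
#include <stdlib.h>

/* Toy stand-in for a refcounted object; none of these names are QEMU APIs. */
typedef struct ToyObject {
    unsigned refcount;
    void (*finalize)(struct ToyObject *obj);
} ToyObject;

static int finalize_calls;   /* counts how often the finalize hook ran */

static void toy_finalize(ToyObject *obj)
{
    (void)obj;
    finalize_calls++;
}

/* Allocate an object that starts with a single reference. */
static ToyObject *toy_new(void)
{
    ToyObject *obj = calloc(1, sizeof(*obj));

    if (obj) {
        obj->refcount = 1;
        obj->finalize = toy_finalize;
    }
    return obj;
}

/* Drop one reference; the finalize hook runs only when the count hits zero. */
static void toy_unref(ToyObject *obj)
{
    if (obj && --obj->refcount == 0) {
        obj->finalize(obj);
        free(obj);
    }
}
```

In this sketch, an object released with toy_unref() (the analogue of a hot-unplug) bumps finalize_calls, while an object that is simply still alive when the program calls exit() does not.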

Re: [PATCH] i386: SGX: remove deprecated member of SGXInfo

2022-12-18 Thread Yang Zhong

On Sun, Dec 18, 2022 at 01:06:49AM +0100, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini 
> ---
>  docs/about/deprecated.rst   | 13 -
>  docs/about/removed-features.rst | 13 +
>  hw/i386/sgx.c   | 15 ++-
>  qapi/misc-target.json   | 12 ++--
>  4 files changed, 21 insertions(+), 32 deletions(-)
> 

   Tested-by: Yang Zhong 

   By the way, there is another SGX bug fix; please help review it. Thanks!
   https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg04825.html

   Yang
> 
>

Re: [PATCH] target/i386: Fix wrong XSAVE feature names

2022-12-09 Thread Yang Zhong

In fact, I already sent out v2 for this issue a month ago. Thanks!
https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg04825.html

Yang


On Wed, Dec 07, 2022 at 09:47:47PM -0500, Xiaocheng Dong wrote:
> The previous patch changes the name from FEAT_XSAVE_COMP_{LO|HI}
> to FEAT_XSAVE_XCR0_{LO|HI}, the changes for CPUID.0x12.0x1 should be
> FEAT_XSAVE_XCR0_{LO|HI}, the SGX can't work in VM if these are not right
> 
> Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")
> 
> Signed-off-by: Xiaocheng Dong 
> ---
>  target/i386/cpu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 22b681ca37..0f71ff9fea 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5584,8 +5584,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  } else {
>  *eax &= env->features[FEAT_SGX_12_1_EAX];
>  *ebx &= 0; /* ebx reserve */
> -*ecx &= env->features[FEAT_XSAVE_XSS_LO];
> -*edx &= env->features[FEAT_XSAVE_XSS_HI];
> +*ecx &= env->features[FEAT_XSAVE_XCR0_LO];
> +*edx &= env->features[FEAT_XSAVE_XCR0_HI];
>  
>  /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
>  *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
> -- 
> 2.31.1
> 
>

[RESEND PATCH v2] target/i386: Switch back XFRM value

2022-10-26 Thread Yang Zhong

The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):{ECX,EDX}.  As a result,
SGX enclaves only supported the SSE and x87 features (xfrm=0x3).

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")

Signed-off-by: Yang Zhong 
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ad623d91e4..19aaed877b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5584,8 +5584,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         } else {
             *eax &= env->features[FEAT_SGX_12_1_EAX];
             *ebx &= 0; /* ebx reserve */
-            *ecx &= env->features[FEAT_XSAVE_XSS_LO];
-            *edx &= env->features[FEAT_XSAVE_XSS_HI];
+            *ecx &= env->features[FEAT_XSAVE_XCR0_LO];
+            *edx &= env->features[FEAT_XSAVE_XCR0_HI];
 
             /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
             *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
-- 
2.30.2

[PATCH v2] target/i386: Switch back XFRM value

2022-10-12 Thread Yang Zhong

The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):{ECX,EDX}.  As a result,
SGX enclaves only supported the SSE and x87 features (xfrm=0x3).

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")

Signed-off-by: Yang Zhong 
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ad623d91e4..19aaed877b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5584,8 +5584,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         } else {
             *eax &= env->features[FEAT_SGX_12_1_EAX];
             *ebx &= 0; /* ebx reserve */
-            *ecx &= env->features[FEAT_XSAVE_XSS_LO];
-            *edx &= env->features[FEAT_XSAVE_XSS_HI];
+            *ecx &= env->features[FEAT_XSAVE_XCR0_LO];
+            *edx &= env->features[FEAT_XSAVE_XCR0_HI];
 
             /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
             *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
-- 
2.30.2

Re: [PATCH] target/i386: Switch back XFRM value

2022-10-12 Thread Yang Zhong

On Wed, Oct 12, 2022 at 09:59:04AM +0000, Huang, Kai wrote:
> On Wed, 2022-10-12 at 04:26 -0400, Yang Zhong wrote:
> > The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
> > FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):ECX, which made SGX
>   ^
> 
> Nit: both ECX and EDX are wrongly set, but not only ECX.
> 

  Yes, I will change it to CPUID(EAX=12,ECX=1):{ECX,EDX}, thanks!

> > enclave only supported SSE and x87 feature(xfrm=0x3).
> 
> Is this true?  Perhaps I am missing something, but it seems
> env->features[FEAT_XSAVE_XCR0_LO] only includes LBR bit, which is bit 15.

  We found this issue by printing the XFRM value from the SGX SDK.

> 
> /* Calculate XSAVE components based on the configured CPU feature flags */
> static void x86_cpu_enable_xsave_components(X86CPU *cpu)
> {
> ...
> env->features[FEAT_XSAVE_XSS_LO] = mask & CPUID_XSTATE_XSS_MASK;
> ...
> }
> 
> /* CPUID feature bits available in XSS */
> #define CPUID_XSTATE_XSS_MASK    (XSTATE_ARCH_LBR_MASK)
> 
> > 
> > Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based 
> > features")
> > 
> > Signed-off-by: Yang Zhong 
> > ---
> >  target/i386/cpu.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index ad623d91e4..19aaed877b 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -5584,8 +5584,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> > uint32_t count,
> >  } else {
> >  *eax &= env->features[FEAT_SGX_12_1_EAX];
> >  *ebx &= 0; /* ebx reserve */
> > -*ecx &= env->features[FEAT_XSAVE_XSS_LO];
> > -*edx &= env->features[FEAT_XSAVE_XSS_HI];
> > +*ecx &= env->features[FEAT_XSAVE_XCR0_LO];
> > +*edx &= env->features[FEAT_XSAVE_XCR0_HI];
> >  
> >  /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
> >  *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
> 
> The code looks good:
> 
> Reviewed-by: Kai Huang 
>

[PATCH] target/i386: Switch back XFRM value

2022-10-12 Thread Yang Zhong

The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):ECX.  As a result, SGX
enclaves only supported the SSE and x87 features (xfrm=0x3).

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")

Signed-off-by: Yang Zhong 
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ad623d91e4..19aaed877b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5584,8 +5584,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         } else {
             *eax &= env->features[FEAT_SGX_12_1_EAX];
             *ebx &= 0; /* ebx reserve */
-            *ecx &= env->features[FEAT_XSAVE_XSS_LO];
-            *edx &= env->features[FEAT_XSAVE_XSS_HI];
+            *ecx &= env->features[FEAT_XSAVE_XCR0_LO];
+            *edx &= env->features[FEAT_XSAVE_XCR0_HI];
 
             /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
             *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
-- 
2.30.2

Re: [PATCH] i386: Add new CPU model SapphireRapids

2022-09-28 Thread Yang Zhong

On Mon, Sep 26, 2022 at 09:51:13AM +0100, Dr. David Alan Gilbert wrote:
> * Yang Zhong (yang.zh...@linux.intel.com) wrote:
> > On Sat, Sep 24, 2022 at 12:01:16AM +0800, Xiaoyao Li wrote:
> > > On 9/23/2022 9:30 PM, Yang Zhong wrote:
> > > > On Wed, Sep 21, 2022 at 03:51:42PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Wang, Lei (lei4.w...@intel.com) wrote:
> > > > > > The new CPU model mostly inherits features from Icelake-Server, 
> > > > > > while
> > > > > > adding new features:
> > > > > >   - AMX (Advance Matrix eXtensions)
> > > > > >   - Bus Lock Debug Exception
> > > > > > and new instructions:
> > > > > >   - AVX VNNI (Vector Neural Network Instruction):
> > > > > >  - VPDPBUS: Multiply and Add Unsigned and Signed Bytes
> > > > > >  - VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with 
> > > > > > Saturation
> > > > > >  - VPDPWSSD: Multiply and Add Signed Word Integers
> > > > > >  - VPDPWSSDS: Multiply and Add Signed Integers with Saturation
> > > > > >   - FP16: Replicates existing AVX512 computational SP (FP32) 
> > > > > > instructions
> > > > > > using FP16 instead of FP32 for ~2X performance gain
> > > > > >   - SERIALIZE: Provide software with a simple way to force the 
> > > > > > processor to
> > > > > > complete all modifications, faster, allowed in all privilege 
> > > > > > levels and
> > > > > > not causing an unconditional VM exit
> > > > > >   - TSX Suspend Load Address Tracking: Allows programmers to choose 
> > > > > > which
> > > > > > memory accesses do not need to be tracked in the TSX read set
> > > > > >   - AVX512_BF16: Vector Neural Network Instructions supporting 
> > > > > > BFLOAT16
> > > > > > inputs and conversion instructions from IEEE single precision
> > > > > > 
> > > > > > Features may be added in future versions:
> > > > > >   - CET (virtualization support hasn't been merged)
> > > > > > Instructions may be added in future versions:
> > > > > >   - fast zero-length MOVSB (KVM doesn't support yet)
> > > > > >   - fast short STOSB (KVM doesn't support yet)
> > > > > >   - fast short CMPSB, SCASB (KVM doesn't support yet)
> > > > > > 
> > > > > > Signed-off-by: Wang, Lei 
> > > > > > Reviewed-by: Robert Hoo 
> > > > > 
> > > > > Hi,
> > > > > What fills in the AMX tile and tmul information leafs
> > > > > (0x1D, 0x1E)?
> > > > >In particular, how would we make sure when we migrate between two
> > > > > generations of AMX/Tile/Tmul capable devices with different
> > > > > register/palette/tmul limits that the migration is tied to the CPU 
> > > > > type
> > > > > correctly?
> > > > >Would you expect all devices called a 'SappireRapids' to have the 
> > > > > same
> > > > > sizes?
> > > > > 
> > > > 
> > > > There is only one palette in current design. This palette include 8
> > > > tiles.  Those two CPUID leafs defined bytes_per_tile, 
> > > > total_tile_bytes,
> > > > max_rows and etc, the AMX tool will configure those values into 
> > > > TILECFG with
> > > > ldtilecfg instrcutions. Once tiles are configured, we can use
> > > > tileload instruction to load data into those tiles.
> > > > 
> > > > We did migration between two SappireRapids with amx self test tool
> > > > (tools/testing/selftests/x86/amx.c)started in two sides, the 
> > > > migration
> > > > work well.
> > > > 
> > > > As for SappireRapids and more newer cpu types, those two CPUID leafs
> > > > definitions are all same on AMX.
> > > 
> > > I'm not sure what definitions mean here. Are you saying the CPUID values 
> > > of
> > > leaf 0x1D and 0x1E won't change for any future Intel Silicion?
> > > 
> > > Personally, I doubt it. And we shouldn't take such assumption unless Intel
> > > states it SDM.
> > 
> >   The current 0x1D

Re: [PATCH] i386: Add new CPU model SapphireRapids

2022-09-26 Thread Yang Zhong

On Sat, Sep 24, 2022 at 12:01:16AM +0800, Xiaoyao Li wrote:
> On 9/23/2022 9:30 PM, Yang Zhong wrote:
> > On Wed, Sep 21, 2022 at 03:51:42PM +0100, Dr. David Alan Gilbert wrote:
> > > * Wang, Lei (lei4.w...@intel.com) wrote:
> > > > The new CPU model mostly inherits features from Icelake-Server, while
> > > > adding new features:
> > > >   - AMX (Advance Matrix eXtensions)
> > > >   - Bus Lock Debug Exception
> > > > and new instructions:
> > > >   - AVX VNNI (Vector Neural Network Instruction):
> > > >  - VPDPBUS: Multiply and Add Unsigned and Signed Bytes
> > > >  - VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with 
> > > > Saturation
> > > >  - VPDPWSSD: Multiply and Add Signed Word Integers
> > > >  - VPDPWSSDS: Multiply and Add Signed Integers with Saturation
> > > >   - FP16: Replicates existing AVX512 computational SP (FP32) 
> > > > instructions
> > > > using FP16 instead of FP32 for ~2X performance gain
> > > >   - SERIALIZE: Provide software with a simple way to force the 
> > > > processor to
> > > > complete all modifications, faster, allowed in all privilege levels 
> > > > and
> > > > not causing an unconditional VM exit
> > > >   - TSX Suspend Load Address Tracking: Allows programmers to choose 
> > > > which
> > > > memory accesses do not need to be tracked in the TSX read set
> > > >   - AVX512_BF16: Vector Neural Network Instructions supporting BFLOAT16
> > > > inputs and conversion instructions from IEEE single precision
> > > > 
> > > > Features may be added in future versions:
> > > >   - CET (virtualization support hasn't been merged)
> > > > Instructions may be added in future versions:
> > > >   - fast zero-length MOVSB (KVM doesn't support yet)
> > > >   - fast short STOSB (KVM doesn't support yet)
> > > >   - fast short CMPSB, SCASB (KVM doesn't support yet)
> > > > 
> > > > Signed-off-by: Wang, Lei 
> > > > Reviewed-by: Robert Hoo 
> > > 
> > > Hi,
> > > What fills in the AMX tile and tmul information leafs
> > > (0x1D, 0x1E)?
> > >In particular, how would we make sure when we migrate between two
> > > generations of AMX/Tile/Tmul capable devices with different
> > > register/palette/tmul limits that the migration is tied to the CPU type
> > > correctly?
> > >Would you expect all devices called a 'SappireRapids' to have the same
> > > sizes?
> > > 
> > 
> > There is only one palette in current design. This palette include 8
> > tiles.  Those two CPUID leafs defined bytes_per_tile, total_tile_bytes,
> > max_rows and etc, the AMX tool will configure those values into TILECFG 
> > with
> > ldtilecfg instrcutions. Once tiles are configured, we can use
> > tileload instruction to load data into those tiles.
> > 
> > We did migration between two SappireRapids with amx self test tool
> > (tools/testing/selftests/x86/amx.c)started in two sides, the migration
> > work well.
> > 
> > As for SappireRapids and more newer cpu types, those two CPUID leafs
> > definitions are all same on AMX.
> 
> I'm not sure what definitions mean here. Are you saying the CPUID values of
> leaf 0x1D and 0x1E won't change for any future Intel Silicion?
> 
> Personally, I doubt it. And we shouldn't take such assumption unless Intel
> states it SDM.

  The current 0x1D and 0x1E definitions are as follows:

  /* CPUID Leaf 0x1D constants: */
  #define INTEL_AMX_TILE_MAX_SUBLEAF 0x1
  #define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
  #define INTEL_AMX_BYTES_PER_TILE   0x400
  #define INTEL_AMX_BYTES_PER_ROW    0x40
  #define INTEL_AMX_TILE_MAX_NAMES   0x8
  #define INTEL_AMX_TILE_MAX_ROWS    0x10

  /* CPUID Leaf 0x1E constants: */
  #define INTEL_AMX_TMUL_MAX_K       0x10
  #define INTEL_AMX_TMUL_MAX_N       0x40

  These values are taken from the SDM, and on the new CPU currently under
  development they are still the same as on SapphireRapids. Thanks!

  Yang
> 
> > So, on AMX perspective, the migration
> > should be workable on subsequent cpu types. thanks!
> 
> I think what Dave worried is that when migrating one VM created with
> "SapphireRapids" model on SPR machine to some newer platform in the future,
> where the newer platform reports different value on CPUID leaves 0x1D and
> 0x1E than SPR platform.
> 
> I think we need to contain CPUID leaves 0x1D and 0x1E into CPU model as
> well. Otherwise we will hit the same as Intel PT that SPR reports less
> capabilities that ICX.
>
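
As a rough illustration of the concern in this thread, the sketch below records the leaf 0x1D/0x1E values quoted above in a struct, checks that they are internally consistent, and compares two sets field by field. AmxLimits and the three helpers are hypothetical names, not QEMU code, and requiring exact equality is only an assumed, conservative stand-in for whatever rule a CPU model would actually enforce on migration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the guest-visible AMX limits. */
typedef struct AmxLimits {
    uint32_t total_tile_bytes;  /* leaf 0x1D */
    uint32_t bytes_per_tile;
    uint32_t bytes_per_row;
    uint32_t max_names;
    uint32_t max_rows;
    uint32_t tmul_max_k;        /* leaf 0x1E */
    uint32_t tmul_max_n;
} AmxLimits;

/* The values quoted in the message above. */
static AmxLimits amx_limits_spr(void)
{
    AmxLimits l = {
        .total_tile_bytes = 0x2000,
        .bytes_per_tile   = 0x400,
        .bytes_per_row    = 0x40,
        .max_names        = 0x8,
        .max_rows         = 0x10,
        .tmul_max_k       = 0x10,
        .tmul_max_n       = 0x40,
    };
    return l;
}

/* A tile is max_rows rows of bytes_per_row; the palette holds max_names tiles. */
static bool amx_limits_consistent(const AmxLimits *l)
{
    return l->bytes_per_tile == l->bytes_per_row * l->max_rows &&
           l->total_tile_bytes == l->bytes_per_tile * l->max_names;
}

/* Assumed rule: source and destination must expose identical values. */
static bool amx_limits_equal(const AmxLimits *a, const AmxLimits *b)
{
    return a->total_tile_bytes == b->total_tile_bytes &&
           a->bytes_per_tile == b->bytes_per_tile &&
           a->bytes_per_row == b->bytes_per_row &&
           a->max_names == b->max_names &&
           a->max_rows == b->max_rows &&
           a->tmul_max_k == b->tmul_max_k &&
           a->tmul_max_n == b->tmul_max_n;
}
```

A future platform that reported, say, a larger max_rows would fail the equality check against these values, which is the situation that pinning both leaves in the CPU model is meant to avoid.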

Re: [PATCH] i386: Add new CPU model SapphireRapids

2022-09-23 Thread Yang Zhong

On Wed, Sep 21, 2022 at 03:51:42PM +0100, Dr. David Alan Gilbert wrote:
> * Wang, Lei (lei4.w...@intel.com) wrote:
> > The new CPU model mostly inherits features from Icelake-Server, while
> > adding new features:
> >  - AMX (Advance Matrix eXtensions)
> >  - Bus Lock Debug Exception
> > and new instructions:
> >  - AVX VNNI (Vector Neural Network Instruction):
> > - VPDPBUS: Multiply and Add Unsigned and Signed Bytes
> > - VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
> > - VPDPWSSD: Multiply and Add Signed Word Integers
> > - VPDPWSSDS: Multiply and Add Signed Integers with Saturation
> >  - FP16: Replicates existing AVX512 computational SP (FP32) instructions
> >using FP16 instead of FP32 for ~2X performance gain
> >  - SERIALIZE: Provide software with a simple way to force the processor to
> >complete all modifications, faster, allowed in all privilege levels and
> >not causing an unconditional VM exit
> >  - TSX Suspend Load Address Tracking: Allows programmers to choose which
> >memory accesses do not need to be tracked in the TSX read set
> >  - AVX512_BF16: Vector Neural Network Instructions supporting BFLOAT16
> >inputs and conversion instructions from IEEE single precision
> > 
> > Features may be added in future versions:
> >  - CET (virtualization support hasn't been merged)
> > Instructions may be added in future versions:
> >  - fast zero-length MOVSB (KVM doesn't support yet)
> >  - fast short STOSB (KVM doesn't support yet)
> >  - fast short CMPSB, SCASB (KVM doesn't support yet)
> > 
> > Signed-off-by: Wang, Lei 
> > Reviewed-by: Robert Hoo 
> 
> Hi,
>What fills in the AMX tile and tmul information leafs
> (0x1D, 0x1E)?
   
>   In particular, how would we make sure when we migrate between two
> generations of AMX/Tile/Tmul capable devices with different
> register/palette/tmul limits that the migration is tied to the CPU type
> correctly?
>   Would you expect all devices called a 'SappireRapids' to have the same
> sizes?
>

   There is only one palette in the current design, and this palette includes 8
   tiles.  Those two CPUID leaves define bytes_per_tile, total_tile_bytes,
   max_rows, etc.; the AMX tool configures those values into TILECFG with the
   ldtilecfg instruction. Once the tiles are configured, we can use the
   tileload instruction to load data into them.

   We tested migration between two SapphireRapids machines with the AMX selftest
   tool (tools/testing/selftests/x86/amx.c) running on both sides, and the
   migration worked well.

   On SapphireRapids and newer CPU types, the definitions of those two CPUID
   leaves are the same for AMX. So, from the AMX perspective, migration
   should work on subsequent CPU types. Thanks!

   Yang

> Dave
>  
> > ---
> >  target/i386/cpu.c | 128 ++
> >  target/i386/cpu.h |   4 ++
> >  2 files changed, 132 insertions(+)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 1db1278a59..abb43853d4 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -3467,6 +3467,134 @@ static const X86CPUDefinition builtin_x86_defs[] = {
> >  { /* end of list */ }
> >  }
> >  },
> > +{
> > +.name = "SapphireRapids",
> > +.level = 0x20,
> > +.vendor = CPUID_VENDOR_INTEL,
> > +.family = 6,
> > +.model = 143,
> > +.stepping = 4,
> > +/*
> > + * please keep the ascending order so that we can have a clear 
> > view of
> > + * bit position of each feature.
> > + */
> > +.features[FEAT_1_EDX] =
> > +CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
> > +CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
> > +CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
> > +CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX | 
> > CPUID_FXSR |
> > +CPUID_SSE | CPUID_SSE2,
> > +.features[FEAT_1_ECX] =
> > +CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 |
> > +CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | 
> > CPUID_EXT_SSE41 |
> > +CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
> > +CPUID_EXT_POPCNT | CPUID_EXT_TSC_DEADLINE_TIMER | 
> > CPUID_EXT_AES |
> > +CPUID_EXT_XSAVE | CPUID_EXT_AVX | CPUID_EXT_F16C | 
> > CPUID_EXT_RDRAND,
> > +.features[FEAT_8000_0001_EDX] =
> > +CPUID_EXT2_SYSCALL | CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
> > +CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
> > +.features[FEAT_8000_0001_ECX] =
> > +CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH,
> > +.features[FEAT_8000_0008_EBX] =
> > +CPUID_8000_0008_EBX_WBNOINVD,
> > +.features[FEAT_7_0_EBX] =
> > +CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | 
> > CPUID_7_0_EBX_HLE |
> > +CPUID_7_0_EBX_A

[PATCH] target/i386: Fix wrong count setting

2022-05-30 Thread Yang Zhong

The previous patch passed the index value where the count (sub-leaf) was
expected, so CPUID(EAX=12,ECX=0):EAX returned the wrong value. As a result,
the SGX1 instructions were not exposed to the VM and the SGX device could
not work in the VM.

Fixes: d19d6ffa0710 ("target/i386: introduce helper to access supported CPUID")

Signed-off-by: Yang Zhong 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bb6a5dd498..9fdfec9d8b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5559,7 +5559,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
          * supports.  Features can be further restricted by userspace, but not
          * made more permissive.
          */
-        x86_cpu_get_supported_cpuid(0x12, index, eax, ebx, ecx, edx);
+        x86_cpu_get_supported_cpuid(0x12, count, eax, ebx, ecx, edx);
 
         if (count == 0) {
             *eax &= env->features[FEAT_SGX_12_0_EAX];
-- 
2.30.2
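
To see why the argument matters, here is a toy model of the bug. The lookup
function is a hypothetical stand-in for the real CPUID query (it is not
x86_cpu_get_supported_cpuid()), and only sub-leaf 0 of leaf 0x12 is
populated. In cpu_x86_cpuid(), index is the leaf and count is the sub-leaf,
so passing index as the sub-leaf asks for CPUID(EAX=12,ECX=12) instead of
ECX=0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPUID table: returns EAX for a (leaf, sub-leaf) pair. */
static uint32_t mock_cpuid_eax(uint32_t leaf, uint32_t subleaf)
{
    if (leaf == 0x12 && subleaf == 0) {
        return 0x1;   /* bit 0: SGX1 leaf functions are available */
    }
    return 0;         /* nothing else is populated in this toy table */
}

/* Before the fix: the leaf number is passed where the sub-leaf belongs. */
static uint32_t sgx_eax_before_fix(uint32_t index, uint32_t count)
{
    assert(index == 0x12);
    (void)count;
    return mock_cpuid_eax(0x12, index);
}

/* After the fix: the sub-leaf (count) is passed through. */
static uint32_t sgx_eax_after_fix(uint32_t index, uint32_t count)
{
    assert(index == 0x12);
    return mock_cpuid_eax(0x12, count);
}
```

For a guest query of leaf 0x12, sub-leaf 0, the first helper reads an empty
entry and so reports no SGX1 support, while the second returns the SGX1 bit.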

Re: RFC: sgx-epc is not listed in machine type help

2022-04-28 Thread Yang Zhong

On Thu, Apr 28, 2022 at 02:56:50PM +0200, Jinpu Wang wrote:
> On Thu, Apr 28, 2022 at 2:32 PM Yang Zhong  wrote:
> >
> > On Thu, Apr 28, 2022 at 02:18:54PM +0200, Jinpu Wang wrote:
> > > On Thu, Apr 28, 2022 at 2:05 PM Yang Zhong  wrote:
> > > >
> > > > On Thu, Apr 28, 2022 at 01:59:33PM +0200, Jinpu Wang wrote:
> > > > > Hi Yang, hi Paolo,
> > > > >
> > > > > We noticed sgx-epc machine type is not listed in the output of
> > > > > "qemu-system-x86_64 -M ?",
> > > snip
> > > > >
> > > > >
> > > > > I think this would cause confusion to users, is there a reason behind 
> > > > > this?
> > > > >
> > > >
> > > >   No specific machine type for SGX, and SGX is only supported in Qemu 
> > > > PC and Q35 platform.
> > > Hi Yang,
> > >
> > > Thanks for your quick reply. Sorry for the stupid question.
> > > The information I've got from intel or the help sample from
> > > https://www.qemu.org/docs/master/system/i386/sgx.html, We need to
> > > specify commands something like this to run SGX-EPC guest:
> > > qemu-system-x86-64 -m 2G -nographic -enable-kvm -cpu
> > > host,+sgx-provisionkey  -object
> > > memory-backend-epc,id=mem1,size=512M,prealloc=on -M
> > > sgx-epc.0.memdev=mem1,sgx-epc.0.node=0 /tmp/volume-name.img
> > >
> > > Do you mean internally QEMU is converting -M sgx-epc to PC or Q35, can
> > > I choose which one to use?
> > >
> >
> >   Qemu will replace object with compound key, in that time, Paolo asked me
> >   to use "-M sgx-epc..." to replace "-object sgx-epc..." from Qemu command 
> > line.
> >
> >   So the "-M sgx-epc..." will get sgx-epc's parameters from hash key, and
> >   do not covert sgx-epc to PC or Q35.
> >
> >   SGX is only one Intel cpu feature, and no dedicated SGX Qemu machine type 
> > for SGX.
> >
> >   Another compound key example:
> >   "-M pc,smp.cpus=4,smp.cores=1,smp.threads=1"
> >
> >   Yang
> ah, ok. thx for the sharing.
> so if I specify "-M pc -M sgx-epc.." it will be the explicit way to
> choose PC machine type with sgx feature.
> and "-M q35 -M sgx-epc.." qemu will use Q35 machine type?

  The below command is okay,
  "-M pc,sgx-epc.." or "-M q35,sgx-epc.."

  Yang
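
For readers unfamiliar with compound keys, the sketch below shows how a
dotted key such as sgx-epc.0.memdev breaks down into a property name, a list
index, and a field. This is only an illustration written for this thread: it
is not QEMU's own option parser, and the struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "name.N.field" key, e.g. sgx-epc.0.memdev (illustrative type). */
struct list_key {
    char name[32];
    unsigned index;
    char field[32];
};

/* Returns 1 if key has the form name.N.field, 0 otherwise. */
static int parse_list_key(const char *key, struct list_key *out)
{
    assert(key != NULL && out != NULL);
    return sscanf(key, "%31[^.].%u.%31s",
                  out->name, &out->index, out->field) == 3;
}
```

A key such as smp.cpus has no list index, so this helper rejects it; it is a
plain nested property rather than a list element.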

> >
> >
> > > Thanks!
> > > Jinpu

Re: RFC: sgx-epc is not listed in machine type help

2022-04-28 Thread Yang Zhong

On Thu, Apr 28, 2022 at 02:18:54PM +0200, Jinpu Wang wrote:
> On Thu, Apr 28, 2022 at 2:05 PM Yang Zhong  wrote:
> >
> > On Thu, Apr 28, 2022 at 01:59:33PM +0200, Jinpu Wang wrote:
> > > Hi Yang, hi Paolo,
> > >
> > > We noticed sgx-epc machine type is not listed in the output of
> > > "qemu-system-x86_64 -M ?",
> snip
> > >
> > >
> > > I think this would cause confusion to users, is there a reason behind 
> > > this?
> > >
> >
> >   No specific machine type for SGX, and SGX is only supported in Qemu PC 
> > and Q35 platform.
> Hi Yang,
> 
> Thanks for your quick reply. Sorry for the stupid question.
> The information I've got from intel or the help sample from
> https://www.qemu.org/docs/master/system/i386/sgx.html, We need to
> specify commands something like this to run SGX-EPC guest:
> qemu-system-x86-64 -m 2G -nographic -enable-kvm -cpu
> host,+sgx-provisionkey  -object
> memory-backend-epc,id=mem1,size=512M,prealloc=on -M
> sgx-epc.0.memdev=mem1,sgx-epc.0.node=0 /tmp/volume-name.img
> 
> Do you mean internally QEMU is converting -M sgx-epc to PC or Q35, can
> I choose which one to use?
>

  QEMU replaced the -object form with a compound key. At that time, Paolo
  asked me to use "-M sgx-epc..." instead of "-object sgx-epc..." on the
  QEMU command line.

  So "-M sgx-epc..." only supplies the sgx-epc parameters through the
  compound key; it does not convert sgx-epc to PC or Q35.

  SGX is just an Intel CPU feature, and there is no dedicated QEMU machine
  type for it.
  
  Another compound key example:
  "-M pc,smp.cpus=4,smp.cores=1,smp.threads=1"
 
  Yang

 
> Thanks!
> Jinpu

Re: RFC: sgx-epc is not listed in machine type help

2022-04-28 Thread Yang Zhong

On Thu, Apr 28, 2022 at 01:59:33PM +0200, Jinpu Wang wrote:
> Hi Yang, hi Paolo,
> 
> We noticed sgx-epc machine type is not listed in the output of
> "qemu-system-x86_64 -M ?",
> This is what I got with qemu-7.0
> Supported machines are:
> microvm  microvm (i386)
> pc   Standard PC (i440FX + PIIX, 1996) (alias of 
> pc-i440fx-7.0)
> pc-i440fx-7.0Standard PC (i440FX + PIIX, 1996) (default)
> pc-i440fx-6.2Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-6.1Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-6.0Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-5.2Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-5.1Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-5.0Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-4.2Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-4.1Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-4.0Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-3.1Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-3.0Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.9Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.8Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.7Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.6Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.5Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.4Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.3Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.2Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.12   Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.11   Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.10   Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.1Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-2.0Standard PC (i440FX + PIIX, 1996)
> pc-i440fx-1.7Standard PC (i440FX + PIIX, 1996) (deprecated)
> pc-i440fx-1.6Standard PC (i440FX + PIIX, 1996) (deprecated)
> pc-i440fx-1.5Standard PC (i440FX + PIIX, 1996) (deprecated)
> pc-i440fx-1.4Standard PC (i440FX + PIIX, 1996) (deprecated)
> q35  Standard PC (Q35 + ICH9, 2009) (alias of pc-q35-7.0)
> pc-q35-7.0   Standard PC (Q35 + ICH9, 2009)
> pc-q35-6.2   Standard PC (Q35 + ICH9, 2009)
> pc-q35-6.1   Standard PC (Q35 + ICH9, 2009)
> pc-q35-6.0   Standard PC (Q35 + ICH9, 2009)
> pc-q35-5.2   Standard PC (Q35 + ICH9, 2009)
> pc-q35-5.1   Standard PC (Q35 + ICH9, 2009)
> pc-q35-5.0   Standard PC (Q35 + ICH9, 2009)
> pc-q35-4.2   Standard PC (Q35 + ICH9, 2009)
> pc-q35-4.1   Standard PC (Q35 + ICH9, 2009)
> pc-q35-4.0.1 Standard PC (Q35 + ICH9, 2009)
> pc-q35-4.0   Standard PC (Q35 + ICH9, 2009)
> pc-q35-3.1   Standard PC (Q35 + ICH9, 2009)
> pc-q35-3.0   Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.9   Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.8   Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.7   Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.6   Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.5   Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.4   Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.12  Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.11  Standard PC (Q35 + ICH9, 2009)
> pc-q35-2.10  Standard PC (Q35 + ICH9, 2009)
> isapcISA-only PC
> none empty machine
> x-remote Experimental remote machine
> 
> 
> I think this would cause confusion to users, is there a reason behind this?
> 

  There is no specific machine type for SGX; SGX is only supported on the
  QEMU PC and Q35 platforms.
  
  Yang


> Thanks!
> Jinpu Wang @ IONOS Cloud

Re: [PATCH] target/i386: Return right size value after dynamic xfeature enabled

2022-03-24 Thread Yang Zhong

On Thu, Mar 24, 2022 at 08:35:10AM +0100, Paolo Bonzini wrote:
> On 3/24/22 04:18, Yang Zhong wrote:
> >The kvm_arch_get_supported_cpuid() only call KVM_GET_SUPPORTED_CPUID one
> >time, so the cpuid buffer information still keep older value. Once Qemu
> >enable new dynamic xfeature, like XTILEDATA, the cpuid[0D,0].{EBX,ECX}
> >still return older value.
> >
> >This patch can return right size value in kvm_init_xsave() if XTILEDATA
> >has been enabled by arch_prctl.
> >
> >assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) <=
> >env->xsave_buf_len);
> >
> >Signed-off-by: Yang Zhong 
> 
> I don't understand, is this a bugfix for an assertion failure or
> just a cleanup?
> 

  In fact, there is no assertion failure here.
  The issue is that after we enable a dynamic xfeature, calling
  kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) to get the size
  still returns the old value (2816) rather than the size we expect (11008).

  The handling of cpuid[0D,0].{EBX,ECX} in kvm_arch_get_supported_cpuid()
  needs to be cleaned up; otherwise we can't get the real value. Thanks!

  Yang
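
The two sizes mentioned above can be reproduced with a little arithmetic. In
the standard (non-compacted) XSAVE layout, the area size is the largest
offset + size over the enabled components. The sketch below uses the
per-component sizes and offsets from the Intel host CPUID.0xD dump posted
elsewhere in this archive (EAX = size, EBX = offset); the struct and
function names are illustrative only and do not exist in QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xsave_component {
    unsigned bit;      /* XCR0 bit number */
    uint32_t size;     /* CPUID.0xD.<bit>:EAX */
    uint32_t offset;   /* CPUID.0xD.<bit>:EBX */
};

/* Values copied from the Intel host dump in this archive. */
static const struct xsave_component intel_components[] = {
    {  2, 0x100,  0x240 },
    {  5, 0x040,  0x440 },
    {  6, 0x200,  0x480 },
    {  7, 0x400,  0x680 },
    {  9, 0x008,  0xa80 },
    { 17, 0x040,  0xac0 },
    { 18, 0x2000, 0xb00 },   /* XTILEDATA */
};

/* Size of the standard-format XSAVE area for a given XCR0 mask. */
static uint32_t xsave_area_size(uint64_t xcr0)
{
    uint32_t end = 512 + 64;   /* legacy region plus XSAVE header */
    size_t n = sizeof(intel_components) / sizeof(intel_components[0]);

    for (size_t i = 0; i < n; i++) {
        const struct xsave_component *c = &intel_components[i];

        if (xcr0 & (1ULL << c->bit)) {
            uint32_t e = c->offset + c->size;
            if (e > end) {
                end = e;
            }
        }
    }
    return end;
}
```

With XTILEDATA (bit 18) left out of the mask the result is 2816 bytes; with
it included the result is 2816 + 8192 = 11008 bytes.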



> Either way, while I like the idea of modifying
> kvm_arch_get_supported_cpuid, I think the right thing to do is to
> just use has_xsave2 as the return value if it is nonzero.  And then
> kvm_init_xsave can just do
> 
> if (!has_xsave) {
> return;
> }
> env->xsave_buf_len = kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX);
> 
> without the assertion that is now obvious.
> 
> Paolo
> 
> >---
> >  target/i386/cpu.h |  3 +++
> >  target/i386/kvm/kvm.c | 15 +--
> >  2 files changed, 16 insertions(+), 2 deletions(-)
> >
> >diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >index 5e406088a9..814ba4020b 100644
> >--- a/target/i386/cpu.h
> >+++ b/target/i386/cpu.h
> >@@ -565,6 +565,9 @@ typedef enum X86Seg {
> >  #define ESA_FEATURE_ALIGN64_MASK(1U << ESA_FEATURE_ALIGN64_BIT)
> >  #define ESA_FEATURE_XFD_MASK(1U << ESA_FEATURE_XFD_BIT)
> >+#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
> >+#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
> >+
> >  /* CPUID feature words */
> >  typedef enum FeatureWord {
> >diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> >index 06901c2a43..312d4fccf8 100644
> >--- a/target/i386/kvm/kvm.c
> >+++ b/target/i386/kvm/kvm.c
> >@@ -46,6 +46,7 @@
> >  #include "hw/i386/intel_iommu.h"
> >  #include "hw/i386/x86-iommu.h"
> >  #include "hw/i386/e820_memory_layout.h"
> >+#include "target/i386/cpu.h"
> >  #include "hw/pci/pci.h"
> >  #include "hw/pci/msi.h"
> >@@ -437,6 +438,18 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
> >uint32_t function,
> >  return ret;
> >  }
> >  ret = (reg == R_EAX) ? bitmask : bitmask >> 32;
> >+} else if (function == 0xd && index == 0 &&
> >+   (reg == R_EBX || reg == R_ECX)) {
> >+/*
> >+ * The value returned by KVM_GET_SUPPORTED_CPUID does not include
> >+ * features that already be enabled with the arch_prctl system call.
> >+ */
> >+int rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, 
> >&bitmask);
> >+if (rc) {
> >+warn_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %d", rc);
> >+} else if (bitmask & XSTATE_XTILE_DATA_MASK) {
> >+ret += sizeof(XSaveXTILEDATA);
> >+}
> >  } else if (function == 0x80000001 && reg == R_ECX) {
> >  /*
> >   * It's safe to enable TOPOEXT even if it's not returned by
> >@@ -5214,8 +5227,6 @@ bool kvm_arch_cpu_check_are_resettable(void)
> >  return !sev_es_enabled();
> >  }
> >-#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
> >-
> >  void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
> >  {
> >  KVMState *s = kvm_state;
> >

[PATCH] target/i386: Return right size value after dynamic xfeature enabled

2022-03-23 Thread Yang Zhong

kvm_arch_get_supported_cpuid() only calls KVM_GET_SUPPORTED_CPUID once,
so the cached cpuid buffer keeps the old values. Once QEMU enables a new
dynamic xfeature such as XTILEDATA, cpuid[0D,0].{EBX,ECX} still returns
the old value.

With this patch, kvm_init_xsave() gets the right size if XTILEDATA has
been enabled by arch_prctl.

assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) <=
   env->xsave_buf_len);

Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h |  3 +++
 target/i386/kvm/kvm.c | 15 +--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5e406088a9..814ba4020b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -565,6 +565,9 @@ typedef enum X86Seg {
 #define ESA_FEATURE_ALIGN64_MASK(1U << ESA_FEATURE_ALIGN64_BIT)
 #define ESA_FEATURE_XFD_MASK(1U << ESA_FEATURE_XFD_BIT)
 
+#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
+#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
+
 
 /* CPUID feature words */
 typedef enum FeatureWord {
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 06901c2a43..312d4fccf8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -46,6 +46,7 @@
 #include "hw/i386/intel_iommu.h"
 #include "hw/i386/x86-iommu.h"
 #include "hw/i386/e820_memory_layout.h"
+#include "target/i386/cpu.h"
 
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
@@ -437,6 +438,18 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 return ret;
 }
 ret = (reg == R_EAX) ? bitmask : bitmask >> 32;
+} else if (function == 0xd && index == 0 &&
+   (reg == R_EBX || reg == R_ECX)) {
+/*
+ * The value returned by KVM_GET_SUPPORTED_CPUID does not include
+ * features that already be enabled with the arch_prctl system call.
+ */
+int rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
+if (rc) {
+warn_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %d", rc);
+} else if (bitmask & XSTATE_XTILE_DATA_MASK) {
+ret += sizeof(XSaveXTILEDATA);
+}
 } else if (function == 0x80000001 && reg == R_ECX) {
 /*
  * It's safe to enable TOPOEXT even if it's not returned by
@@ -5214,8 +5227,6 @@ bool kvm_arch_cpu_check_are_resettable(void)
 return !sev_es_enabled();
 }
 
-#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
-
 void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
 {
 KVMState *s = kvm_state;

Re: [PATCH] KVM: x86: workaround invalid CPUID[0xD,9] info on some AMD processors

2022-03-23 Thread Yang Zhong

On Wed, Mar 23, 2022 at 12:43:15PM +0100, Paolo Bonzini wrote:
> Some AMD processors expose the PKRU extended save state even if they do not 
> have
> the related PKU feature in CPUID.  Worse, when they do they report a size of
> 64, whereas the expected size of the PKRU extended save state is 8, therefore
> the esa->size == eax assertion does not hold.
> 
> The state is already ignored by KVM_GET_SUPPORTED_CPUID because it
> was not enabled in the host XCR0.  However, QEMU kvm_cpu_xsave_init()
> runs before QEMU invokes arch_prctl() to enable dynamically-enabled
> save states such as XTILEDATA, and KVM_GET_SUPPORTED_CPUID hides save
> states that have yet to be enabled.  Therefore, kvm_cpu_xsave_init()
> needs to consult the host CPUID instead of KVM_GET_SUPPORTED_CPUID,
> and dies with an assertion failure.
> 
> When setting up the ExtSaveArea array to match the host, ignore features that
> KVM does not report as supported.  This will cause QEMU to skip the incorrect
> CPUID leaf instead of tripping the assertion.
> 
> Reported-by: Daniel P. Berrangé 
> Analyzed-by: Yang Zhong 
> Signed-off-by: Paolo Bonzini 
> ---
>  target/i386/cpu.c |  4 ++--
>  target/i386/cpu.h |  2 ++
>  target/i386/kvm/kvm-cpu.c | 19 ---
>  3 files changed, 16 insertions(+), 9 deletions(-)

   Verified this patch on an AMD EPYC 7402P; the crash no longer occurs. Thanks!

   Yang
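
The idea in the quoted commit message (skip components that KVM does not
report as supported, and only then compare sizes) can be sketched as below.
This is a simplified model, not the code from the patch: the function name,
its return convention, and the array-based interface are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compare the expected save-area sizes with what the host CPUID reports,
 * looking only at components set in kvm_supported_mask.
 * Returns the number of components checked, or -1 on the first mismatch
 * (the case that previously tripped the assertion).
 */
static int check_xsave_components(uint64_t kvm_supported_mask,
                                  const uint32_t expected_size[],
                                  const uint32_t host_size[],
                                  unsigned n)
{
    int checked = 0;

    assert(n <= 64);
    for (unsigned i = 2; i < n; i++) {
        if (!(kvm_supported_mask & (1ULL << i))) {
            continue;   /* KVM does not expose it: ignore the host value */
        }
        if (expected_size[i] != host_size[i]) {
            return -1;
        }
        checked++;
    }
    return checked;
}
```

On the EPYC 7402P discussed in this thread, component 9 reports a size of
0x40 where 8 is expected. If bit 9 is absent from the supported mask, the
entry is skipped and the check passes.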

Re: [PULL 15/22] x86: Grant AMX permission for guest

2022-03-22 Thread Yang Zhong

On Wed, Mar 16, 2022 at 04:57:39PM +0100, Peter Krempa wrote:
> On Tue, Mar 08, 2022 at 12:34:38 +0100, Paolo Bonzini wrote:
> > From: Yang Zhong 
> > 
> > Kernel allocates 4K xstate buffer by default. For XSAVE features
> > which require large state component (e.g. AMX), Linux kernel
> > dynamically expands the xstate buffer only after the process has
> > acquired the necessary permissions. Those are called dynamically-
> > enabled XSAVE features (or dynamic xfeatures).
> > 
> > There are separate permissions for native tasks and guests.
> > 
> > Qemu should request the guest permissions for dynamic xfeatures
> > which will be exposed to the guest. This only needs to be done
> > once before the first vcpu is created.
> > 
> > KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
> > get host side supported_xcr0 and Qemu can decide if it can request
> > dynamically enabled XSAVE features permission.
> > https://lore.kernel.org/all/20220126152210.3044876-1-pbonz...@redhat.com/
> > 
> > Suggested-by: Paolo Bonzini 
> > Signed-off-by: Yang Zhong 
> > Signed-off-by: Jing Liu 
> > Message-Id: <20220217060434.52460-4-yang.zh...@intel.com>
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  target/i386/cpu.c  |  7 +
> >  target/i386/cpu.h  |  4 +++
> >  target/i386/kvm/kvm-cpu.c  | 12 
> >  target/i386/kvm/kvm.c  | 57 ++
> >  target/i386/kvm/kvm_i386.h |  1 +
> >  5 files changed, 75 insertions(+), 6 deletions(-)
> 
> With this commit qemu crashes for me when invoking the following
> QMP command:
> 
> $ ~pipo/git/qemu.git/build/qemu-system-x86_64 -S -no-user-config -nodefaults 
> -nographic -machine none,accel=kvm -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 90, "minor": 2, "major": 6}, 
> "package": "v7.0.0-rc0-8-g1d60bb4b14"}, "capabilities": ["oob"]}}
> {'execute':'qmp_capabilities'}
> {"return": {}}
> {"execute":"qom-list-properties","arguments":{"typename":"max-x86_64-cpu"},"id":"libvirt-41"}
> qemu-system-x86_64: ../target/i386/kvm/kvm-cpu.c:105: kvm_cpu_xsave_init: 
> Assertion `esa->size == eax' failed.
> Aborted (core dumped)
> 
> Note that the above is on a box with an 'AMD Ryzen 9 3900X'.
> 
> Curiously on a laptop with an Intel chip (Intel(R) Core(TM) i7-10610U)
> it seems to work.

  
  Paolo, I debugged this issue and found that it is caused by xstate feature
  bit 9 (MPK, which is like PKU on Intel) on some AMD platforms.

  #AMD Spec, p409
  https://www.amd.com/system/files/TechDocs/24593.pdf

  I checked the CPUID info from an AMD EPYC 7402P server: for ECX=0x9, eax is
  0x40, which differs from eax=0x00000008 on the Intel platform. So the
  assertion is triggered by the AMX changes.

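  ##AMD host
  0x0000000d 0x00: eax=0x00000207 ebx=0x00000340 ecx=0x00000380 edx=0x00000000
  0x0000000d 0x01: eax=0x0000000f ebx=0x00000340 ecx=0x00000000 edx=0x00000000
  0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
  0x0000000d 0x09: eax=0x00000040 ebx=0x00000340 ecx=0x00000000 edx=0x00000000

  ##Intel host
  0x0000000d 0x00: eax=0x000602e7 ebx=0x00002b00 ecx=0x00002b00 edx=0x00000000
  0x0000000d 0x01: eax=0x0000001f ebx=0x00002d00 ecx=0x0000dd00 edx=0x00000000
  0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
  0x0000000d 0x05: eax=0x00000040 ebx=0x00000440 ecx=0x00000000 edx=0x00000000
  0x0000000d 0x06: eax=0x00000200 ebx=0x00000480 ecx=0x00000000 edx=0x00000000
  0x0000000d 0x07: eax=0x00000400 ebx=0x00000680 ecx=0x00000000 edx=0x00000000
  0x0000000d 0x08: eax=0x00000080 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
  0x0000000d 0x09: eax=0x00000008 ebx=0x00000a80 ecx=0x00000000 edx=0x00000000
  0x0000000d 0x0a: eax=0x00000008 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
  0x0000000d 0x0b: eax=0x00000010 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
  0x0000000d 0x0c: eax=0x00000018 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
  0x0000000d 0x0e: eax=0x00000030 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
  0x0000000d 0x0f: eax=0x00000328 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
  0x0000000d 0x11: eax=0x00000040 ebx=0x00000ac0 ecx=0x00000002 edx=0x00000000
  0x0000000d 0x12: eax=0x00002000 ebx=0x00000b00 ecx=0x00000006 edx=0x00000000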

  But I also checked the same CPUID info on an AMD MILAN server, where
  eax=0x00000008 for ECX=0x9. So, for ECX=0x9, the eax value differs
  between AMD servers.

  How should we handle those different values, given that we use host_cpuid()
  to read the host's registers? Thanks!

  Yang

Re: [PULL 15/22] x86: Grant AMX permission for guest

2022-03-18 Thread Yang Zhong

On Fri, Mar 18, 2022 at 11:13:56AM +0100, Michal Prívozník wrote:
> On 3/16/22 16:57, Peter Krempa wrote:
> > On Tue, Mar 08, 2022 at 12:34:38 +0100, Paolo Bonzini wrote:
> >> From: Yang Zhong 
> >>
> >> Kernel allocates 4K xstate buffer by default. For XSAVE features
> >> which require large state component (e.g. AMX), Linux kernel
> >> dynamically expands the xstate buffer only after the process has
> >> acquired the necessary permissions. Those are called dynamically-
> >> enabled XSAVE features (or dynamic xfeatures).
> >>
> >> There are separate permissions for native tasks and guests.
> >>
> >> Qemu should request the guest permissions for dynamic xfeatures
> >> which will be exposed to the guest. This only needs to be done
> >> once before the first vcpu is created.
> >>
> >> KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
> >> get host side supported_xcr0 and Qemu can decide if it can request
> >> dynamically enabled XSAVE features permission.
> >> https://lore.kernel.org/all/20220126152210.3044876-1-pbonz...@redhat.com/
> >>
> >> Suggested-by: Paolo Bonzini 
> >> Signed-off-by: Yang Zhong 
> >> Signed-off-by: Jing Liu 
> >> Message-Id: <20220217060434.52460-4-yang.zh...@intel.com>
> >> Signed-off-by: Paolo Bonzini 
> >> ---
> >>  target/i386/cpu.c  |  7 +
> >>  target/i386/cpu.h  |  4 +++
> >>  target/i386/kvm/kvm-cpu.c  | 12 
> >>  target/i386/kvm/kvm.c  | 57 ++
> >>  target/i386/kvm/kvm_i386.h |  1 +
> >>  5 files changed, 75 insertions(+), 6 deletions(-)
> > 
> > With this commit qemu crashes for me when invoking the following
> > QMP command:
> > 
> > $ ~pipo/git/qemu.git/build/qemu-system-x86_64 -S -no-user-config 
> > -nodefaults -nographic -machine none,accel=kvm -qmp stdio
> > {"QMP": {"version": {"qemu": {"micro": 90, "minor": 2, "major": 6}, 
> > "package": "v7.0.0-rc0-8-g1d60bb4b14"}, "capabilities": ["oob"]}}
> > {'execute':'qmp_capabilities'}
> > {"return": {}}
> > {"execute":"qom-list-properties","arguments":{"typename":"max-x86_64-cpu"},"id":"libvirt-41"}
> > qemu-system-x86_64: ../target/i386/kvm/kvm-cpu.c:105: kvm_cpu_xsave_init: 
> > Assertion `esa->size == eax' failed.
> > Aborted (core dumped)
> > 
> > Note that the above is on a box with an 'AMD Ryzen 9 3900X'.
> > 
> > Curiously on a laptop with an Intel chip (Intel(R) Core(TM) i7-10610U)
> > it seems to work.
> > 
> > 
> 
> Not trying to beat a dead horse here, but I've just found another
> problem with this patch. On my laptop (Linux maggie
> 5.15.26-gentoo-x86_64 #1 SMP Thu Mar 10 08:55:28 CET 2022 x86_64
> Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz GenuineIntel GNU/Linux), when
> I start a guest it no longer sees AVX instructions:
> 
>   qemu.git $ ./build/qemu-system-x86_64 -accel kvm -cpu host ...
>

  Thanks Michal, this issue is a compatibility problem with older kernel
  versions.

  QEMU reports the logs below:
  qemu-system-x86_64: warning: cannot get sys attribute capabilities 0
  qemu-system-x86_64: warning: cannot get sys attribute capabilities 0
  qemu-system-x86_64: warning: cannot get sys attribute capabilities 0
  qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.0DH:EAX [bit 5]
  qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.0DH:EAX [bit 6]
  qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.0DH:EAX [bit 9]
  ..

  The AMX changes in QEMU need to read the ARCH_GET_XCOMP_SUPP attribute to
  get the host supported_xcr0 value, and this API was only added in newer
  kernel releases. So an older kernel can't report the right xcr0 value.

  I made a new patch to fix this issue; please try it. Thanks!
  https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg04732.html

  Paolo, this patch only fixes the compatibility issue. The issue on AMD
  CPUs is still not fixed on my side, because I have no AMD platform
  available. If you have no time to check it, I may need to look for such
  a platform internally. Thanks!

  Yang 

> Michal

[PATCH] x86/amx: compatible with older kernel release

2022-03-18 Thread Yang Zhong

AMX support in KVM introduced a new ARCH_GET_XCOMP_SUPP system attribute
API to get the host-side supported_xcr0, so that the latest QEMU can decide
whether it can request permission for dynamically enabled XSAVE features.
But this implementation (19db68ca68) did not consider older kernel releases.
With this patch, QEMU skips the new KVM_GET_DEVICE_ATTR ioctl when the
kernel does not support it.

Signed-off-by: Yang Zhong 
---
 target/i386/kvm/kvm.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ef2c68a6f4..cda95e7ba6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -420,14 +420,14 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         bool sys_attr = kvm_check_extension(s, KVM_CAP_SYS_ATTRIBUTES);
         if (!sys_attr) {
             warn_report("cannot get sys attribute capabilities %d", sys_attr);
-        }
-
-        int rc = kvm_ioctl(s, KVM_GET_DEVICE_ATTR, &attr);
-        if (rc == -1 && (errno == ENXIO || errno == EINVAL)) {
-            warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) "
-                        "error: %d", rc);
-        }
-        ret = (reg == R_EAX) ? bitmask : bitmask >> 32;
+        } else {
+            int rc = kvm_ioctl(s, KVM_GET_DEVICE_ATTR, &attr);
+            if (rc == -1 && (errno == ENXIO || errno == EINVAL)) {
+                warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) "
+                            "error: %d", rc);
+            }
+            ret = (reg == R_EAX) ? bitmask : bitmask >> 32;
+        }
     } else if (function == 0x80000001 && reg == R_ECX) {
         /*
          * It's safe to enable TOPOEXT even if it's not returned by
-- 
2.25.1
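
The effect of the change can be modelled without any KVM calls. In the
sketch below, has_sys_attr stands for the KVM_CAP_SYS_ATTRIBUTES check,
attr_bitmask for the value the attribute query would return, and
cpuid_value for what KVM_GET_SUPPORTED_CPUID already produced. The function
name and all sample numbers are assumptions for illustration; this is not
the QEMU function itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the value for CPUID[0xD,0].EAX (want_high == false) or .EDX
 * (want_high == true). On kernels without the system attribute API the
 * value from KVM_GET_SUPPORTED_CPUID is kept instead of being overwritten
 * with an unset bitmask.
 */
static uint32_t supported_xcr0_word(bool has_sys_attr, uint64_t attr_bitmask,
                                    uint32_t cpuid_value, bool want_high)
{
    if (!has_sys_attr) {
        return cpuid_value;   /* older kernel: keep the existing value */
    }
    return want_high ? (uint32_t)(attr_bitmask >> 32)
                     : (uint32_t)attr_bitmask;
}
```

Before the fix, the second branch ran even when the capability was missing,
so the result came from a bitmask that was never filled in. That matches
the "host doesn't support requested feature: CPUID.0DH:EAX" warnings
reported in this thread.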

Re: [PULL 15/22] x86: Grant AMX permission for guest

2022-03-16 Thread Yang Zhong

On Wed, Mar 16, 2022 at 04:57:39PM +0100, Peter Krempa wrote:
> On Tue, Mar 08, 2022 at 12:34:38 +0100, Paolo Bonzini wrote:
> > From: Yang Zhong 
> > 
> > Kernel allocates 4K xstate buffer by default. For XSAVE features
> > which require large state component (e.g. AMX), Linux kernel
> > dynamically expands the xstate buffer only after the process has
> > acquired the necessary permissions. Those are called dynamically-
> > enabled XSAVE features (or dynamic xfeatures).
> > 
> > There are separate permissions for native tasks and guests.
> > 
> > Qemu should request the guest permissions for dynamic xfeatures
> > which will be exposed to the guest. This only needs to be done
> > once before the first vcpu is created.
> > 
> > KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
> > get host side supported_xcr0 and Qemu can decide if it can request
> > dynamically enabled XSAVE features permission.
> > https://lore.kernel.org/all/20220126152210.3044876-1-pbonz...@redhat.com/
> > 
> > Suggested-by: Paolo Bonzini 
> > Signed-off-by: Yang Zhong 
> > Signed-off-by: Jing Liu 
> > Message-Id: <20220217060434.52460-4-yang.zh...@intel.com>
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  target/i386/cpu.c  |  7 +
> >  target/i386/cpu.h  |  4 +++
> >  target/i386/kvm/kvm-cpu.c  | 12 
> >  target/i386/kvm/kvm.c  | 57 ++
> >  target/i386/kvm/kvm_i386.h |  1 +
> >  5 files changed, 75 insertions(+), 6 deletions(-)
> 
> With this commit qemu crashes for me when invoking the following
> QMP command:
> 
> $ ~pipo/git/qemu.git/build/qemu-system-x86_64 -S -no-user-config -nodefaults 
> -nographic -machine none,accel=kvm -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 90, "minor": 2, "major": 6}, 
> "package": "v7.0.0-rc0-8-g1d60bb4b14"}, "capabilities": ["oob"]}}
> {'execute':'qmp_capabilities'}
> {"return": {}}
> {"execute":"qom-list-properties","arguments":{"typename":"max-x86_64-cpu"},"id":"libvirt-41"}
> qemu-system-x86_64: ../target/i386/kvm/kvm-cpu.c:105: kvm_cpu_xsave_init: 
> Assertion `esa->size == eax' failed.
> Aborted (core dumped)
> 
> Note that the above is on a box with an 'AMD Ryzen 9 3900X'.
> 
> Curiously on a laptop with an Intel chip (Intel(R) Core(TM) i7-10610U)
> it seems to work.

  Thanks for pointing this out!
  
  I have no AMD machine on which to reproduce this issue, so I have listed
  the FPU info from the host kernel dmesg for reference.
  
  root@984fee00bf64:~# dmesg | grep fpu
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x400: 'PASID state'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x20000: 'AMX Tile config'
  [    0.000000] x86/fpu: Supporting XSAVE feature 0x40000: 'AMX Tile data'
  [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
  [    0.000000] x86/fpu: xstate_offset[5]:  832, xstate_sizes[5]:   64
  [    0.000000] x86/fpu: xstate_offset[6]:  896, xstate_sizes[6]:  512
  [    0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
  [    0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]:    8
  [    0.000000] x86/fpu: xstate_offset[10]: 2440, xstate_sizes[10]:    8
  [    0.000000] x86/fpu: xstate_offset[17]: 2496, xstate_sizes[17]:   64
  [    0.000000] x86/fpu: xstate_offset[18]: 2560, xstate_sizes[18]: 8192
  [    0.000000] x86/fpu: Enabled xstate features 0x606e7, context size is 10752 bytes, using 'compacted' format.

  Paolo, if you have a fix patch, I can double-check it on an Intel SPR
  server. Thanks!

  Yang

[PATCH v3 7/8] x86: Support XFD and AMX xsave data migration

2022-02-28 Thread Yang Zhong

From: Zeng Guang 

XFD (eXtended Feature Disable) allows a feature to be
enabled in the xsave state while preventing specific
user threads from using it.

Save and restore the XFD MSRs if CPUID.D.1.EAX[4]
enumerates them as valid. Likewise, migrate the MSRs and
the related xsave state as needed.

Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
Reviewed-by: David Edmondson 
---
 target/i386/cpu.h |  9 +
 target/i386/kvm/kvm.c | 18 ++
 target/i386/machine.c | 42 ++
 3 files changed, 69 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8c850e74b8..efea2c78ec 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -507,6 +507,9 @@ typedef enum X86Seg {
 
 #define MSR_VM_HSAVE_PA 0xc0010117
 
+#define MSR_IA32_XFD0x01c4
+#define MSR_IA32_XFD_ERR0x01c5
+
 #define MSR_IA32_BNDCFGS0x0d90
 #define MSR_IA32_XSS0x0da0
 #define MSR_IA32_UMWAIT_CONTROL 0xe1
@@ -872,6 +875,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_1_EAX_AVX_VNNI  (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16   (1U << 5)
+/* XFD Extend Feature Disabled */
+#define CPUID_D_1_EAX_XFD   (1U << 4)
 
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
@@ -1616,6 +1621,10 @@ typedef struct CPUX86State {
 uint64_t msr_rtit_cr3_match;
 uint64_t msr_rtit_addrs[MAX_RTIT_ADDRS];
 
+/* Per-VCPU XFD MSRs */
+uint64_t msr_xfd;
+uint64_t msr_xfd_err;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e64c06d358..fe8e924846 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3276,6 +3276,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_ia32_sgxlepubkeyhash[3]);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD,
+  env->msr_xfd);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR,
+  env->msr_xfd_err);
+}
+
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
 }
@@ -3668,6 +3675,11 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, MSR_IA32_SGXLEPUBKEYHASH3, 0);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -3964,6 +3976,12 @@ static int kvm_get_msrs(X86CPU *cpu)
 env->msr_ia32_sgxlepubkeyhash[index - MSR_IA32_SGXLEPUBKEYHASH0] =
msrs[i].data;
 break;
+case MSR_IA32_XFD:
+env->msr_xfd = msrs[i].data;
+break;
+case MSR_IA32_XFD_ERR:
+env->msr_xfd_err = msrs[i].data;
+break;
 }
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 6202f47793..1f9d0c46f1 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1483,6 +1483,46 @@ static const VMStateDescription vmstate_pdptrs = {
 }
 };
 
+static bool xfd_msrs_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD);
+}
+
+static const VMStateDescription vmstate_msr_xfd = {
+.name = "cpu/msr_xfd",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xfd_msrs_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.msr_xfd, X86CPU),
+VMSTATE_UINT64(env.msr_xfd_err, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool amx_xtile_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE);
+}
+
+static const VMStateDescription vmstate_amx_xtile = {
+.name = "cpu/intel_amx_xtile",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = amx_xtile_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT8_ARRAY(env.xtilecfg, X86CPU, 64),
+VMSTATE_UINT8_ARRAY(env.xtiledata, X86CPU, 8192),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -1622,6 +1662,8 @@ const VMStateDescription vmstate_x86_cpu = {
 &vmstate_msr_tsx_ctrl,
 &vmstate_msr_intel_sgx,
 &vmstate_pdptrs,
+&vmstate_msr_xfd,
+&vmstate_amx_xtile,
 NULL
 }
 };
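
Note: the migration logic above hinges on one CPUID bit. The following is a minimal standalone sketch of that check, not QEMU code. `struct xfd_state` and `xfd_component_disabled()` are hypothetical names introduced for illustration; only the bit position and the two MSR field names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of CPUID.(EAX=0xD,ECX=1).EAX enumerates XFD, as in the patch. */
#define CPUID_D_1_EAX_XFD (1U << 4)

/* Hypothetical stand-in for the two MSR fields added to CPUX86State. */
struct xfd_state {
    uint64_t msr_xfd;      /* IA32_XFD: a set bit disables that component */
    uint64_t msr_xfd_err;  /* IA32_XFD_ERR: records which component faulted */
};

/* Same test as xfd_msrs_needed(): send the subsection only with XFD. */
static bool xfd_section_needed(uint32_t cpuid_d_1_eax)
{
    return (cpuid_d_1_eax & CPUID_D_1_EAX_XFD) != 0;
}

/* Hypothetical helper: is state component `bit` currently disabled? */
static bool xfd_component_disabled(const struct xfd_state *s, unsigned bit)
{
    return ((s->msr_xfd >> bit) & 1) != 0;
}
```

Because the subsection is gated on the feature bit, a guest without XFD produces the same migration stream as before the patch.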

[PATCH v3 8/8] linux-header: Sync the linux headers

2022-02-28 Thread Yang Zhong

This patch will be dropped once Qemu syncs the Linux 5.17 headers.
All linux-headers changes are collected here only so that maintainers
can easily remove them once those patches are queued.

Signed-off-by: Yang Zhong 
---
 linux-headers/asm-x86/kvm.h | 3 +++
 linux-headers/linux/kvm.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 2da3316bb5..8224d0dda2 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -452,6 +452,9 @@ struct kvm_sync_regs {
 
 #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x0001
 
+/* attributes for system fd (group 0) */
+#define KVM_X86_XCOMP_GUEST_SUPP   0
+
 struct kvm_vmx_nested_state_data {
__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 00af3bc333..002503dc8b 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
+#define KVM_CAP_SYS_ATTRIBUTES 209
 
 #ifdef KVM_CAP_IRQ_ROUTING

[PATCH v3 5/8] x86: Add AMX CPUIDs enumeration

2022-02-28 Thread Yang Zhong

From: Jing Liu 

Add the AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Also add the AMX
TILE and TMUL CPUID leaves and subleaves, which exist
when AMX TILE is present, to report the maximum
capabilities of TILE and TMUL.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
Reviewed-by: David Edmondson 
---
 target/i386/cpu.c | 55 ---
 target/i386/kvm/kvm.c |  4 +++-
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 79e24bb23f..351a1e4f2a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -575,6 +575,18 @@ static CPUCacheInfo legacy_l3_cache = {
 #define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
 #define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
 
+/* CPUID Leaf 0x1D constants: */
+#define INTEL_AMX_TILE_MAX_SUBLEAF 0x1
+#define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
+#define INTEL_AMX_BYTES_PER_TILE   0x400
+#define INTEL_AMX_BYTES_PER_ROW    0x40
+#define INTEL_AMX_TILE_MAX_NAMES   0x8
+#define INTEL_AMX_TILE_MAX_ROWS    0x10
+
+/* CPUID Leaf 0x1E constants: */
+#define INTEL_AMX_TMUL_MAX_K   0x10
+#define INTEL_AMX_TMUL_MAX_N   0x40
+
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
   uint32_t vendor2, uint32_t vendor3)
 {
@@ -844,8 +856,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "avx512-vp2intersect", NULL, "md-clear", NULL,
 NULL, NULL, "serialize", NULL,
 "tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
-NULL, NULL, NULL, "avx512-fp16",
-NULL, NULL, "spec-ctrl", "stibp",
+NULL, NULL, "amx-bf16", "avx512-fp16",
+"amx-tile", "amx-int8", "spec-ctrl", "stibp",
 NULL, "arch-capabilities", "core-capability", "ssbd",
 },
 .cpuid = {
@@ -910,7 +922,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 "xsaveopt", "xsavec", "xgetbv1", "xsaves",
-NULL, NULL, NULL, NULL,
+"xfd", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -5586,6 +5598,43 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 }
 break;
 }
+case 0x1D: {
+/* AMX TILE */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* Highest numbered palette subleaf */
+*eax = INTEL_AMX_TILE_MAX_SUBLEAF;
+} else if (count == 1) {
+*eax = INTEL_AMX_TOTAL_TILE_BYTES |
+   (INTEL_AMX_BYTES_PER_TILE << 16);
+*ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
+*ecx = INTEL_AMX_TILE_MAX_ROWS;
+}
+break;
+}
+case 0x1E: {
+/* AMX TMUL */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* Maximum TMUL dimensions: K in EBX[7:0], N in EBX[23:8] */
+*ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
+}
+break;
+}
 case 0x40000000:
 /*
  * CPUID code in kvm_arch_init_vcpu() ignores stuff
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 742f0eac4a..b786d6da96 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1780,7 +1780,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c = &cpuid_data.entries[cpuid_i++];
 }
 break;
-case 0x14: {
+case 0x14:
+case 0x1d:
+case 0x1e: {
 uint32_t times;
 
 c->function = i;
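
Note: the register packing for leaves 0x1D and 0x1E in the diff above can be checked in isolation. This is a standalone sketch rather than QEMU code. `struct cpuid_regs`, `amx_palette1()` and `amx_tmul()` are hypothetical names, and the constants are copied from the INTEL_AMX_* defines in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the INTEL_AMX_* constants in the patch. */
#define AMX_TOTAL_TILE_BYTES 0x2000
#define AMX_BYTES_PER_TILE   0x400
#define AMX_BYTES_PER_ROW    0x40
#define AMX_TILE_MAX_NAMES   0x8
#define AMX_TILE_MAX_ROWS    0x10
#define AMX_TMUL_MAX_K       0x10
#define AMX_TMUL_MAX_N       0x40

/* Hypothetical container for one CPUID result. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* CPUID.(EAX=0x1D,ECX=1): tile geometry of palette 1. */
static struct cpuid_regs amx_palette1(void)
{
    struct cpuid_regs r = {0, 0, 0, 0};

    r.eax = AMX_TOTAL_TILE_BYTES | (AMX_BYTES_PER_TILE << 16);
    r.ebx = AMX_BYTES_PER_ROW | (AMX_TILE_MAX_NAMES << 16);
    r.ecx = AMX_TILE_MAX_ROWS;
    return r;
}

/* CPUID.(EAX=0x1E,ECX=0): TMUL limits, K in EBX[7:0], N in EBX[23:8]. */
static struct cpuid_regs amx_tmul(void)
{
    struct cpuid_regs r = {0, 0, 0, 0};

    r.ebx = AMX_TMUL_MAX_K | (AMX_TMUL_MAX_N << 8);
    return r;
}
```

The fields are mutually consistent: 8 tiles of 0x400 bytes give the 0x2000 total, and 16 rows of 0x40 bytes give one 0x400-byte tile.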

[PATCH v3 6/8] x86: Add support for KVM_CAP_XSAVE2 and AMX state migration

2022-02-28 Thread Yang Zhong

From: Jing Liu 

When dynamic xfeatures (e.g. AMX) are used by the guest, the xsave
area can be larger than 4KB. KVM_GET_XSAVE2 and KVM_SET_XSAVE
under KVM_CAP_XSAVE2 work with an xsave buffer larger than 4KB.
Always use KVM_GET_XSAVE2 when KVM supports KVM_CAP_XSAVE2.

Signed-off-by: Jing Liu 
Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h  |  4 
 target/i386/kvm/kvm.c  | 42 --
 target/i386/xsave_helper.c | 33 ++
 3 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 925d0129e2..8c850e74b8 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1527,6 +1527,10 @@ typedef struct CPUX86State {
 uint64_t opmask_regs[NB_OPMASK_REGS];
 YMMReg zmmh_regs[CPU_NB_REGS];
 ZMMReg hi16_zmm_regs[CPU_NB_REGS];
+#ifdef TARGET_X86_64
+uint8_t xtilecfg[64];
+uint8_t xtiledata[8192];
+#endif
 
 /* sysenter registers */
 uint32_t sysenter_cs;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b786d6da96..e64c06d358 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -123,6 +123,7 @@ static uint32_t num_architectural_pmu_gp_counters;
 static uint32_t num_architectural_pmu_fixed_counters;
 
 static int has_xsave;
+static int has_xsave2;
 static int has_xcrs;
 static int has_pit_state2;
 static int has_sregs2;
@@ -1586,6 +1587,26 @@ static Error *invtsc_mig_blocker;
 
 #define KVM_MAX_CPUID_ENTRIES  100
 
+static void kvm_init_xsave(CPUX86State *env)
+{
+if (has_xsave2) {
+env->xsave_buf_len = QEMU_ALIGN_UP(has_xsave2, 4096);
+} else if (has_xsave) {
+env->xsave_buf_len = sizeof(struct kvm_xsave);
+} else {
+return;
+}
+
+env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
+memset(env->xsave_buf, 0, env->xsave_buf_len);
+/*
+ * The allocated storage must be large enough for all of the
+ * possible XSAVE state components.
+ */
+assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) <=
+   env->xsave_buf_len);
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct {
@@ -1615,6 +1636,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 cpuid_i = 0;
 
+has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+
 r = kvm_arch_set_tsc_khz(cs);
 if (r < 0) {
 return r;
@@ -2004,19 +2027,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (r) {
 goto fail;
 }
-
-if (has_xsave) {
-env->xsave_buf_len = sizeof(struct kvm_xsave);
-env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
-memset(env->xsave_buf, 0, env->xsave_buf_len);
-
-/*
- * The allocated storage must be large enough for all of the
- * possible XSAVE state components.
- */
-assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX)
-   <= env->xsave_buf_len);
-}
+kvm_init_xsave(env);
 
 max_nested_state_len = kvm_max_nested_state_length();
 if (max_nested_state_len > 0) {
@@ -3320,13 +3331,14 @@ static int kvm_get_xsave(X86CPU *cpu)
 {
 CPUX86State *env = &cpu->env;
 void *xsave = env->xsave_buf;
-int ret;
+int type, ret;
 
 if (!has_xsave) {
 return kvm_get_fpu(cpu);
 }
 
-ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+type = has_xsave2 ? KVM_GET_XSAVE2 : KVM_GET_XSAVE;
+ret = kvm_vcpu_ioctl(CPU(cpu), type, xsave);
 if (ret < 0) {
 return ret;
 }
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index ac61a96344..b6a004505f 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -5,6 +5,7 @@
 #include "qemu/osdep.h"
 
 #include "cpu.h"
+#include 
 
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 {
@@ -126,6 +127,22 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 
 memcpy(pkru, &env->pkru, sizeof(env->pkru));
 }
+
+e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
+if (e->size && e->offset) {
+XSaveXTILECFG *tilecfg = buf + e->offset;
+
+memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
+}
+
+if (buflen > sizeof(struct kvm_xsave)) {
+e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
+if (e->size && e->offset && buflen >= e->size + e->offset) {
+XSaveXTILEDATA *tiledata = buf + e->offset;
+
+memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
+}
+}
 #endif
 }
 
@@ -247,5 +264,21 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
 pkru = buf + e->offset;
 memcpy(&env->pkru, pkru, sizeof(env->pkru));
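
Note: the buffer sizing in kvm_init_xsave() and the bounds check added to the xsave helpers can be sketched on their own as below. This is an illustration under stated assumptions, not the QEMU implementation. It assumes sizeof(struct kvm_xsave) is 4096 bytes, and the helper names `align_up_page()`, `xsave_buf_len()` and `area_fits()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sizeof(struct kvm_xsave) is the legacy 4096-byte area. */
#define LEGACY_KVM_XSAVE_SIZE 4096u
#define XSAVE_PAGE            4096u

/* Round n up to a multiple of 4096, like QEMU_ALIGN_UP(n, 4096). */
static uint32_t align_up_page(uint32_t n)
{
    return (n + XSAVE_PAGE - 1) / XSAVE_PAGE * XSAVE_PAGE;
}

/*
 * xsave2_size: size reported by the KVM_CAP_XSAVE2 check (0 if unsupported).
 * has_xsave:   non-zero when plain KVM_CAP_XSAVE is available.
 * Returns the buffer length to allocate, or 0 when no buffer is needed.
 */
static uint32_t xsave_buf_len(uint32_t xsave2_size, int has_xsave)
{
    if (xsave2_size) {
        return align_up_page(xsave2_size);
    }
    if (has_xsave) {
        return LEGACY_KVM_XSAVE_SIZE;
    }
    return 0;
}

/* Same condition as the patch: copy a component only if it fits. */
static int area_fits(uint32_t offset, uint32_t size, uint32_t buflen)
{
    return size && offset && buflen >= size + offset;
}
```

With an 8192-byte tile data component at offset 2560 (illustrative values), the legacy 4096-byte buffer fails the check, while a buffer sized from a reported 10752 bytes rounds up to 12288 and passes.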

[PATCH v3 2/8] x86: Add AMX XTILECFG and XTILEDATA components

2022-02-28 Thread Yang Zhong

From: Jing Liu 

The AMX TILECFG register and the TMMx tile data registers are
saved/restored via XSAVE, respectively in state component 17
(64 bytes) and state component 18 (8192 bytes).

Add the AMX feature bits to the x86_ext_save_areas array to
set up the AMX components. Add structs that define the layout
of the AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate
the struct sizes.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
Reviewed-by: David Edmondson 
---
 target/i386/cpu.h | 18 +-
 target/i386/cpu.c |  8 
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7bd9d58505..3ff1b49d29 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -539,6 +539,8 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT            6
 #define XSTATE_Hi16_ZMM_BIT             7
 #define XSTATE_PKRU_BIT                 9
+#define XSTATE_XTILE_CFG_BIT            17
+#define XSTATE_XTILE_DATA_BIT           18
 
 #define XSTATE_FP_MASK  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK (1ULL << XSTATE_SSE_BIT)
@@ -847,6 +849,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_TSX_LDTRK (1U << 16)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16   (1U << 23)
+/* AMX tile (two-dimensional register) */
+#define CPUID_7_0_EDX_AMX_TILE  (1U << 24)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26)
 /* Single Thread Indirect Branch Predictors */
@@ -1350,6 +1354,16 @@ typedef struct XSavePKRU {
 uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 17: AMX XTILECFG state */
+typedef struct XSaveXTILECFG {
+uint8_t xtilecfg[64];
+} XSaveXTILECFG;
+
+/* Ext. save area 18: AMX XTILEDATA state */
+typedef struct XSaveXTILEDATA {
+uint8_t xtiledata[8][1024];
+} XSaveXTILEDATA;
+
 QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
@@ -1357,6 +1371,8 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveOpmask) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveZMM_Hi256) != 0x200);
 QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
 QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILECFG) != 0x40);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILEDATA) != 0x2000);
 
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
@@ -1364,7 +1380,7 @@ typedef struct ExtSaveArea {
 uint32_t ecx;
 } ExtSaveArea;
 
-#define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
+#define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
 
 extern ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT];
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0f3c477dfc..ec35dd1717 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1402,6 +1402,14 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
 [XSTATE_PKRU_BIT] =
   { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
 .size = sizeof(XSavePKRU) },
+[XSTATE_XTILE_CFG_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILECFG),
+},
+[XSTATE_XTILE_DATA_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILEDATA)
+},
 };
 
 static uint32_t xsave_area_size(uint64_t mask)
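
Note: the two save-area structs and their size checks can be reproduced outside QEMU as follows. This sketch uses C11 `_Static_assert` in place of QEMU_BUILD_BUG_ON, and `copy_tile()` is a hypothetical helper added only to show how the [8][1024] layout is indexed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Ext. save area 17: AMX XTILECFG state (layout as in the patch). */
typedef struct XSaveXTILECFG {
    uint8_t xtilecfg[64];
} XSaveXTILECFG;

/* Ext. save area 18: AMX XTILEDATA state, 8 tiles of 1 KiB each. */
typedef struct XSaveXTILEDATA {
    uint8_t xtiledata[8][1024];
} XSaveXTILEDATA;

/* Compile-time size checks, standing in for QEMU_BUILD_BUG_ON. */
_Static_assert(sizeof(XSaveXTILECFG) == 0x40, "XTILECFG must be 64 bytes");
_Static_assert(sizeof(XSaveXTILEDATA) == 0x2000, "XTILEDATA must be 8 KiB");

/* Hypothetical helper: copy one 1 KiB tile out of the save area. */
static void copy_tile(const XSaveXTILEDATA *area, unsigned tile,
                      uint8_t out[1024])
{
    memcpy(out, area->xtiledata[tile], sizeof(area->xtiledata[tile]));
}
```

Checking the sizes at build time means a layout mistake fails the compile instead of corrupting migrated state.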

[PATCH v3 1/8] x86: Fix the 64-byte boundary enumeration for extended state

2022-02-28 Thread Yang Zhong

From: Jing Liu 

For the extended state subleaves (EAX=0Dh, ECX=n, n>1), ECX[1]
indicates whether the extended state component is located
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

Right now, these bits are all zero because no supported component
needed the bit to be set, but the upcoming AMX feature will
use it.  Fix the subleaf values according to KVM's supported
cpuid.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
Reviewed-by: David Edmondson 
---
 target/i386/cpu.h | 6 ++
 target/i386/cpu.c | 1 +
 target/i386/kvm/kvm-cpu.c | 1 +
 3 files changed, 8 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e69ab5dd78..7bd9d58505 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -550,6 +550,11 @@ typedef enum X86Seg {
 #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
 
+#define ESA_FEATURE_ALIGN64_BIT         1
+
+#define ESA_FEATURE_ALIGN64_MASK        (1U << ESA_FEATURE_ALIGN64_BIT)
+
+
 /* CPUID feature words */
 typedef enum FeatureWord {
 FEAT_1_EDX, /* CPUID[1].EDX */
@@ -1356,6 +1361,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
 uint32_t offset, size;
+uint32_t ecx;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6c7ef1099b..0f3c477dfc 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5488,6 +5488,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
+*ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
 }
 }
 break;
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index d95028018e..ce27d3b1df 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -104,6 +104,7 @@ static void kvm_cpu_xsave_init(void)
 if (sz != 0) {
 assert(esa->size == sz);
 esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
+esa->ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
 }
 }
 }
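
Note: the effect of the ALIGN64 bit on the compacted XSAVE layout can be sketched as below. This is a standalone illustration, not code from the series. `compacted_start()` is a hypothetical helper, and the rounding rule follows the commit message: a component with ECX[1] set starts on the next 64-byte boundary after the preceding component.

```c
#include <assert.h>
#include <stdint.h>

/* ECX[1] of CPUID.(EAX=0xD,ECX=n), as defined in the patch. */
#define ESA_FEATURE_ALIGN64_BIT  1
#define ESA_FEATURE_ALIGN64_MASK (1U << ESA_FEATURE_ALIGN64_BIT)

/*
 * Hypothetical helper: offset at which a component starts in the
 * compacted format, given where the previous component ended and
 * the component's CPUID.(EAX=0xD,ECX=n).ECX value.
 */
static uint32_t compacted_start(uint32_t prev_end, uint32_t ecx)
{
    if (ecx & ESA_FEATURE_ALIGN64_MASK) {
        /* Round up to the next multiple of 64. */
        return (prev_end + 63) & ~63u;
    }
    return prev_end;
}
```

A guest that sees ECX[1] as zero for an aligned component would compute a different offset than the hardware uses, which is why the subleaf has to report the bit.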

[PATCH v3 0/8] AMX support in Qemu

2022-02-28 Thread Yang Zhong

Intel introduces the Advanced Matrix Extensions (AMX) [1] feature, which
consists of configurable two-dimensional "TILE" registers and new
accelerator instructions that operate on them. TMUL (Tile matrix
MULtiply) is the first accelerator instruction set to use the new
registers.

Since the AMX KVM patches have been merged into the Linux kernel, this
series is based on the latest Linux release (5.17-rc4).

According to the KVM design, the userspace VMM (e.g. Qemu) is expected
to request guest permission for the dynamically-enabled XSAVE features
only once when the first vCPU is created, and KVM checks guest permission
in KVM_SET_CPUID2.

Intel AMX is XSAVE supported and XSAVE enabled. These extended features
have large state, while the current kvm_xsave only allows 4KB. The AMX KVM
patches extended struct kvm_xsave to meet this requirement and added one
extra KVM_GET_XSAVE2 ioctl to handle extended features. In our tests, AMX
live migration works well.

Note: This version still includes some definitions in the linux-headers.
Once Qemu syncs those linux-headers, please simply drop patch 8, so the
patch 8 changes can be ignored.

[1] Intel Architecture Instruction Set Extension Programming Reference
https://software.intel.com/content/dam/develop/external/us/en/documents/\
architecture-instruction-set-extensions-programming-reference.pdf

Thanks,
Yang


Change history
--
v2->v3:
   - Patch 4, misc cleanup(David).
   - Patch 3, updated the kvm_request_xsave_components()(Paolo).
   - Patch 8, rebased the Linux-header on latest Qemu.

v1->v2:
   - Patch 1 moved "esa->ecx" into the "if{}"(Paolo).
   - Patch 3, the requirements from Paolo,
 - Moved "esa->ecx" into the "if{}".
 - Used the "mask" as parameter to replace xtiledata bits in
   kvm_request_xsave_components()
 - Used the new defined KVM_X86_XCOMP_GUEST_SUPP from KVM to get
   supported_xcr0 from kvm_arch_get_supported_cpuid().
 - Updated the kvm_request_xsave_components() for future usage.
   - Patch 5 added "case 0x1e:" in kvm_arch_init_vcpu()(Paolo).
   - Patch 6 replaced "if (e->size && e->offset)" with
 "if (e->size && e->offset && buflen >= e->size + e->offset)"
 for xsave and xrstor(Paolo).
   - Patch 8 is a newly added patch and is only for linux-headers.
 This patch can be dropped once Qemu syncs linux-headers.

rfc v1->v1:
   - Patch 1 changed commit message(Kevin and Paolo).
   - Patch 2 changed commit message(Kevin and Paolo).
   - Patch 3, below requirements from Paolo,
 - Called ARCH_REQ_XCOMP_GUEST_PERM from x86_cpu_enable_xsave_components.
   Used kvm_request_xsave_components() to replace x86_xsave_req_perm().
   Replaced syscall(ARCH_GET_XCOMP_GUEST_PERM) with
   kvm_arch_get_supported_cpuid() in kvm_request_xsave_components().
 - Changed kvm_cpu_xsave_init() to use host_cpuid() instead of
   kvm_arch_get_supported_cpuid().
 - Added the "function == 0xd" handle in kvm_arch_get_supported_cpuid().
   - Patch 4, used "uint32_t ecx" to replace "uint32_t need_align, support_xfd".
   - Patch 6, below changes,
 - Changed the commit message(Kevin) and used the new function
   kvm_init_xsave() to replace some pieces of code(Wei).
 - Moved KVM_CAP_XSAVE2 extension check to kvm_arch_init_vcpu() to
   make the request permission before KVM_CAP_XSAVE2 extension check(Paolo).
   - Removed RFC prefix.


Jing Liu (5):
  x86: Fix the 64-byte boundary enumeration for extended state
  x86: Add AMX XTILECFG and XTILEDATA components
  x86: Add XFD faulting bit for state components
  x86: Add AMX CPUIDs enumeration
  x86: Add support for KVM_CAP_XSAVE2 and AMX state migration

Yang Zhong (2):
  x86: Grant AMX permission for guest
  linux-header: Sync the linux headers

Zeng Guang (1):
  x86: Support XFD and AMX xsave data migration

 linux-headers/asm-x86/kvm.h |   3 +
 linux-headers/linux/kvm.h   |   1 +
 target/i386/cpu.h   |  43 -
 target/i386/kvm/kvm_i386.h  |   1 +
 target/i386/cpu.c   |  72 -
 target/i386/kvm/kvm-cpu.c   |  11 ++--
 target/i386/kvm/kvm.c   | 121 +++-
 target/i386/machine.c   |  42 +
 target/i386/xsave_helper.c  |  33 ++
 9 files changed, 302 insertions(+), 25 deletions(-)

[PATCH v3 3/8] x86: Grant AMX permission for guest

2022-02-28 Thread Yang Zhong

The kernel allocates a 4K xstate buffer by default. For XSAVE features
which require a large state component (e.g. AMX), the Linux kernel
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. Those are called
dynamically-enabled XSAVE features (or dynamic xfeatures).

There are separate permissions for native tasks and guests.

Qemu should request the guest permissions for dynamic xfeatures
which will be exposed to the guest. This only needs to be done
once before the first vcpu is created.

KVM implemented a new KVM_X86_XCOMP_GUEST_SUPP system attribute to
report the host-side supported_xcr0, so Qemu can decide whether it can
request permission for dynamically-enabled XSAVE features.
https://lore.kernel.org/all/20220126152210.3044876-1-pbonz...@redhat.com/

Suggested-by: Paolo Bonzini 
Signed-off-by: Yang Zhong 
Signed-off-by: Jing Liu 
---
 target/i386/cpu.h  |  4 +++
 target/i386/kvm/kvm_i386.h |  1 +
 target/i386/cpu.c  |  7 +
 target/i386/kvm/kvm-cpu.c  | 12 
 target/i386/kvm/kvm.c  | 57 ++
 5 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 3ff1b49d29..9630f4712a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -551,6 +551,10 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK           (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_XTILE_CFG_MASK           (1ULL << XSTATE_XTILE_CFG_BIT)
+#define XSTATE_XTILE_DATA_MASK          (1ULL << XSTATE_XTILE_DATA_BIT)
+
+#define XSTATE_DYNAMIC_MASK             (XSTATE_XTILE_DATA_MASK)
 
 #define ESA_FEATURE_ALIGN64_BIT 1
 
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index a978509d50..4124912c20 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -52,5 +52,6 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
 uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
 
 bool kvm_enable_sgx_provisioning(KVMState *s);
+void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask);
 
 #endif
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ec35dd1717..505ee289bc 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6007,6 +6007,7 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 CPUX86State *env = &cpu->env;
 int i;
 uint64_t mask;
+static bool request_perm;
 
 if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
 env->features[FEAT_XSAVE_COMP_LO] = 0;
@@ -6022,6 +6023,12 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 }
 }
 
+/* Only request permission for first vcpu */
+if (kvm_enabled() && !request_perm) {
+kvm_request_xsave_components(cpu, mask);
+request_perm = true;
+}
+
 env->features[FEAT_XSAVE_COMP_LO] = mask;
 env->features[FEAT_XSAVE_COMP_HI] = mask >> 32;
 }
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index ce27d3b1df..a35a1bf9fe 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -84,7 +84,7 @@ static void kvm_cpu_max_instance_init(X86CPU *cpu)
 static void kvm_cpu_xsave_init(void)
 {
 static bool first = true;
-KVMState *s = kvm_state;
+uint32_t eax, ebx, ecx, edx;
 int i;
 
 if (!first) {
@@ -100,11 +100,11 @@ static void kvm_cpu_xsave_init(void)
 ExtSaveArea *esa = &x86_ext_save_areas[i];
 
 if (esa->size) {
-int sz = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EAX);
-if (sz != 0) {
-assert(esa->size == sz);
-esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
-esa->ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
+host_cpuid(0xd, i, &eax, &ebx, &ecx, &edx);
+if (eax != 0) {
+assert(esa->size == eax);
+esa->offset = ebx;
+esa->ecx = ecx;
 }
 }
 }
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 2c8feb4a6f..742f0eac4a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -17,6 +17,7 @@
 #include "qapi/error.h"
 #include 
 #include 
+#include 
 
 #include 
 #include "standard-headers/asm-x86/kvm_para.h"
@@ -348,6 +349,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 struct kvm_cpuid2 *cpuid;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
+uint64_t bitmask;
 
 cpuid = get_supported_cpuid(s);
 
@@ -405,6 +407,25 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 if (!has_msr_arch_capabs) {
 ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
 }
+} else if (function == 0xd
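
Note: the archived diff is cut off above, so the body of kvm_request_xsave_components() is not visible here. The following is only a sketch of the selection step the commit message describes: keep the dynamic components that the CPU model enables and the host supports. `dynamic_bits_to_request()` is a hypothetical helper, not a function from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Bit number and masks as defined in the patch. */
#define XSTATE_XTILE_DATA_BIT  18
#define XSTATE_XTILE_DATA_MASK (1ULL << XSTATE_XTILE_DATA_BIT)
#define XSTATE_DYNAMIC_MASK    (XSTATE_XTILE_DATA_MASK)

/*
 * Hypothetical helper: collect the state-component numbers for which
 * guest permission should be requested.
 *
 * mask:      XSAVE components enabled for the CPU model.
 * supported: components the host reports as supported.
 * bits:      output array with room for 64 entries.
 * Returns the number of entries written to bits[].
 */
static int dynamic_bits_to_request(uint64_t mask, uint64_t supported,
                                   int bits[64])
{
    int n = 0;

    mask &= XSTATE_DYNAMIC_MASK & supported;
    for (int bit = 0; bit < 64; bit++) {
        if (mask & (1ULL << bit)) {
            bits[n++] = bit;
        }
    }
    return n;
}
```

Each returned bit would then be the argument of one arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, bit) request, issued once before the first vCPU is created.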

[PATCH v3 4/8] x86: Add XFD faulting bit for state components

2022-02-28 Thread Yang Zhong

From: Jing Liu 

Intel introduces the XFD faulting mechanism for extended
XSAVE features to dynamically enable the features at
runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set
to 1, it indicates support for XFD faulting of this
state component.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
Reviewed-by: David Edmondson 
---
 target/i386/cpu.h | 2 ++
 target/i386/cpu.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9630f4712a..925d0129e2 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -557,8 +557,10 @@ typedef enum X86Seg {
 #define XSTATE_DYNAMIC_MASK (XSTATE_XTILE_DATA_MASK)
 
 #define ESA_FEATURE_ALIGN64_BIT 1
+#define ESA_FEATURE_XFD_BIT 2
 
 #define ESA_FEATURE_ALIGN64_MASK        (1U << ESA_FEATURE_ALIGN64_BIT)
+#define ESA_FEATURE_XFD_MASK            (1U << ESA_FEATURE_XFD_BIT)
 
 
 /* CPUID feature words */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 505ee289bc..79e24bb23f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5496,7 +5496,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
-*ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
+*ecx = esa->ecx &
+   (ESA_FEATURE_ALIGN64_MASK | ESA_FEATURE_XFD_MASK);
 }
 }
 break;

Re: [PATCH v2 3/8] x86: Grant AMX permission for guest

2022-02-25 Thread Yang Zhong

On Thu, Feb 17, 2022 at 02:44:10PM +0100, Paolo Bonzini wrote:
> On 2/17/22 06:58, Yang Zhong wrote:
> >>+
> >>+if ((mask & XSTATE_XTILE_DATA_MASK) == XSTATE_XTILE_DATA_MASK) {
> >>+bitmask = kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EAX);
> >>+if (!(bitmask & XSTATE_XTILE_DATA_MASK)) {
> >Paolo, last time you suggested below changes for here:
> >
> >rc = kvm_arch_get_supported_cpuid(s, 0xd, 0,
> >   (xdata_bit < 32 ? R_EAX : R_EDX));
> >if (!(rc & BIT(xdata_bit & 31)) {
> >   ...
> >}
> >
> >   Since I used "mask" as parameter here, so I had to directly use R_EAX 
> > here.
> >   Please review and if need change it to like "(xdata_bit < 32 ? R_EAX : 
> > R_EDX)",
> >   I will change this in next version, thanks!
> 
> I looked at this function more closely because it didn't compile on non-Linux
> systems, too.  I think it's better to write it already to plan for more
> dynamic features.  In the code below, I'm also relying on
> KVM_GET_SUPPORTED_CPUID/KVM_X86_COMP_GUEST_SUPP being executed
> before ARCH_REQ_XCOMP_GUEST_PERM, which therefore cannot fail.
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 377d993438..1d0c006077 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -43,8 +43,6 @@
>  #include "disas/capstone.h"
>  #include "cpu-internal.h"
> -#include 
> -
>  /* Helpers for building CPUID[2] descriptors: */
>  struct CPUID2CacheDescriptorInfo {
> @@ -6002,40 +6000,6 @@ static void x86_cpu_adjust_feat_level(X86CPU *cpu, 
> FeatureWord w)
>  }
>  }
> -static void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
> -{
> -KVMState *s = kvm_state;
> -uint64_t bitmask;
> -long rc;
> -
> -if ((mask & XSTATE_XTILE_DATA_MASK) == XSTATE_XTILE_DATA_MASK) {
> -bitmask = kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EAX);
> -if (!(bitmask & XSTATE_XTILE_DATA_MASK)) {
> -warn_report("no amx support from supported_xcr0, "
> -"bitmask:0x%lx", bitmask);
> -return;
> -}
> -
> -rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
> -  XSTATE_XTILE_DATA_BIT);
> -if (rc) {
> -/*
> - * The older kernel version(<5.15) can't support
> - * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
> - */
> -return;
> -}
> -
> -rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
> -if (rc) {
> -warn_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
> -} else if (!(bitmask & XFEATURE_XTILE_MASK)) {
> -warn_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
> -"and bitmask=0x%lx", bitmask);
> -}
> -}
> -}
> -
>  /* Calculate XSAVE components based on the configured CPU feature flags */
>  static void x86_cpu_enable_xsave_components(X86CPU *cpu)
>  {
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index d4ad0f56bd..de949bd63d 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -551,11 +551,8 @@ typedef enum X86Seg {
>  #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
>  #define XSTATE_XTILE_CFG_MASK   (1ULL << XSTATE_XTILE_CFG_BIT)
>  #define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
> -#define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK \
> - | XSTATE_XTILE_DATA_MASK)
> -#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
> -#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
> +#define XSTATE_DYNAMIC_MASK (XSTATE_XTILE_DATA_MASK)
>  #define ESA_FEATURE_ALIGN64_BIT 1
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 3bdcd724c4..4b07778970 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -17,6 +17,7 @@
>  #include "qapi/error.h"
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "standard-headers/asm-x86/kvm_para.h"
> @@ -5168,3 +5169,39 @@ bool kvm_arch_cpu_check_are_resettable(void)
>  {
>  return !sev_es_enabled();
>  }
> +
> +#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
> +
> +void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
> +{
> +KVMState *s = kvm_state;
> +uint64_t supported;
> +
> +mask &= XSTATE_DYNAMIC_MASK;
> +if (!m

Re: [PATCH v2 6/8] x86: add support for KVM_CAP_XSAVE2 and AMX state migration

2022-02-24 Thread Yang Zhong

On Mon, Feb 21, 2022 at 01:25:53PM +, David Edmondson wrote:
> On Wednesday, 2022-02-16 at 22:04:32 -08, Yang Zhong wrote:
> 
> > From: Jing Liu 
> >
> > When dynamic xfeatures (e.g. AMX) are used by the guest, the xsave
> > area would be larger than 4KB. KVM_GET_XSAVE2 and KVM_SET_XSAVE
> > under KVM_CAP_XSAVE2 works with a xsave buffer larger than 4KB.
> > Always use the new ioctls under KVM_CAP_XSAVE2 when KVM supports it.
> >
> > Signed-off-by: Jing Liu 
> > Signed-off-by: Zeng Guang 
> > Signed-off-by: Wei Wang 
> > Signed-off-by: Yang Zhong 
> > ---
> >  target/i386/cpu.h  |  4 
> >  target/i386/kvm/kvm.c  | 42 --
> >  target/i386/xsave_helper.c | 33 ++
> >  3 files changed, 64 insertions(+), 15 deletions(-)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index f7fc2e97a6..de9da38e42 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1528,6 +1528,10 @@ typedef struct CPUX86State {
> >  uint64_t opmask_regs[NB_OPMASK_REGS];
> >  YMMReg zmmh_regs[CPU_NB_REGS];
> >  ZMMReg hi16_zmm_regs[CPU_NB_REGS];
> > +#ifdef TARGET_X86_64
> > +uint8_t xtilecfg[64];
> > +uint8_t xtiledata[8192];
> > +#endif
> 
> Can we have defined constants for these sizes? They also appear in patch
> 2.

  David, we used literal sizes here mainly to stay consistent with the other
  members in this struct and file. Thanks!

  Yang
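
For illustration only, the named-constant approach raised in the review
could look roughly like the sketch below. The macro names XTILECFG_BYTES
and XTILEDATA_BYTES and the AMXState struct are hypothetical and do not
appear in the series; only the values 64 and 8192 come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the patch itself uses the literals 64 and 8192. */
#define XTILECFG_BYTES   64
#define XTILEDATA_BYTES  8192

/* Stand-in for the two AMX members added to CPUX86State. */
typedef struct AMXState {
    uint8_t xtilecfg[XTILECFG_BYTES];
    uint8_t xtiledata[XTILEDATA_BYTES];
} AMXState;

/* Compile-time checks keep the constants and the layout in step. */
static_assert(sizeof(((AMXState *)0)->xtilecfg) == 64,
              "XTILECFG state is 64 bytes");
static_assert(sizeof(((AMXState *)0)->xtiledata) == 8192,
              "XTILEDATA state is 8 KiB");

/* Total bytes of AMX register state held per vCPU. */
static size_t amx_state_bytes(void)
{
    return sizeof(AMXState);
}
```

With names like these, the same constants could be reused wherever the
sizes recur (for example in the XSAVE layout structs of patch 2).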

Re: [PATCH v2 4/8] x86: Add XFD faulting bit for state components

2022-02-24 Thread Yang Zhong

On Mon, Feb 21, 2022 at 01:00:41PM +, David Edmondson wrote:
> On Wednesday, 2022-02-16 at 22:04:30 -08, Yang Zhong wrote:
> 
> > From: Jing Liu 
> >
> > Intel introduces XFD faulting mechanism for extended
> > XSAVE features to dynamically enable the features in
> > runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set
> > as 1, it indicates support for XFD faulting of this
> > state component.
> >
> > Signed-off-by: Jing Liu 
> > Signed-off-by: Yang Zhong 
> 
> Small comment below...
> 
> Reviewed-by: David Edmondson 
> 
> > ---
> >  target/i386/cpu.h | 2 ++
> >  target/i386/cpu.c | 3 ++-
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index d4ad0f56bd..f7fc2e97a6 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -558,8 +558,10 @@ typedef enum X86Seg {
> >  #define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
> >
> >  #define ESA_FEATURE_ALIGN64_BIT 1
> > +#define ESA_FEATURE_XFD_BIT 2
> >
> >  #define ESA_FEATURE_ALIGN64_MASK(1U << ESA_FEATURE_ALIGN64_BIT)
> > +#define ESA_FEATURE_XFD_MASK(1U << ESA_FEATURE_XFD_BIT)
> >
> >  /* CPUID feature words */
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 377d993438..5a7ee8c7e1 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -5497,7 +5497,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> > uint32_t count,
> >  const ExtSaveArea *esa = &x86_ext_save_areas[count];
> >  *eax = esa->size;
> >  *ebx = esa->offset;
> > -*ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
> > +*ecx = (esa->ecx & ESA_FEATURE_ALIGN64_MASK) |
> > +   (esa->ecx & ESA_FEATURE_XFD_MASK);
> 
> Is:
> 
> *ecx = esa->ecx &
>(ESA_FEATURE_ALIGN64_MASK | ESA_FEATURE_XFD_MASK);
> 
> not more usual?


  Thanks David, I will update this in next version.

  Yang

> 
> >  }
> >  }
> >  break;
> 
> dme.
> -- 
> All of us, we're going out tonight. We're gonna walk all over your cars.
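
As a side note on the review comment in this message, the two ways of
writing the ECX mask give the same result. A minimal sketch, using the
ESA_FEATURE_* values from the patch; the three helper function names are
made up for this example:

```c
#include <assert.h>
#include <stdint.h>

#define ESA_FEATURE_ALIGN64_BIT  1
#define ESA_FEATURE_XFD_BIT      2
#define ESA_FEATURE_ALIGN64_MASK (1U << ESA_FEATURE_ALIGN64_BIT)
#define ESA_FEATURE_XFD_MASK     (1U << ESA_FEATURE_XFD_BIT)

static_assert((ESA_FEATURE_ALIGN64_MASK | ESA_FEATURE_XFD_MASK) == 0x6,
              "the combined mask covers bits 1 and 2");

/* Form used in v2 of the patch: mask each flag, then OR the results. */
static uint32_t esa_flags_v2(uint32_t ecx)
{
    return (ecx & ESA_FEATURE_ALIGN64_MASK) | (ecx & ESA_FEATURE_XFD_MASK);
}

/* Form suggested in review: OR the masks first, then apply them once. */
static uint32_t esa_flags_review(uint32_t ecx)
{
    return ecx & (ESA_FEATURE_ALIGN64_MASK | ESA_FEATURE_XFD_MASK);
}

/* Returns 1 if both forms agree for every value of the low byte. */
static int esa_forms_agree(void)
{
    for (uint32_t ecx = 0; ecx < 256; ecx++) {
        if (esa_flags_v2(ecx) != esa_flags_review(ecx)) {
            return 0;
        }
    }
    return 1;
}
```

The second form reads the register once and extends more naturally when
further flag bits are added.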

Re: [PATCH v2 3/8] x86: Grant AMX permission for guest

2022-02-16 Thread Yang Zhong

On Wed, Feb 16, 2022 at 10:04:29PM -0800, Yang Zhong wrote:
> Kernel allocates 4K xstate buffer by default. For XSAVE features
> which require large state component (e.g. AMX), Linux kernel
> dynamically expands the xstate buffer only after the process has
> acquired the necessary permissions. Those are called dynamically-
> enabled XSAVE features (or dynamic xfeatures).
> 
> There are separate permissions for native tasks and guests.
> 
> Qemu should request the guest permissions for dynamic xfeatures
> which will be exposed to the guest. This only needs to be done
> once before the first vcpu is created.
> 
> KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
> get host side supported_xcr0 and Qemu can decide if it can request
> dynamically enabled XSAVE features permission.
> https://lore.kernel.org/all/20220126152210.3044876-1-pbonz...@redhat.com/
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Yang Zhong 
> Signed-off-by: Jing Liu 
> ---
>  target/i386/cpu.h |  7 +++
>  target/i386/cpu.c | 43 +++
>  target/i386/kvm/kvm-cpu.c | 12 +--
>  target/i386/kvm/kvm.c | 20 ++
>  4 files changed, 76 insertions(+), 6 deletions(-)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 06d2d6bccf..d4ad0f56bd 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -549,6 +549,13 @@ typedef enum X86Seg {
>  #define XSTATE_ZMM_Hi256_MASK   (1ULL << XSTATE_ZMM_Hi256_BIT)
>  #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
>  #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
> +#define XSTATE_XTILE_CFG_MASK   (1ULL << XSTATE_XTILE_CFG_BIT)
> +#define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
> +#define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK \
> + | XSTATE_XTILE_DATA_MASK)
> +
> +#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
> +#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
>  
>  #define ESA_FEATURE_ALIGN64_BIT 1
>  
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index ea7e8f9081..377d993438 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -43,6 +43,8 @@
>  #include "disas/capstone.h"
>  #include "cpu-internal.h"
>  
> +#include 
> +
>  /* Helpers for building CPUID[2] descriptors: */
>  
>  struct CPUID2CacheDescriptorInfo {
> @@ -6000,12 +6002,47 @@ static void x86_cpu_adjust_feat_level(X86CPU *cpu, 
> FeatureWord w)
>  }
>  }
>  
> +static void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
> +{
> +KVMState *s = kvm_state;
> +uint64_t bitmask;
> +long rc;
> +
> +if ((mask & XSTATE_XTILE_DATA_MASK) == XSTATE_XTILE_DATA_MASK) {
> +bitmask = kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EAX);
> +if (!(bitmask & XSTATE_XTILE_DATA_MASK)) {

   Paolo, last time you suggested the change below for this spot:

   rc = kvm_arch_get_supported_cpuid(s, 0xd, 0,
                                     (xdata_bit < 32 ? R_EAX : R_EDX));
   if (!(rc & BIT(xdata_bit & 31))) {
       ...
   }

  Since "mask" is now the parameter here, I used R_EAX directly.
  Please review; if this needs to become "(xdata_bit < 32 ? R_EAX : R_EDX)",
  I will change it in the next version, thanks!

  Yang


> +warn_report("no amx support from supported_xcr0, "
> +"bitmask:0x%lx", bitmask);
> +return;
> +}
> +
> +rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
> +  XSTATE_XTILE_DATA_BIT);
> +if (rc) {
> +/*
> + * The older kernel version(<5.15) can't support
> + * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
> + */
> +return;
> +}
> +
> +rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
> +if (rc) {
> +warn_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
> +} else if (!(bitmask & XFEATURE_XTILE_MASK)) {
> +warn_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
> +"and bitmask=0x%lx", bitmask);
> +}
> +}
> +}
> +
>  /* Calculate XSAVE components based on the configured CPU feature flags */
>  static void x86_cpu_enable_xsave_components(X86CPU *cpu)
>  {
>  CPUX86State *env = &cpu->env;
>  int i;
>  uint64_t mask;
> +static bool request_perm;
>  
>
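
For reference, the register selection discussed in this reply can be
sketched as below. CPUID.(EAX=0xD,ECX=0) reports the low 32 bits of the
supported XCR0 mask in EAX and the high 32 bits in EDX, so XTILEDATA
(bit 18) happens to live in EAX. The Cpuid0D struct and both helper
names are hypothetical stand-ins for kvm_arch_get_supported_cpuid().

```c
#include <assert.h>
#include <stdint.h>

#define XSTATE_XTILE_DATA_BIT 18

/*
 * Hypothetical stand-in for CPUID.(EAX=0xD,ECX=0) output: EAX holds
 * bits 31:0 of the supported XCR0 mask, EDX holds bits 63:32.
 */
typedef struct Cpuid0D {
    uint32_t eax;
    uint32_t edx;
} Cpuid0D;

/* Test one XSAVE component bit, choosing the register by bit index. */
static int xcr0_bit_supported(const Cpuid0D *regs, unsigned bit)
{
    assert(bit < 64);
    uint32_t reg = bit < 32 ? regs->eax : regs->edx;
    return (int)((reg >> (bit & 31)) & 1U);
}

/* The same check done on the combined 64-bit mask. */
static int xcr0_mask_supported(const Cpuid0D *regs, uint64_t mask)
{
    uint64_t supported = ((uint64_t)regs->edx << 32) | regs->eax;
    return (supported & mask) == mask;
}
```

Combining both registers first, as in the second helper, keeps a
mask-based interface working even for components above bit 31.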

[PATCH v2 8/8] linux-header: Sync the linux headers

2022-02-16 Thread Yang Zhong

This patch will be dropped once Qemu syncs the Linux 5.17 headers.
All linux-headers changes are collected here only so that maintainers
can easily drop them once those patches are queued.

Signed-off-by: Yang Zhong 
---
 linux-headers/asm-x86/kvm.h | 17 +
 linux-headers/linux/kvm.h   |  4 
 2 files changed, 21 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 5a776a08f7..17735430db 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -375,7 +375,21 @@ struct kvm_debugregs {
 
 /* for KVM_CAP_XSAVE */
 struct kvm_xsave {
+   /*
+* KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many bytes
+* as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+* respectively, when invoked on the vm file descriptor.
+*
+* The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+* will always be at least 4096. Currently, it is only greater
+* than 4096 if a dynamic feature has been enabled with
+* ``arch_prctl()``, but this may change in the future.
+*
+* The offsets of the state save areas in struct kvm_xsave follow
+* the contents of CPUID leaf 0xD on the host.
+*/
__u32 region[1024];
+   __u32 extra[0];
 };
 
 #define KVM_MAX_XCRS   16
@@ -438,6 +452,9 @@ struct kvm_sync_regs {
 
 #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE0x0001
 
+/* attributes for system fd (group 0) */
+#define KVM_X86_XCOMP_GUEST_SUPP   0
+
 struct kvm_vmx_nested_state_data {
__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 02c5e7b7bb..54ce7e6d90 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1130,6 +1130,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_XSAVE2  208
+#define KVM_CAP_SYS_ATTRIBUTES 209
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1677,6 +1679,8 @@ struct kvm_xen_hvm_attr {
 #define KVM_GET_SREGS2 _IOR(KVMIO,  0xcc, struct kvm_sregs2)
 #define KVM_SET_SREGS2 _IOW(KVMIO,  0xcd, struct kvm_sregs2)
 
+#define KVM_GET_XSAVE2   _IOR(KVMIO,  0xcf, struct kvm_xsave)
+
 struct kvm_xen_vcpu_attr {
__u16 type;
__u16 pad[3];
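
The header comment above says the size reported for KVM_CAP_XSAVE2 is at
least 4096 bytes. A caller might size its buffer along the lines of the
sketch below, which mirrors the round-up to 4096 done by kvm_init_xsave()
in patch 6/8. The helper names are hypothetical, no ioctl is issued, and
the fallback for a missing capability is a simplification.

```c
#include <assert.h>
#include <stddef.h>

#define KVM_XSAVE_LEGACY_BYTES 4096  /* sizeof(__u32 region[1024]) */

/* Round n up to the next multiple of align (a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/*
 * Pick an allocation size from the value KVM_CHECK_EXTENSION would
 * return for KVM_CAP_XSAVE2. A value of 0 (capability absent) falls
 * back to the legacy 4 KiB struct kvm_xsave.
 */
static size_t xsave_buf_len(int cap_xsave2)
{
    if (cap_xsave2 <= 0) {
        return KVM_XSAVE_LEGACY_BYTES;
    }
    return align_up((size_t)cap_xsave2, 4096);
}
```

The trailing "extra[0]" member is what lets the same struct type describe
a buffer longer than the fixed 4 KiB region.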

[PATCH v2 7/8] x86: Support XFD and AMX xsave data migration

2022-02-16 Thread Yang Zhong

From: Zeng Guang 

XFD (eXtended Feature Disable) allows a feature to be
enabled in xsave state while preventing specific
user threads from using the feature.

Save and restore the XFD MSRs if CPUID.D.1.EAX[4]
enumerates support. Likewise, migrate the MSRs and
the related xsave state as necessary.

Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h |  9 +
 target/i386/kvm/kvm.c | 18 ++
 target/i386/machine.c | 42 ++
 3 files changed, 69 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index de9da38e42..509c16323a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -505,6 +505,9 @@ typedef enum X86Seg {
 
 #define MSR_VM_HSAVE_PA 0xc0010117
 
+#define MSR_IA32_XFD0x01c4
+#define MSR_IA32_XFD_ERR0x01c5
+
 #define MSR_IA32_BNDCFGS0x0d90
 #define MSR_IA32_XSS0x0da0
 #define MSR_IA32_UMWAIT_CONTROL 0xe1
@@ -873,6 +876,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_1_EAX_AVX_VNNI  (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16   (1U << 5)
+/* XFD Extend Feature Disabled */
+#define CPUID_D_1_EAX_XFD   (1U << 4)
 
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
@@ -1617,6 +1622,10 @@ typedef struct CPUX86State {
 uint64_t msr_rtit_cr3_match;
 uint64_t msr_rtit_addrs[MAX_RTIT_ADDRS];
 
+/* Per-VCPU XFD MSRs */
+uint64_t msr_xfd;
+uint64_t msr_xfd_err;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ff064e3d8f..3dd24b6b0e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3275,6 +3275,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_ia32_sgxlepubkeyhash[3]);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD,
+  env->msr_xfd);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR,
+  env->msr_xfd_err);
+}
+
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
 }
@@ -3667,6 +3674,11 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, MSR_IA32_SGXLEPUBKEYHASH3, 0);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -3963,6 +3975,12 @@ static int kvm_get_msrs(X86CPU *cpu)
 env->msr_ia32_sgxlepubkeyhash[index - MSR_IA32_SGXLEPUBKEYHASH0] =
msrs[i].data;
 break;
+case MSR_IA32_XFD:
+env->msr_xfd = msrs[i].data;
+break;
+case MSR_IA32_XFD_ERR:
+env->msr_xfd_err = msrs[i].data;
+break;
 }
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 6202f47793..1f9d0c46f1 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1483,6 +1483,46 @@ static const VMStateDescription vmstate_pdptrs = {
 }
 };
 
+static bool xfd_msrs_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD);
+}
+
+static const VMStateDescription vmstate_msr_xfd = {
+.name = "cpu/msr_xfd",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xfd_msrs_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.msr_xfd, X86CPU),
+VMSTATE_UINT64(env.msr_xfd_err, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool amx_xtile_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE);
+}
+
+static const VMStateDescription vmstate_amx_xtile = {
+.name = "cpu/intel_amx_xtile",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = amx_xtile_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT8_ARRAY(env.xtilecfg, X86CPU, 64),
+VMSTATE_UINT8_ARRAY(env.xtiledata, X86CPU, 8192),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -1622,6 +1662,8 @@ const VMStateDescription vmstate_x86_cpu = {
 &vmstate_msr_tsx_ctrl,
 &vmstate_msr_intel_sgx,
 &vmstate_pdptrs,
+&vmstate_msr_xfd,
+&vmstate_amx_xtile,
 NULL
 }
 };
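
The two subsections added above are migrated only when their .needed
callback returns true. The idea can be sketched in isolation as below;
MiniCPU and MiniSubsection are hypothetical, trimmed-down stand-ins and
not QEMU's X86CPU or VMStateDescription types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_D_1_EAX_XFD       (1U << 4)
#define CPUID_7_0_EDX_AMX_TILE  (1U << 24)

/* Hypothetical, trimmed-down CPU feature state. */
typedef struct MiniCPU {
    uint32_t feat_xsave;    /* CPUID.(EAX=0xD,ECX=1).EAX */
    uint32_t feat_7_0_edx;  /* CPUID.(EAX=7,ECX=0).EDX */
} MiniCPU;

/* Hypothetical stand-in for a migration subsection descriptor. */
typedef struct MiniSubsection {
    const char *name;
    bool (*needed)(const MiniCPU *cpu);
} MiniSubsection;

static bool xfd_msrs_needed(const MiniCPU *cpu)
{
    return (cpu->feat_xsave & CPUID_D_1_EAX_XFD) != 0;
}

static bool amx_xtile_needed(const MiniCPU *cpu)
{
    return (cpu->feat_7_0_edx & CPUID_7_0_EDX_AMX_TILE) != 0;
}

static const MiniSubsection subsections[] = {
    { "cpu/msr_xfd", xfd_msrs_needed },
    { "cpu/intel_amx_xtile", amx_xtile_needed },
};

/* Count the subsections that would be sent for this CPU model. */
static size_t subsections_to_send(const MiniCPU *cpu)
{
    size_t n = 0;

    assert(cpu != NULL);
    for (size_t i = 0; i < sizeof(subsections) / sizeof(subsections[0]); i++) {
        if (subsections[i].needed(cpu)) {
            n++;
        }
    }
    return n;
}
```

Gating each subsection on a feature bit keeps the migration stream
unchanged for guests that do not expose XFD or AMX.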

[PATCH v2 6/8] x86: add support for KVM_CAP_XSAVE2 and AMX state migration

2022-02-16 Thread Yang Zhong

From: Jing Liu 

When dynamic xfeatures (e.g. AMX) are used by the guest, the xsave
area would be larger than 4KB. KVM_GET_XSAVE2 and KVM_SET_XSAVE
under KVM_CAP_XSAVE2 work with an xsave buffer larger than 4KB.
Always use the new ioctls under KVM_CAP_XSAVE2 when KVM supports it.

Signed-off-by: Jing Liu 
Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h  |  4 
 target/i386/kvm/kvm.c  | 42 --
 target/i386/xsave_helper.c | 33 ++
 3 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f7fc2e97a6..de9da38e42 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1528,6 +1528,10 @@ typedef struct CPUX86State {
 uint64_t opmask_regs[NB_OPMASK_REGS];
 YMMReg zmmh_regs[CPU_NB_REGS];
 ZMMReg hi16_zmm_regs[CPU_NB_REGS];
+#ifdef TARGET_X86_64
+uint8_t xtilecfg[64];
+uint8_t xtiledata[8192];
+#endif
 
 /* sysenter registers */
 uint32_t sysenter_cs;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8562d3d138..ff064e3d8f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -122,6 +122,7 @@ static uint32_t num_architectural_pmu_gp_counters;
 static uint32_t num_architectural_pmu_fixed_counters;
 
 static int has_xsave;
+static int has_xsave2;
 static int has_xcrs;
 static int has_pit_state2;
 static int has_sregs2;
@@ -1585,6 +1586,26 @@ static Error *invtsc_mig_blocker;
 
 #define KVM_MAX_CPUID_ENTRIES  100
 
+static void kvm_init_xsave(CPUX86State *env)
+{
+if (has_xsave2) {
+env->xsave_buf_len = QEMU_ALIGN_UP(has_xsave2, 4096);
+} else if (has_xsave) {
+env->xsave_buf_len = sizeof(struct kvm_xsave);
+} else {
+return;
+}
+
+env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
+memset(env->xsave_buf, 0, env->xsave_buf_len);
+ /*
+  * The allocated storage must be large enough for all of the
+  * possible XSAVE state components.
+  */
+assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) <=
+   env->xsave_buf_len);
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct {
@@ -1614,6 +1635,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 cpuid_i = 0;
 
+has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+
 r = kvm_arch_set_tsc_khz(cs);
 if (r < 0) {
 return r;
@@ -2003,19 +2026,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (r) {
 goto fail;
 }
-
-if (has_xsave) {
-env->xsave_buf_len = sizeof(struct kvm_xsave);
-env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
-memset(env->xsave_buf, 0, env->xsave_buf_len);
-
-/*
- * The allocated storage must be large enough for all of the
- * possible XSAVE state components.
- */
-assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX)
-   <= env->xsave_buf_len);
-}
+kvm_init_xsave(env);
 
 max_nested_state_len = kvm_max_nested_state_length();
 if (max_nested_state_len > 0) {
@@ -3319,13 +3330,14 @@ static int kvm_get_xsave(X86CPU *cpu)
 {
 CPUX86State *env = &cpu->env;
 void *xsave = env->xsave_buf;
-int ret;
+int type, ret;
 
 if (!has_xsave) {
 return kvm_get_fpu(cpu);
 }
 
-ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+type = has_xsave2 ? KVM_GET_XSAVE2 : KVM_GET_XSAVE;
+ret = kvm_vcpu_ioctl(CPU(cpu), type, xsave);
 if (ret < 0) {
 return ret;
 }
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index ac61a96344..b6a004505f 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -5,6 +5,7 @@
 #include "qemu/osdep.h"
 
 #include "cpu.h"
+#include 
 
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 {
@@ -126,6 +127,22 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 
 memcpy(pkru, &env->pkru, sizeof(env->pkru));
 }
+
+e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
+if (e->size && e->offset) {
+XSaveXTILECFG *tilecfg = buf + e->offset;
+
+memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
+}
+
+if (buflen > sizeof(struct kvm_xsave)) {
+e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
+if (e->size && e->offset && buflen >= e->size + e->offset) {
+XSaveXTILEDATA *tiledata = buf + e->offset;
+
+memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
+}
+}
 #endif
 }
 
@@ -247,5 +264,21 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
 pkru = buf + e->offset;
 memcpy(&env->pkru, pkru, sizeof(env->pkru));
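
The save path in this patch copies XTILEDATA only when the component's
offset plus size fits inside buflen. A standalone sketch of that guard
follows; the Area struct and save_component() are hypothetical names,
and the 64-bit sum is an extra precaution that is not in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one ExtSaveArea entry. */
typedef struct Area {
    uint32_t offset;
    uint32_t size;
} Area;

/*
 * Copy one state component into the XSAVE buffer.
 * Returns 1 if the copy happened, 0 if the component is not
 * enumerated or would not fit inside buflen.
 */
static int save_component(uint8_t *buf, uint32_t buflen, const Area *e,
                          const uint8_t *src)
{
    assert(buf != NULL && e != NULL && src != NULL);

    if (!e->size || !e->offset) {
        return 0;
    }
    /* Use a 64-bit sum so that offset + size cannot wrap around. */
    if ((uint64_t)e->offset + e->size > buflen) {
        return 0;
    }
    memcpy(buf + e->offset, src, e->size);
    return 1;
}
```

Skipping the copy, rather than asserting, lets the same helper run
against both the legacy 4 KiB buffer and the larger XSAVE2 buffer.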

[PATCH v2 2/8] x86: Add AMX XTILECFG and XTILEDATA components

2022-02-16 Thread Yang Zhong

From: Jing Liu 

The AMX TILECFG register and the TMMx tile data registers are
saved/restored via XSAVE, respectively in state component 17
(64 bytes) and state component 18 (8192 bytes).

Add AMX feature bits to x86_ext_save_areas array to set
up AMX components. Add structs that define the layout of
AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
structs sizes.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 18 +-
 target/i386/cpu.c |  8 
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index de1dc124ab..06d2d6bccf 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -537,6 +537,8 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT6
 #define XSTATE_Hi16_ZMM_BIT 7
 #define XSTATE_PKRU_BIT 9
+#define XSTATE_XTILE_CFG_BIT17
+#define XSTATE_XTILE_DATA_BIT   18
 
 #define XSTATE_FP_MASK  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK (1ULL << XSTATE_SSE_BIT)
@@ -845,6 +847,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_TSX_LDTRK (1U << 16)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16   (1U << 23)
+/* AMX tile (two-dimensional register) */
+#define CPUID_7_0_EDX_AMX_TILE  (1U << 24)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26)
 /* Single Thread Indirect Branch Predictors */
@@ -1348,6 +1352,16 @@ typedef struct XSavePKRU {
 uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 17: AMX XTILECFG state */
+typedef struct XSaveXTILECFG {
+uint8_t xtilecfg[64];
+} XSaveXTILECFG;
+
+/* Ext. save area 18: AMX XTILEDATA state */
+typedef struct XSaveXTILEDATA {
+uint8_t xtiledata[8][1024];
+} XSaveXTILEDATA;
+
 QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
@@ -1355,6 +1369,8 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveOpmask) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveZMM_Hi256) != 0x200);
 QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
 QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILECFG) != 0x40);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILEDATA) != 0x2000);
 
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
@@ -1362,7 +1378,7 @@ typedef struct ExtSaveArea {
 uint32_t ecx;
 } ExtSaveArea;
 
-#define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
+#define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
 
 extern ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT];
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 37f06b0b1a..ea7e8f9081 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1401,6 +1401,14 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
 [XSTATE_PKRU_BIT] =
   { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
 .size = sizeof(XSavePKRU) },
+[XSTATE_XTILE_CFG_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILECFG),
+},
+[XSTATE_XTILE_DATA_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILEDATA)
+},
 };
 
 static uint32_t xsave_area_size(uint64_t mask)
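
To show why the two new table entries matter for buffer sizing, here is
a sketch in the spirit of xsave_area_size(), whose body is not part of
this hunk. It takes the furthest end of any enabled component. The sizes
match the patch; the offsets are placeholder values chosen for the
example, since the real ones are read from CPUID leaf 0xD at run time.

```c
#include <assert.h>
#include <stdint.h>

#define XSTATE_XTILE_CFG_BIT   17
#define XSTATE_XTILE_DATA_BIT  18
#define AREA_COUNT             (XSTATE_XTILE_DATA_BIT + 1)

static_assert(AREA_COUNT <= 64, "the component mask is 64 bits wide");

typedef struct Area {
    uint32_t offset;
    uint32_t size;
} Area;

/* Illustrative table: sizes from the patch, offsets are placeholders. */
static const Area areas[AREA_COUNT] = {
    [XSTATE_XTILE_CFG_BIT]  = { .offset = 2752, .size = 64 },
    [XSTATE_XTILE_DATA_BIT] = { .offset = 2816, .size = 8192 },
};

/* Standard-format size: the furthest end of any enabled component. */
static uint32_t area_size_for_mask(uint64_t mask)
{
    uint32_t end = 0;

    for (int i = 0; i < AREA_COUNT; i++) {
        if ((mask >> i) & 1) {
            uint32_t e = areas[i].offset + areas[i].size;
            if (e > end) {
                end = e;
            }
        }
    }
    return end;
}
```

With XTILEDATA enabled the result is well beyond 4 KiB, which is the
reason the later patches move to KVM_CAP_XSAVE2.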

[PATCH v2 5/8] x86: Add AMX CPUIDs enumeration

2022-02-16 Thread Yang Zhong

From: Jing Liu 

Add AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Meanwhile, add
AMX TILE and TMUL CPUID leaf and subleaves which
exist when AMX TILE is present to provide the maximum
capability of TILE and TMUL.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.c | 55 ---
 target/i386/kvm/kvm.c |  4 +++-
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5a7ee8c7e1..2465bed5df 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -576,6 +576,18 @@ static CPUCacheInfo legacy_l3_cache = {
 #define INTEL_PT_CYCLE_BITMAP0x1fff /* Support 0,2^(0~11) */
 #define INTEL_PT_PSB_BITMAP  (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
 
+/* CPUID Leaf 0x1D constants: */
+#define INTEL_AMX_TILE_MAX_SUBLEAF 0x1
+#define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
+#define INTEL_AMX_BYTES_PER_TILE   0x400
+#define INTEL_AMX_BYTES_PER_ROW0x40
+#define INTEL_AMX_TILE_MAX_NAMES   0x8
+#define INTEL_AMX_TILE_MAX_ROWS0x10
+
+/* CPUID Leaf 0x1E constants: */
+#define INTEL_AMX_TMUL_MAX_K   0x10
+#define INTEL_AMX_TMUL_MAX_N   0x40
+
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
   uint32_t vendor2, uint32_t vendor3)
 {
@@ -845,8 +857,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "avx512-vp2intersect", NULL, "md-clear", NULL,
 NULL, NULL, "serialize", NULL,
 "tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
-NULL, NULL, NULL, "avx512-fp16",
-NULL, NULL, "spec-ctrl", "stibp",
+NULL, NULL, "amx-bf16", "avx512-fp16",
+"amx-tile", "amx-int8", "spec-ctrl", "stibp",
 NULL, "arch-capabilities", "core-capability", "ssbd",
 },
 .cpuid = {
@@ -911,7 +923,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 "xsaveopt", "xsavec", "xgetbv1", "xsaves",
-NULL, NULL, NULL, NULL,
+"xfd", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -5587,6 +5599,43 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 }
 break;
 }
+case 0x1D: {
+/* AMX TILE */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* Highest numbered palette subleaf */
+*eax = INTEL_AMX_TILE_MAX_SUBLEAF;
+} else if (count == 1) {
+*eax = INTEL_AMX_TOTAL_TILE_BYTES |
+   (INTEL_AMX_BYTES_PER_TILE << 16);
+*ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
+*ecx = INTEL_AMX_TILE_MAX_ROWS;
+}
+break;
+}
+case 0x1E: {
+/* AMX TMUL */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* Highest numbered palette subleaf */
+*ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
+}
+break;
+}
 case 0x4000:
 /*
  * CPUID code in kvm_arch_init_vcpu() ignores stuff
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3bdcd724c4..8562d3d138 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1779,7 +1779,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c = &cpuid_data.entries[cpuid_i++];
 }
 break;
-case 0x14: {
+case 0x14:
+case 0x1d:
+case 0x1e: {
 uint32_t times;
 
 c->function = i;
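
The packing of CPUID leaf 0x1D, subleaf 1, in this patch can be checked
in isolation with the sketch below. The INTEL_AMX_* values are copied
from the patch; amx_palette1(), lo16() and hi16() are helper names made
up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
#define INTEL_AMX_BYTES_PER_TILE   0x400
#define INTEL_AMX_BYTES_PER_ROW    0x40
#define INTEL_AMX_TILE_MAX_NAMES   0x8
#define INTEL_AMX_TILE_MAX_ROWS    0x10

/* The palette parameters must be mutually consistent. */
static_assert(INTEL_AMX_TILE_MAX_NAMES * INTEL_AMX_BYTES_PER_TILE ==
              INTEL_AMX_TOTAL_TILE_BYTES, "8 tiles of 1 KiB each");
static_assert(INTEL_AMX_TILE_MAX_ROWS * INTEL_AMX_BYTES_PER_ROW ==
              INTEL_AMX_BYTES_PER_TILE, "16 rows of 64 bytes each");

/* Pack CPUID.(EAX=0x1D,ECX=1) the same way the hunk above does. */
static void amx_palette1(uint32_t *eax, uint32_t *ebx, uint32_t *ecx)
{
    *eax = INTEL_AMX_TOTAL_TILE_BYTES | (INTEL_AMX_BYTES_PER_TILE << 16);
    *ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
    *ecx = INTEL_AMX_TILE_MAX_ROWS;
}

/* Unpack the two 16-bit fields of a packed register. */
static uint16_t lo16(uint32_t reg) { return (uint16_t)(reg & 0xffff); }
static uint16_t hi16(uint32_t reg) { return (uint16_t)(reg >> 16); }
```

A guest would read the fields back with the same 16-bit split, so the
round trip through lo16()/hi16() is a quick sanity check on the packing.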

[PATCH v2 4/8] x86: Add XFD faulting bit for state components

2022-02-16 Thread Yang Zhong

From: Jing Liu 

Intel introduces the XFD faulting mechanism for extended
XSAVE features so that the features can be enabled
dynamically at runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2]
is set to 1, it indicates support for XFD faulting of this
state component.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 2 ++
 target/i386/cpu.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4ad0f56bd..f7fc2e97a6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -558,8 +558,10 @@ typedef enum X86Seg {
 #define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
 
 #define ESA_FEATURE_ALIGN64_BIT 1
+#define ESA_FEATURE_XFD_BIT 2
 
 #define ESA_FEATURE_ALIGN64_MASK(1U << ESA_FEATURE_ALIGN64_BIT)
+#define ESA_FEATURE_XFD_MASK(1U << ESA_FEATURE_XFD_BIT)
 
 
 /* CPUID feature words */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 377d993438..5a7ee8c7e1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5497,7 +5497,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
-*ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
+*ecx = (esa->ecx & ESA_FEATURE_ALIGN64_MASK) |
+   (esa->ecx & ESA_FEATURE_XFD_MASK);
 }
 }
 break;

[PATCH v2 1/8] x86: Fix the 64-byte boundary enumeration for extended state

2022-02-16 Thread Yang Zhong

From: Jing Liu 

The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
indicate whether the extended state component is located
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

Right now, they are all zero because no supported component
needed the bit to be set, but the upcoming AMX feature will
use it.  Fix the subleaves value according to KVM's supported
cpuid.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 6 ++
 target/i386/cpu.c | 1 +
 target/i386/kvm/kvm-cpu.c | 1 +
 3 files changed, 8 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9911d7c871..de1dc124ab 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -548,6 +548,11 @@ typedef enum X86Seg {
 #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
 
+#define ESA_FEATURE_ALIGN64_BIT 1
+
+#define ESA_FEATURE_ALIGN64_MASK(1U << ESA_FEATURE_ALIGN64_BIT)
+
+
 /* CPUID feature words */
 typedef enum FeatureWord {
 FEAT_1_EDX, /* CPUID[1].EDX */
@@ -1354,6 +1359,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
 uint32_t offset, size;
+uint32_t ecx;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa9e636800..37f06b0b1a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5487,6 +5487,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
+*ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
 }
 }
 break;
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index d95028018e..ce27d3b1df 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -104,6 +104,7 @@ static void kvm_cpu_xsave_init(void)
 if (sz != 0) {
 assert(esa->size == sz);
 esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
+esa->ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
 }
 }
 }
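
To make the role of the ECX[1] alignment bit concrete, the sketch below
computes compacted-format offsets: extended components start after the
512-byte legacy area and 64-byte header, and a component with the bit
set begins on the next 64-byte boundary. The Area struct and
compacted_offset() are hypothetical names, and this is a sketch of the
layout rule rather than code from the series.

```c
#include <assert.h>
#include <stdint.h>

#define ESA_FEATURE_ALIGN64_MASK (1U << 1)
#define XSAVE_LEGACY_AND_HEADER  576  /* 512-byte legacy area + 64-byte header */

typedef struct Area {
    uint32_t size;  /* CPUID.(EAX=0xD,ECX=i).EAX */
    uint32_t ecx;   /* CPUID.(EAX=0xD,ECX=i).ECX */
} Area;

/*
 * Compacted-format offset of component `target`, given the mask of
 * enabled components. Returns 0 if `target` is not an enabled
 * extended component.
 */
static uint32_t compacted_offset(const Area *areas, int count,
                                 uint64_t mask, int target)
{
    uint32_t off = XSAVE_LEGACY_AND_HEADER;

    assert(count <= 64);
    for (int i = 2; i < count; i++) {
        if (!((mask >> i) & 1)) {
            continue;
        }
        if (areas[i].ecx & ESA_FEATURE_ALIGN64_MASK) {
            off = (off + 63) & ~63U;  /* start on the next 64-byte boundary */
        }
        if (i == target) {
            return off;
        }
        off += areas[i].size;
    }
    return 0;
}
```

Without the alignment bit every component simply follows the previous
one, which is why the field could stay zero until AMX came along.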

[PATCH v2 0/8] AMX support in Qemu

2022-02-16 Thread Yang Zhong

Intel introduces Advanced Matrix Extensions (AMX) [1] feature that
consists of configurable two-dimensional "TILE" registers and new
accelerator instructions that operate on them. TMUL (Tile matrix
MULtiply) is the first accelerator instruction set to use the new
registers.

Since the AMX KVM patches have been merged into Linux, this series
is based on the latest Linux release (5.17-rc4).

According to the KVM design, the userspace VMM (e.g. Qemu) is expected
to request guest permission for the dynamically-enabled XSAVE features
only once when the first vCPU is created, and KVM checks guest permission
in KVM_SET_CPUID2.

Intel AMX is XSAVE supported and XSAVE enabled. Those extended features
have large state, while the current kvm_xsave only allows 4KB. The AMX KVM
support has extended struct kvm_xsave to meet this requirement and added
one extra KVM_GET_XSAVE2 ioctl to handle extended features. In our tests,
AMX live migration works well.

Notice: This version still includes some definitions in the linux-headers.
Once Qemu syncs those linux-headers, I will remove those definitions, so
please ignore those changes.

[1] Intel Architecture Instruction Set Extension Programming Reference
https://software.intel.com/content/dam/develop/external/us/en/documents/\
architecture-instruction-set-extensions-programming-reference.pdf

Thanks,
Yang


Change history
--
v1->v2:
   - Patch 1 moved "esa->ecx" into the "if{}"(Paolo).
   - Patch 3, the requirements from Paolo,
 - Moved "esa->ecx" into the "if{}".
 - Used the "mask" as parameter to replace xtiledata bits in
   kvm_request_xsave_components()
 - Used the new defined KVM_X86_XCOMP_GUEST_SUPP from KVM to get
   supported_xcr0 from kvm_arch_get_supported_cpuid().
 - Updated the kvm_request_xsave_components() for future usage.
   - Patch 5 added "case 0x1e:" in kvm_arch_init_vcpu()(Paolo).
   - Patch 6 replaced "if (e->size && e->offset)" with 
 "if (e->size && e->offset && buflen >= e->size + e->offset)"
 for xsave and xrstor(Paolo).
   - Patch 8 is newly added and is only for linux-headers.
 This patch can be dropped directly once Qemu syncs linux-headers.

rfc v1->v1:
   - Patch 1 changed commit message(Kevin and Paolo).
   - Patch 2 changed commit message(Kevin and Paolo).
   - Patch 3, below requirements from Paolo,
 - Called ARCH_REQ_XCOMP_GUEST_PERM from x86_cpu_enable_xsave_components.
   Used kvm_request_xsave_components() to replace x86_xsave_req_perm().
   Replaced syscall(ARCH_GET_XCOMP_GUEST_PERM) with
   kvm_arch_get_supported_cpuid() in kvm_request_xsave_components().
 - Changed kvm_cpu_xsave_init() to use host_cpuid() instead of
   kvm_arch_get_supported_cpuid().
 - Added the "function == 0xd" handle in kvm_arch_get_supported_cpuid().
   - Patch 4, used "uint32_t ecx" to replace "uint32_t need_align, support_xfd".
   - Patch 6, below changes,
 - Changed the commit message(Kevin) and used the new function
   kvm_init_xsave() to replace some pieces of code(Wei).
 - Moved KVM_CAP_XSAVE2 extension check to kvm_arch_init_vcpu() to
   make the request permission before KVM_CAP_XSAVE2 extension check(Paolo).
   - Removed RFC prefix.

Jing Liu (5):
  x86: Fix the 64-byte boundary enumeration for extended state
  x86: Add AMX XTILECFG and XTILEDATA components
  x86: Add XFD faulting bit for state components
  x86: Add AMX CPUIDs enumeration
  x86: add support for KVM_CAP_XSAVE2 and AMX state migration

Yang Zhong (2):
  x86: Grant AMX permission for guest
  linux-header: Sync the linux headers

Zeng Guang (1):
  x86: Support XFD and AMX xsave data migration

 linux-headers/asm-x86/kvm.h |  17 ++
 linux-headers/linux/kvm.h   |   4 ++
 target/i386/cpu.h   |  46 ++-
 target/i386/cpu.c   | 108 +++-
 target/i386/kvm/kvm-cpu.c   |  11 ++--
 target/i386/kvm/kvm.c   |  84 ++--
 target/i386/machine.c   |  42 ++
 target/i386/xsave_helper.c  |  33 +++
 8 files changed, 320 insertions(+), 25 deletions(-)
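
For readers who want to try the permission flow described in this cover
letter outside Qemu, a minimal standalone sketch follows. It uses the
two arch_prctl codes from patch 3 and asks for XTILEDATA (bit 18). The
helper name is hypothetical. On kernels or hardware without AMX guest
support the request simply fails and the helper returns 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

#define ARCH_GET_XCOMP_GUEST_PERM 0x1024
#define ARCH_REQ_XCOMP_GUEST_PERM 0x1025
#define XSTATE_XTILE_DATA_BIT     18
#define XSTATE_XTILE_DATA_MASK    (1ULL << XSTATE_XTILE_DATA_BIT)

static_assert(XSTATE_XTILE_DATA_MASK == 0x40000ULL, "XTILEDATA is bit 18");

/*
 * Ask the kernel for guest permission to use AMX tile data.
 * Returns 1 if the permission is held afterwards, 0 otherwise
 * (old kernel, no AMX hardware, or not an x86-64 Linux build).
 */
static int request_guest_amx_permission(void)
{
#if defined(__linux__) && defined(__x86_64__)
    uint64_t perm = 0;

    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
                (unsigned long)XSTATE_XTILE_DATA_BIT) != 0) {
        return 0;
    }
    if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &perm) != 0) {
        return 0;
    }
    return (perm & XSTATE_XTILE_DATA_MASK) != 0;
#else
    return 0;
#endif
}
```

As the cover letter notes, the request is made once for the process,
before the first vCPU is created.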

[PATCH v2 3/8] x86: Grant AMX permission for guest

2022-02-16 Thread Yang Zhong

The kernel allocates a 4K xstate buffer by default. For XSAVE features
that require a large state component (e.g. AMX), the Linux kernel
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. These are called dynamically-enabled
XSAVE features (or dynamic xfeatures).

There are separate permissions for native tasks and guests.

QEMU should request the guest permissions for dynamic xfeatures
that will be exposed to the guest. This only needs to be done
once, before the first vcpu is created.

KVM implements a new ARCH_GET_XCOMP_SUPP system attribute API that
returns the host-side supported_xcr0, so QEMU can decide whether it can
request permission for dynamically-enabled XSAVE features.
https://lore.kernel.org/all/20220126152210.3044876-1-pbonz...@redhat.com/

Suggested-by: Paolo Bonzini 
Signed-off-by: Yang Zhong 
Signed-off-by: Jing Liu 
---
 target/i386/cpu.h |  7 +++
 target/i386/cpu.c | 43 +++
 target/i386/kvm/kvm-cpu.c | 12 +--
 target/i386/kvm/kvm.c | 20 ++
 4 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 06d2d6bccf..d4ad0f56bd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -549,6 +549,13 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK   (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_XTILE_CFG_MASK   (1ULL << XSTATE_XTILE_CFG_BIT)
+#define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
+#define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK \
+ | XSTATE_XTILE_DATA_MASK)
+
+#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
+#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
 
 #define ESA_FEATURE_ALIGN64_BIT 1
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ea7e8f9081..377d993438 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -43,6 +43,8 @@
 #include "disas/capstone.h"
 #include "cpu-internal.h"
 
+#include 
+
 /* Helpers for building CPUID[2] descriptors: */
 
 struct CPUID2CacheDescriptorInfo {
@@ -6000,12 +6002,47 @@ static void x86_cpu_adjust_feat_level(X86CPU *cpu, 
FeatureWord w)
 }
 }
 
+static void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
+{
+KVMState *s = kvm_state;
+uint64_t bitmask;
+long rc;
+
+if ((mask & XSTATE_XTILE_DATA_MASK) == XSTATE_XTILE_DATA_MASK) {
+bitmask = kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EAX);
+if (!(bitmask & XSTATE_XTILE_DATA_MASK)) {
+warn_report("no amx support from supported_xcr0, "
+"bitmask:0x%lx", bitmask);
+return;
+}
+
+rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
+  XSTATE_XTILE_DATA_BIT);
+if (rc) {
+/*
+ * The older kernel version(<5.15) can't support
+ * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
+ */
+return;
+}
+
+rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
+if (rc) {
+warn_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
+} else if (!(bitmask & XFEATURE_XTILE_MASK)) {
+warn_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
+"and bitmask=0x%lx", bitmask);
+}
+}
+}
+
 /* Calculate XSAVE components based on the configured CPU feature flags */
 static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 {
 CPUX86State *env = &cpu->env;
 int i;
 uint64_t mask;
+static bool request_perm;
 
 if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
 env->features[FEAT_XSAVE_COMP_LO] = 0;
@@ -6021,6 +6058,12 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 }
 }
 
+/* Only request permission for first vcpu */
+if (kvm_enabled() && !request_perm) {
+kvm_request_xsave_components(cpu, mask);
+request_perm = true;
+}
+
 env->features[FEAT_XSAVE_COMP_LO] = mask;
 env->features[FEAT_XSAVE_COMP_HI] = mask >> 32;
 }
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index ce27d3b1df..a35a1bf9fe 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -84,7 +84,7 @@ static void kvm_cpu_max_instance_init(X86CPU *cpu)
 static void kvm_cpu_xsave_init(void)
 {
 static bool first = true;
-KVMState *s = kvm_state;
+uint32_t eax, ebx, ecx, edx;
 int i;
 
 if (!first) {
@@ -100,11 +100,11 @@ static void kvm_cpu_xsave_init(void)
 ExtSaveArea *esa = &x86_ext_save_areas[i];
 
 if (esa->size) {
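
Editor's sketch (not part of the patch): the "request once, and only when
a dynamic component is in the mask" logic from
x86_cpu_enable_xsave_components() can be modelled outside QEMU as below.
PermRequestFn, fake_request(), fake_calls and the definition of
XSTATE_DYNAMIC_MASK as XTILEDATA only are assumptions made for
illustration; in the patch the request is a real arch_prctl() syscall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XSTATE_XTILE_DATA_BIT   18
#define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
/* Assumption: XTILEDATA is the only dynamically-enabled component. */
#define XSTATE_DYNAMIC_MASK     XSTATE_XTILE_DATA_MASK

/* Hypothetical stand-in for the arch_prctl() permission request. */
typedef long (*PermRequestFn)(unsigned int bit);

/* Hypothetical fake requester that only counts how often it is called. */
int fake_calls;

long fake_request(unsigned int bit)
{
    (void)bit;
    fake_calls++;
    return 0;
}

/*
 * Issue the permission request at most once per process, and only if the
 * configured XSAVE mask contains a dynamic component.
 * Returns true if a request was issued by this call.
 */
bool request_dynamic_components(uint64_t mask, PermRequestFn request)
{
    static bool request_perm;   /* mirrors the static flag in the patch */

    if (request_perm) {
        return false;
    }
    request_perm = true;

    if (!(mask & XSTATE_DYNAMIC_MASK)) {
        return false;
    }
    request(XSTATE_XTILE_DATA_BIT);
    return true;
}
```

As in the patch, the flag is set on the first call regardless of the mask,
so later vcpus never repeat the request.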

Re: [PATCH v4 4/4] hw/i386/sgx: Attach SGX-EPC objects to machine

2022-02-13 Thread Yang Zhong

On Mon, Feb 07, 2022 at 09:37:52AM +0100, Igor Mammedov wrote:
> On Sat,  5 Feb 2022 13:45:26 +0100
> Philippe Mathieu-Daudé  wrote:
> 
> > Previously SGX-EPC objects were exposed in the QOM tree at a path
> > 
> >   /machine/unattached/device[nn]
> > 
> > where the 'nn' varies depending on what devices were already created.
> > 
> > With this change the SGX-EPC objects are now at
> > 
> >   /machine/sgx-epc[nn]
> > 
> > where the 'nn' of the first SGX-EPC object is always zero.
> 
> yet again, why it's necessary?
> 
> > 
> > Reported-by: Yang Zhong 
> > Suggested-by: Paolo Bonzini 
> > Reviewed-by: Daniel P. Berrangé 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >  hw/i386/sgx.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> > index a2b318dd938..3ab2217ca43 100644
> > --- a/hw/i386/sgx.c
> > +++ b/hw/i386/sgx.c
> > @@ -304,6 +304,8 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
> >  for (list = x86ms->sgx_epc_list; list; list = list->next) {
> >  obj = object_new("sgx-epc");
> >  
> > +object_property_add_child(OBJECT(pcms), "sgx-epc[*]", OBJECT(obj));
> > +
> >  /* set the memdev link with memory backend */
> >  object_property_parse(obj, SGX_EPC_MEMDEV_PROP, 
> > list->value->memdev,
> >&error_fatal);

   Philippe, I verified this patch, and it works well. Thanks a lot!

   (qemu) qom-list /machine
   ..
   sgx-epc[2] (child)
   ..
   sgx-epc[0] (child)
   acpi-device (link)
   sgx-epc[1] (child)
   ..

   Yang

Re: [PATCH v4 4/4] hw/i386/sgx: Attach SGX-EPC objects to machine

2022-02-13 Thread Yang Zhong

On Mon, Feb 07, 2022 at 09:37:52AM +0100, Igor Mammedov wrote:
> On Sat,  5 Feb 2022 13:45:26 +0100
> Philippe Mathieu-Daudé  wrote:
> 
> > Previously SGX-EPC objects were exposed in the QOM tree at a path
> > 
> >   /machine/unattached/device[nn]
> > 
> > where the 'nn' varies depending on what devices were already created.
> > 
> > With this change the SGX-EPC objects are now at
> > 
> >   /machine/sgx-epc[nn]
> > 
> > where the 'nn' of the first SGX-EPC object is always zero.
> 
> yet again, why it's necessary?


  Igor, sorry for the delayed feedback; I was away for the Chinese New
  Year holiday.

  This patch series fixes the issue I reported earlier:
  https://lists.nongnu.org/archive/html/qemu-devel/2021-11/msg05670.html

  /machine/unattached/device[0] is normally the vcpu, and Libvirt uses
  that path to get the unavailable-features list. In an SGX VM, however,
  device[0] is occupied by the virtual SGX EPC device, so Libvirt
  can't get unavailable-features from device[0].

  Although patch 2 in this series already fixes the "unavailable-features"
  issue, this patch moves the SGX virtual device from
  /machine/unattached/device[nn] to /machine/sgx-epc[nn], which seems
  clearer. Thanks!

  Yang
  

> 
> > 
> > Reported-by: Yang Zhong 
> > Suggested-by: Paolo Bonzini 
> > Reviewed-by: Daniel P. Berrangé 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >  hw/i386/sgx.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> > index a2b318dd938..3ab2217ca43 100644
> > --- a/hw/i386/sgx.c
> > +++ b/hw/i386/sgx.c
> > @@ -304,6 +304,8 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
> >  for (list = x86ms->sgx_epc_list; list; list = list->next) {
> >  obj = object_new("sgx-epc");
> >  
> > +object_property_add_child(OBJECT(pcms), "sgx-epc[*]", OBJECT(obj));
> > +
> >  /* set the memdev link with memory backend */
> >  object_property_parse(obj, SGX_EPC_MEMDEV_PROP, 
> > list->value->memdev,
> >&error_fatal);

Re: [PATCH 6/7] x86: add support for KVM_CAP_XSAVE2 and AMX state migration

2022-01-27 Thread Yang Zhong

On Mon, Jan 24, 2022 at 11:15:25AM +0100, Paolo Bonzini wrote:
> On 1/24/22 08:55, Yang Zhong wrote:
> >
> >+if (buflen > sizeof(struct kvm_xsave)) {
> >+e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
> >+
> >+if (e->size && e->offset) {
> >+const XSaveXTILEDATA *tiledata = buf + e->offset;
> >+
> >+memcpy(&env->xtiledata, tiledata, sizeof(env->xtiledata));
> >+}
> >+}
> 
> Slightly nicer:
> 
> e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
> if (e->size && e->offset && buflen >= e->size + e->offset) {
> ...
> }
> 
> Same for xsave.

 Thanks Paolo, the new version will change this.

 Yang
> 
> Paolo
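
Editor's sketch (not part of the thread): the bounds check Paolo suggests
can be read as "the area is enumerated and lies entirely inside the
buffer". SaveArea is a hypothetical two-field subset of QEMU's
ExtSaveArea, and area_fits() is an illustrative helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of ExtSaveArea: only the fields the check needs. */
typedef struct {
    uint32_t offset;   /* offset of the component in the XSAVE buffer */
    uint32_t size;     /* size of the component in bytes */
} SaveArea;

/*
 * True if the component is enumerated (non-zero size and offset) and the
 * buffer of buflen bytes is large enough to hold it completely.
 */
bool area_fits(const SaveArea *e, size_t buflen)
{
    return e->size && e->offset &&
           buflen >= (size_t)e->size + (size_t)e->offset;
}
```

Compared with testing buflen against sizeof(struct kvm_xsave), this form
does not depend on which ioctl produced the buffer, only on whether the
component actually fits.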

Re: [PATCH 5/7] x86: Add AMX CPUIDs enumeration

2022-01-27 Thread Yang Zhong

On Mon, Jan 24, 2022 at 11:13:07AM +0100, Paolo Bonzini wrote:
> On 1/24/22 08:55, Yang Zhong wrote:
> >diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> >index caf1388d8b..25d26a15f8 100644
> >--- a/target/i386/kvm/kvm.c
> >+++ b/target/i386/kvm/kvm.c
> >@@ -1765,7 +1765,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >  c = &cpuid_data.entries[cpuid_i++];
> >  }
> >  break;
> >-case 0x14: {
> >+case 0x14:
> >+case 0x1d: {
> 
> Should this include 0x1e as well?
>
  
  Thanks, we missed this.

  Yang 
 
> Paolo
> 
> >  uint32_t times;
> >  c->function = i;

Re: [PATCH 3/7] x86: Grant AMX permission for guest

2022-01-27 Thread Yang Zhong

On Mon, Jan 24, 2022 at 11:16:36AM +0100, Paolo Bonzini wrote:
> On 1/24/22 08:55, Yang Zhong wrote:
> >Kernel allocates 4K xstate buffer by default. For XSAVE features
> >which require large state component (e.g. AMX), Linux kernel
> >dynamically expands the xstate buffer only after the process has
> >acquired the necessary permissions. Those are called dynamically-
> >enabled XSAVE features (or dynamic xfeatures).
> >
> >There are separate permissions for native tasks and guests.
> >
> >Qemu should request the guest permissions for dynamic xfeatures
> >which will be exposed to the guest. This only needs to be done
> >once before the first vcpu is created.
> >
> >Suggested-by: Paolo Bonzini 
> >Signed-off-by: Yang Zhong 
> >Signed-off-by: Jing Liu 
> >Signed-off-by: Wei Wang 
> >---
> >  target/i386/cpu.h |  7 +++
> >  target/i386/cpu.c | 31 +++
> >  target/i386/kvm/kvm-cpu.c | 12 ++--
> >  target/i386/kvm/kvm.c |  6 ++
> >  4 files changed, 50 insertions(+), 6 deletions(-)
> >
> >diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >index 06d2d6bccf..d4ad0f56bd 100644
> >--- a/target/i386/cpu.h
> >+++ b/target/i386/cpu.h
> >@@ -549,6 +549,13 @@ typedef enum X86Seg {
> >  #define XSTATE_ZMM_Hi256_MASK   (1ULL << XSTATE_ZMM_Hi256_BIT)
> >  #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
> >  #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
> >+#define XSTATE_XTILE_CFG_MASK   (1ULL << XSTATE_XTILE_CFG_BIT)
> >+#define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
> >+#define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK \
> >+ | XSTATE_XTILE_DATA_MASK)
> >+
> >+#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
> >+#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
> >  #define ESA_FEATURE_ALIGN64_BIT 1
> >diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >index 3390820745..29b0348c25 100644
> >--- a/target/i386/cpu.c
> >+++ b/target/i386/cpu.c
> >@@ -43,6 +43,10 @@
> >  #include "disas/capstone.h"
> >  #include "cpu-internal.h"
> >+#include 
> >+
> >+bool request_perm;
> >+
> >  /* Helpers for building CPUID[2] descriptors: */
> >  struct CPUID2CacheDescriptorInfo {
> >@@ -6000,6 +6004,27 @@ static void x86_cpu_adjust_feat_level(X86CPU *cpu, 
> >FeatureWord w)
> >  }
> >  }
> >+static void kvm_request_xsave_components(X86CPU *cpu, uint32_t bit)
> >+{
> >+KVMState *s = CPU(cpu)->kvm_state;
> >+
> >+long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
> >+  bit);
> >+if (rc) {
> >+/*
> >+ * The older kernel version(<5.15) can't support
> >+ * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
> >+ */
> >+return;
> >+}
> >+
> >+rc = kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EAX);
> >+if (!(rc & XFEATURE_XTILE_MASK)) {
> >+error_report("get cpuid failure and rc=0x%lx", rc);
> >+exit(EXIT_FAILURE);
> >+}
> >+}
> >+
> >  /* Calculate XSAVE components based on the configured CPU feature flags */
> >  static void x86_cpu_enable_xsave_components(X86CPU *cpu)
> >  {
> >@@ -6021,6 +6046,12 @@ static void x86_cpu_enable_xsave_components(X86CPU 
> >*cpu)
> >  }
> >  }
> >+/* Only request permission from fisrt vcpu. */
> >+if (kvm_enabled() && !request_perm) {
> >+kvm_request_xsave_components(cpu, XSTATE_XTILE_DATA_BIT);
> >+request_perm = true;
> >+}
> 
> You should pass "mask" here, or "mask & XSTATE_DYNAMIC_MASK" so that
> the components are only requested if necessary.

  Thanks, I will pass "mask" here, which allows kvm_request_xsave_components()
  to be reused in the future.

  Yang 

> 
> >  env->features[FEAT_XSAVE_COMP_LO] = mask;
> >  env->features[FEAT_XSAVE_COMP_HI] = mask >> 32;
> >  }
> >diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> >index 033ca011ea..5ab6a0b9d2 100644
> >--- a/target/i386/kvm/kvm-cpu.c
> >+++ b/target/i386/kvm/kvm-cpu.c
> >@@ -84,7 +84,7 @@ static void kvm_cpu_max_instance_init(X86CPU *cpu)
> >  static void kvm_cpu_xsave_init(void)
> >  {
> >  static bool first = true;
> >-KV

[PATCH 7/7] x86: Support XFD and AMX xsave data migration

2022-01-24 Thread Yang Zhong

From: Zeng Guang 

XFD (eXtended Feature Disable) allows a feature to be
enabled in the xsave state while preventing specific
user threads from using it.

Save and restore the XFD MSRs if CPUID.D.1.EAX[4]
enumerates them as valid. Likewise, migrate the MSRs
and the related xsave state as needed.

Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h |  9 +
 target/i386/kvm/kvm.c | 18 ++
 target/i386/machine.c | 42 ++
 3 files changed, 69 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index de9da38e42..509c16323a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -505,6 +505,9 @@ typedef enum X86Seg {
 
 #define MSR_VM_HSAVE_PA 0xc0010117
 
+#define MSR_IA32_XFD0x01c4
+#define MSR_IA32_XFD_ERR0x01c5
+
 #define MSR_IA32_BNDCFGS0x0d90
 #define MSR_IA32_XSS0x0da0
 #define MSR_IA32_UMWAIT_CONTROL 0xe1
@@ -873,6 +876,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_1_EAX_AVX_VNNI  (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16   (1U << 5)
+/* XFD Extend Feature Disabled */
+#define CPUID_D_1_EAX_XFD   (1U << 4)
 
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
@@ -1617,6 +1622,10 @@ typedef struct CPUX86State {
 uint64_t msr_rtit_cr3_match;
 uint64_t msr_rtit_addrs[MAX_RTIT_ADDRS];
 
+/* Per-VCPU XFD MSRs */
+uint64_t msr_xfd;
+uint64_t msr_xfd_err;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5f931fbbc6..8dbda2420d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3260,6 +3260,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_ia32_sgxlepubkeyhash[3]);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD,
+  env->msr_xfd);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR,
+  env->msr_xfd_err);
+}
+
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
 }
@@ -3652,6 +3659,11 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, MSR_IA32_SGXLEPUBKEYHASH3, 0);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -3948,6 +3960,12 @@ static int kvm_get_msrs(X86CPU *cpu)
 env->msr_ia32_sgxlepubkeyhash[index - MSR_IA32_SGXLEPUBKEYHASH0] =
msrs[i].data;
 break;
+case MSR_IA32_XFD:
+env->msr_xfd = msrs[i].data;
+break;
+case MSR_IA32_XFD_ERR:
+env->msr_xfd_err = msrs[i].data;
+break;
 }
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 6202f47793..1f9d0c46f1 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1483,6 +1483,46 @@ static const VMStateDescription vmstate_pdptrs = {
 }
 };
 
+static bool xfd_msrs_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD);
+}
+
+static const VMStateDescription vmstate_msr_xfd = {
+.name = "cpu/msr_xfd",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xfd_msrs_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.msr_xfd, X86CPU),
+VMSTATE_UINT64(env.msr_xfd_err, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool amx_xtile_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE);
+}
+
+static const VMStateDescription vmstate_amx_xtile = {
+.name = "cpu/intel_amx_xtile",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = amx_xtile_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT8_ARRAY(env.xtilecfg, X86CPU, 64),
+VMSTATE_UINT8_ARRAY(env.xtiledata, X86CPU, 8192),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -1622,6 +1662,8 @@ const VMStateDescription vmstate_x86_cpu = {
 &vmstate_msr_tsx_ctrl,
 &vmstate_msr_intel_sgx,
 &vmstate_pdptrs,
+&vmstate_msr_xfd,
+&vmstate_amx_xtile,
 NULL
 }
 };

[PATCH 5/7] x86: Add AMX CPUIDs enumeration

2022-01-24 Thread Yang Zhong

From: Jing Liu 

Add the AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Also add the AMX
TILE and TMUL CPUID leaves and subleaves, which exist
when AMX TILE is present, to report the maximum
capabilities of TILE and TMUL.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.c | 55 ---
 target/i386/kvm/kvm.c |  3 ++-
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ea13be0a19..9543762e7e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -578,6 +578,18 @@ static CPUCacheInfo legacy_l3_cache = {
 #define INTEL_PT_CYCLE_BITMAP0x1fff /* Support 0,2^(0~11) */
 #define INTEL_PT_PSB_BITMAP  (0x003f << 16) /* Support 
2K,4K,8K,16K,32K,64K */
 
+/* CPUID Leaf 0x1D constants: */
+#define INTEL_AMX_TILE_MAX_SUBLEAF 0x1
+#define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
+#define INTEL_AMX_BYTES_PER_TILE   0x400
+#define INTEL_AMX_BYTES_PER_ROW0x40
+#define INTEL_AMX_TILE_MAX_NAMES   0x8
+#define INTEL_AMX_TILE_MAX_ROWS0x10
+
+/* CPUID Leaf 0x1E constants: */
+#define INTEL_AMX_TMUL_MAX_K   0x10
+#define INTEL_AMX_TMUL_MAX_N   0x40
+
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
   uint32_t vendor2, uint32_t vendor3)
 {
@@ -847,8 +859,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "avx512-vp2intersect", NULL, "md-clear", NULL,
 NULL, NULL, "serialize", NULL,
 "tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
-NULL, NULL, NULL, "avx512-fp16",
-NULL, NULL, "spec-ctrl", "stibp",
+NULL, NULL, "amx-bf16", "avx512-fp16",
+"amx-tile", "amx-int8", "spec-ctrl", "stibp",
 NULL, "arch-capabilities", "core-capability", "ssbd",
 },
 .cpuid = {
@@ -913,7 +925,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 "xsaveopt", "xsavec", "xgetbv1", "xsaves",
-NULL, NULL, NULL, NULL,
+"xfd", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -5589,6 +5601,43 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 }
 break;
 }
+case 0x1D: {
+/* AMX TILE */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* Highest numbered palette subleaf */
+*eax = INTEL_AMX_TILE_MAX_SUBLEAF;
+} else if (count == 1) {
+*eax = INTEL_AMX_TOTAL_TILE_BYTES |
+   (INTEL_AMX_BYTES_PER_TILE << 16);
+*ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
+*ecx = INTEL_AMX_TILE_MAX_ROWS;
+}
+break;
+}
+case 0x1E: {
+/* AMX TMUL */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* TMUL information: maximum K and N */
+*ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
+}
+break;
+}
 case 0x4000:
 /*
  * CPUID code in kvm_arch_init_vcpu() ignores stuff
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index caf1388d8b..25d26a15f8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1765,7 +1765,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c = &cpuid_data.entries[cpuid_i++];
 }
 break;
-case 0x14: {
+case 0x14:
+case 0x1d: {
 uint32_t times;
 
 c->function = i;
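
Editor's sketch (not part of the patch): the register values that the
0x1D and 0x1E cases above produce can be worked out in a standalone
program. The INTEL_AMX_* constants are copied from the patch; CpuidRegs
and the two helper functions are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID Leaf 0x1D constants, as defined in the patch. */
#define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
#define INTEL_AMX_BYTES_PER_TILE   0x400
#define INTEL_AMX_BYTES_PER_ROW    0x40
#define INTEL_AMX_TILE_MAX_NAMES   0x8
#define INTEL_AMX_TILE_MAX_ROWS    0x10

/* CPUID Leaf 0x1E constants, as defined in the patch. */
#define INTEL_AMX_TMUL_MAX_K       0x10
#define INTEL_AMX_TMUL_MAX_N       0x40

/* Hypothetical container for the four CPUID output registers. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} CpuidRegs;

/* Values for CPUID leaf 0x1D, subleaf 1 (tile palette 1). */
CpuidRegs amx_tile_palette1(void)
{
    CpuidRegs r = { 0, 0, 0, 0 };

    r.eax = INTEL_AMX_TOTAL_TILE_BYTES | (INTEL_AMX_BYTES_PER_TILE << 16);
    r.ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
    r.ecx = INTEL_AMX_TILE_MAX_ROWS;
    return r;
}

/* Values for CPUID leaf 0x1E, subleaf 0 (TMUL information). */
CpuidRegs amx_tmul_info(void)
{
    CpuidRegs r = { 0, 0, 0, 0 };

    r.ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
    return r;
}
```

With these constants, palette 1 yields EAX=0x04002000, EBX=0x00080040 and
ECX=0x10, and the TMUL leaf yields EBX=0x4010. The constants are also
self-consistent: 8 tiles of 0x400 bytes give the 0x2000 total, and 16 rows
of 0x40 bytes give one 0x400-byte tile.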

[PATCH 6/7] x86: add support for KVM_CAP_XSAVE2 and AMX state migration

2022-01-24 Thread Yang Zhong

From: Jing Liu 

When dynamic xfeatures (e.g. AMX) are used by the guest, the xsave
area can be larger than 4KB. KVM_GET_XSAVE2 and KVM_SET_XSAVE
under KVM_CAP_XSAVE2 work with an xsave buffer larger than 4KB.
Always use the new ioctls under KVM_CAP_XSAVE2 when KVM supports it.

Signed-off-by: Jing Liu 
Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
---
 linux-headers/asm-x86/kvm.h | 14 +
 linux-headers/linux/kvm.h   |  2 ++
 target/i386/cpu.h   |  4 
 target/i386/kvm/kvm.c   | 42 -
 target/i386/xsave_helper.c  | 35 +++
 5 files changed, 82 insertions(+), 15 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 5a776a08f7..2e37b825cd 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -375,7 +375,21 @@ struct kvm_debugregs {
 
 /* for KVM_CAP_XSAVE */
 struct kvm_xsave {
+   /*
+* KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many bytes
+* as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+* respectively, when invoked on the vm file descriptor.
+*
+* The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+* will always be at least 4096. Currently, it is only greater
+* than 4096 if a dynamic feature has been enabled with
+* ``arch_prctl()``, but this may change in the future.
+*
+* The offsets of the state save areas in struct kvm_xsave follow
+* the contents of CPUID leaf 0xD on the host.
+*/
__u32 region[1024];
+   __u32 extra[0];
 };
 
 #define KVM_MAX_XCRS   16
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 02c5e7b7bb..af67be1b9e 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_XSAVE2  208
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1550,6 +1551,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_XSAVE */
 #define KVM_GET_XSAVE_IOR(KVMIO,  0xa4, struct kvm_xsave)
 #define KVM_SET_XSAVE_IOW(KVMIO,  0xa5, struct kvm_xsave)
+#define KVM_GET_XSAVE2   _IOR(KVMIO,  0xcf, struct kvm_xsave)
 /* Available with KVM_CAP_XCRS */
 #define KVM_GET_XCRS _IOR(KVMIO,  0xa6, struct kvm_xcrs)
 #define KVM_SET_XCRS _IOW(KVMIO,  0xa7, struct kvm_xcrs)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f7fc2e97a6..de9da38e42 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1528,6 +1528,10 @@ typedef struct CPUX86State {
 uint64_t opmask_regs[NB_OPMASK_REGS];
 YMMReg zmmh_regs[CPU_NB_REGS];
 ZMMReg hi16_zmm_regs[CPU_NB_REGS];
+#ifdef TARGET_X86_64
+uint8_t xtilecfg[64];
+uint8_t xtiledata[8192];
+#endif
 
 /* sysenter registers */
 uint32_t sysenter_cs;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 25d26a15f8..5f931fbbc6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -122,6 +122,7 @@ static uint32_t num_architectural_pmu_gp_counters;
 static uint32_t num_architectural_pmu_fixed_counters;
 
 static int has_xsave;
+static int has_xsave2;
 static int has_xcrs;
 static int has_pit_state2;
 static int has_sregs2;
@@ -1571,6 +1572,26 @@ static Error *invtsc_mig_blocker;
 
 #define KVM_MAX_CPUID_ENTRIES  100
 
+static void kvm_init_xsave(CPUX86State *env)
+{
+if (has_xsave2) {
+env->xsave_buf_len = QEMU_ALIGN_UP(has_xsave2, 4096);
+} else if (has_xsave) {
+env->xsave_buf_len = sizeof(struct kvm_xsave);
+} else {
+return;
+}
+
+env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
+memset(env->xsave_buf, 0, env->xsave_buf_len);
+ /*
+  * The allocated storage must be large enough for all of the
+  * possible XSAVE state components.
+  */
+assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) <=
+   env->xsave_buf_len);
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct {
@@ -1600,6 +1621,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 cpuid_i = 0;
 
+has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+
 r = kvm_arch_set_tsc_khz(cs);
 if (r < 0) {
 return r;
@@ -1988,19 +2011,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (r) {
 goto fail;
 }
-
-if (has_xsave) {
-env->xsave_buf_len = sizeof(struct kvm_xsave);
-env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
-memset(env->xsave_buf, 0, env->xsave_buf_len);
-
-/*
- * The allocated storage must be large enough for all of the
- * possible XSAVE state components.
- */
-assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX)
-   <=
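
Editor's sketch (not part of the patch): the buffer-length selection in
kvm_init_xsave() can be modelled as a pure function. LEGACY_XSAVE_SIZE
stands in for sizeof(struct kvm_xsave) (1024 32-bit words), and
align_up() and xsave_buf_len() are hypothetical helper names; QEMU itself
uses QEMU_ALIGN_UP.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof(struct kvm_xsave) without the flexible tail: 1024 * 4 bytes. */
#define LEGACY_XSAVE_SIZE 4096

/* Round n up to the next multiple of a (a must be non-zero). */
size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/*
 * cap_xsave2: value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2),
 *             i.e. the required buffer size, or 0 if unsupported.
 * has_xsave:  non-zero if the legacy KVM_CAP_XSAVE is available.
 * Returns the number of bytes to allocate, or 0 if XSAVE is unavailable.
 */
size_t xsave_buf_len(int cap_xsave2, int has_xsave)
{
    if (cap_xsave2 > 0) {
        return align_up((size_t)cap_xsave2, 4096);
    }
    if (has_xsave) {
        return LEGACY_XSAVE_SIZE;
    }
    return 0;
}
```

For example, a reported size of 11008 bytes rounds up to a 12288-byte
allocation, while a host without KVM_CAP_XSAVE2 keeps the 4096-byte buffer.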

[PATCH 4/7] x86: Add XFD faulting bit for state components

2022-01-24 Thread Yang Zhong

From: Jing Liu 

Intel introduces the XFD faulting mechanism for extended
XSAVE features so that the features can be enabled
dynamically at runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2]
is set to 1, XFD faulting is supported for this
state component.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 2 ++
 target/i386/cpu.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4ad0f56bd..f7fc2e97a6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -558,8 +558,10 @@ typedef enum X86Seg {
 #define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
 
 #define ESA_FEATURE_ALIGN64_BIT 1
+#define ESA_FEATURE_XFD_BIT 2
 
 #define ESA_FEATURE_ALIGN64_MASK(1U << ESA_FEATURE_ALIGN64_BIT)
+#define ESA_FEATURE_XFD_MASK(1U << ESA_FEATURE_XFD_BIT)
 
 
 /* CPUID feature words */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 29b0348c25..ea13be0a19 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5499,7 +5499,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
-*ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
+*ecx = (esa->ecx & ESA_FEATURE_ALIGN64_MASK) |
+   (esa->ecx & ESA_FEATURE_XFD_MASK);
 }
 }
 break;
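
Editor's sketch (not part of the patch): the ECX filtering added above
keeps only the two architecturally defined flag bits of a state-component
subleaf. The ESA_FEATURE_* macros are copied from the patch;
esa_reported_ecx() and esa_supports_xfd() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESA_FEATURE_ALIGN64_BIT   1
#define ESA_FEATURE_XFD_BIT       2

#define ESA_FEATURE_ALIGN64_MASK  (1U << ESA_FEATURE_ALIGN64_BIT)
#define ESA_FEATURE_XFD_MASK      (1U << ESA_FEATURE_XFD_BIT)

/* ECX value exposed to the guest for CPUID.(EAX=0Dh,ECX=n), n > 1. */
uint32_t esa_reported_ecx(uint32_t raw_ecx)
{
    return raw_ecx & (ESA_FEATURE_ALIGN64_MASK | ESA_FEATURE_XFD_MASK);
}

/* True if the component supports XFD faulting. */
bool esa_supports_xfd(uint32_t ecx)
{
    return (ecx & ESA_FEATURE_XFD_MASK) != 0;
}
```

Masking with the OR of both flags is equivalent to the two separate
mask-and-OR terms in the patch.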

[PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components

2022-01-24 Thread Yang Zhong

From: Jing Liu 

The AMX TILECFG register and the TMMx tile data registers are
saved/restored via XSAVE, respectively in state component 17
(64 bytes) and state component 18 (8192 bytes).

Add AMX feature bits to x86_ext_save_areas array to set
up AMX components. Add structs that define the layout of
AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
structs sizes.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 18 +-
 target/i386/cpu.c |  8 
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index de1dc124ab..06d2d6bccf 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -537,6 +537,8 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT6
 #define XSTATE_Hi16_ZMM_BIT 7
 #define XSTATE_PKRU_BIT 9
+#define XSTATE_XTILE_CFG_BIT17
+#define XSTATE_XTILE_DATA_BIT   18
 
 #define XSTATE_FP_MASK  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK (1ULL << XSTATE_SSE_BIT)
@@ -845,6 +847,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_TSX_LDTRK (1U << 16)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16   (1U << 23)
+/* AMX tile (two-dimensional register) */
+#define CPUID_7_0_EDX_AMX_TILE  (1U << 24)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26)
 /* Single Thread Indirect Branch Predictors */
@@ -1348,6 +1352,16 @@ typedef struct XSavePKRU {
 uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 17: AMX XTILECFG state */
+typedef struct XSaveXTILECFG {
+uint8_t xtilecfg[64];
+} XSaveXTILECFG;
+
+/* Ext. save area 18: AMX XTILEDATA state */
+typedef struct XSaveXTILEDATA {
+uint8_t xtiledata[8][1024];
+} XSaveXTILEDATA;
+
 QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
@@ -1355,6 +1369,8 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveOpmask) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveZMM_Hi256) != 0x200);
 QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
 QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILECFG) != 0x40);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILEDATA) != 0x2000);
 
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
@@ -1362,7 +1378,7 @@ typedef struct ExtSaveArea {
 uint32_t ecx;
 } ExtSaveArea;
 
-#define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
+#define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
 
 extern ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT];
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 37f06b0b1a..3390820745 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1401,6 +1401,14 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = 
{
 [XSTATE_PKRU_BIT] =
   { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
 .size = sizeof(XSavePKRU) },
+[XSTATE_XTILE_CFG_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILECFG),
+},
+[XSTATE_XTILE_DATA_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILEDATA),
+},
 };
 
 static uint32_t xsave_area_size(uint64_t mask)
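
Editor's sketch (not part of the patch): the two AMX save-area layouts and
their size checks can be compiled on their own, using C11 static_assert in
place of QEMU_BUILD_BUG_ON. The struct definitions and bit numbers are
copied from the patch; amx_xstate_mask() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define XSTATE_XTILE_CFG_BIT   17
#define XSTATE_XTILE_DATA_BIT  18

/* Ext. save area 17: AMX XTILECFG state */
typedef struct XSaveXTILECFG {
    uint8_t xtilecfg[64];
} XSaveXTILECFG;

/* Ext. save area 18: AMX XTILEDATA state (8 tiles of 1024 bytes) */
typedef struct XSaveXTILEDATA {
    uint8_t xtiledata[8][1024];
} XSaveXTILEDATA;

/* Compile-time layout checks, equivalent in spirit to QEMU_BUILD_BUG_ON. */
static_assert(sizeof(XSaveXTILECFG) == 0x40, "XTILECFG must be 64 bytes");
static_assert(sizeof(XSaveXTILEDATA) == 0x2000, "XTILEDATA must be 8192 bytes");

/* State-component bitmap covering both AMX components. */
uint64_t amx_xstate_mask(void)
{
    return (1ULL << XSTATE_XTILE_CFG_BIT) | (1ULL << XSTATE_XTILE_DATA_BIT);
}
```

The combined mask evaluates to 0x60000.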

[PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state

2022-01-24 Thread Yang Zhong

From: Jing Liu 

The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
indicate whether the extended state component is located
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

Right now, they are all zero because no supported component
needed the bit to be set, but the upcoming AMX feature will
use it.  Fix the subleaf values according to KVM's supported
cpuid.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 6 ++
 target/i386/cpu.c | 1 +
 target/i386/kvm/kvm-cpu.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9911d7c871..de1dc124ab 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -548,6 +548,11 @@ typedef enum X86Seg {
 #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
 
+#define ESA_FEATURE_ALIGN64_BIT 1
+
+#define ESA_FEATURE_ALIGN64_MASK(1U << ESA_FEATURE_ALIGN64_BIT)
+
+
 /* CPUID feature words */
 typedef enum FeatureWord {
 FEAT_1_EDX, /* CPUID[1].EDX */
@@ -1354,6 +1359,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
 uint32_t offset, size;
+uint32_t ecx;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa9e636800..37f06b0b1a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5487,6 +5487,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
+*ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
 }
 }
 break;
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index d95028018e..033ca011ea 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -105,6 +105,8 @@ static void kvm_cpu_xsave_init(void)
 assert(esa->size == sz);
 esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
 }
+
+esa->ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
 }
 }
 }
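
As a rough illustration of why the ECX[1] alignment bit matters, the sketch
below computes compacted-format offsets for a list of state components. It is
not QEMU code: `struct xcomp` and `compacted_offsets()` are hypothetical names,
the array is assumed to list the extended components in order, and the 576-byte
base assumes the usual 512-byte legacy region followed by the 64-byte XSAVE
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESA_FEATURE_ALIGN64_BIT   1
#define ESA_FEATURE_ALIGN64_MASK  (1U << ESA_FEATURE_ALIGN64_BIT)

/* Assumption: 512-byte legacy region plus 64-byte XSAVE header. */
#define XSAVE_COMPACTED_BASE      576U

/* Hypothetical descriptor: size = CPUID[0xD, n].EAX, ecx = CPUID[0xD, n].ECX. */
struct xcomp {
    uint32_t size;
    uint32_t ecx;
};

/*
 * Store the compacted-format offset of each component in offsets[] and
 * return the total size of the area.  A component with size == 0 is
 * treated as not enabled and gets offset 0.
 */
static uint32_t compacted_offsets(const struct xcomp *c, size_t n,
                                  uint32_t *offsets)
{
    uint32_t off = XSAVE_COMPACTED_BASE;

    assert(c != NULL && offsets != NULL);
    for (size_t i = 0; i < n; i++) {
        if (c[i].size == 0) {
            offsets[i] = 0;
            continue;
        }
        /* ECX[1] set: start on the next 64-byte boundary. */
        if (c[i].ecx & ESA_FEATURE_ALIGN64_MASK) {
            off = (off + 63U) & ~63U;
        }
        offsets[i] = off;
        off += c[i].size;
    }
    return off;
}
```

With the bit clear, a component simply follows the previous one; with the bit
set, padding is inserted first, which is why a guest that sees ECX[1] == 0 for
an aligned component would compute the wrong layout.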

[PATCH 3/7] x86: Grant AMX permission for guest

2022-01-24 Thread Yang Zhong

The kernel allocates a 4K xstate buffer by default. For XSAVE features
that require a large state component (e.g. AMX), the Linux kernel
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. Those are called dynamically-
enabled XSAVE features (or dynamic xfeatures).

There are separate permissions for native tasks and guests.

Qemu should request the guest permissions for dynamic xfeatures
which will be exposed to the guest. This only needs to be done
once before the first vcpu is created.

Suggested-by: Paolo Bonzini 
Signed-off-by: Yang Zhong 
Signed-off-by: Jing Liu 
Signed-off-by: Wei Wang 
---
 target/i386/cpu.h |  7 +++
 target/i386/cpu.c | 31 +++
 target/i386/kvm/kvm-cpu.c | 12 ++--
 target/i386/kvm/kvm.c |  6 ++
 4 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 06d2d6bccf..d4ad0f56bd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -549,6 +549,13 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK   (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_XTILE_CFG_MASK   (1ULL << XSTATE_XTILE_CFG_BIT)
+#define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
+#define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK \
+ | XSTATE_XTILE_DATA_MASK)
+
+#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
+#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
 
 #define ESA_FEATURE_ALIGN64_BIT 1
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3390820745..29b0348c25 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -43,6 +43,10 @@
 #include "disas/capstone.h"
 #include "cpu-internal.h"
 
+#include 
+
+bool request_perm;
+
 /* Helpers for building CPUID[2] descriptors: */
 
 struct CPUID2CacheDescriptorInfo {
@@ -6000,6 +6004,27 @@ static void x86_cpu_adjust_feat_level(X86CPU *cpu, 
FeatureWord w)
 }
 }
 
+static void kvm_request_xsave_components(X86CPU *cpu, uint32_t bit)
+{
+KVMState *s = CPU(cpu)->kvm_state;
+
+long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
+  bit);
+if (rc) {
+/*
+ * The older kernel version(<5.15) can't support
+ * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
+ */
+return;
+}
+
+rc = kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EAX);
+if (!(rc & XFEATURE_XTILE_MASK)) {
+error_report("get cpuid failure and rc=0x%lx", rc);
+exit(EXIT_FAILURE);
+}
+}
+
 /* Calculate XSAVE components based on the configured CPU feature flags */
 static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 {
@@ -6021,6 +6046,12 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 }
 }
 
+/* Only request permission from first vcpu. */
+if (kvm_enabled() && !request_perm) {
+kvm_request_xsave_components(cpu, XSTATE_XTILE_DATA_BIT);
+request_perm = true;
+}
+
 env->features[FEAT_XSAVE_COMP_LO] = mask;
 env->features[FEAT_XSAVE_COMP_HI] = mask >> 32;
 }
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 033ca011ea..5ab6a0b9d2 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -84,7 +84,7 @@ static void kvm_cpu_max_instance_init(X86CPU *cpu)
 static void kvm_cpu_xsave_init(void)
 {
 static bool first = true;
-KVMState *s = kvm_state;
+uint32_t eax, ebx, ecx, edx;
 int i;
 
 if (!first) {
@@ -100,13 +100,13 @@ static void kvm_cpu_xsave_init(void)
 ExtSaveArea *esa = &x86_ext_save_areas[i];
 
 if (esa->size) {
-int sz = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EAX);
-if (sz != 0) {
-assert(esa->size == sz);
-esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
+host_cpuid(0xd, i, &eax, &ebx, &ecx, &edx);
+if (eax != 0) {
+assert(esa->size == eax);
+esa->offset = ebx;
 }
 
-esa->ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
+esa->ecx = ecx;
 }
 }
 }
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 2c8feb4a6f..caf1388d8b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -405,6 +405,12 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 if (!has_msr_arch_capabs) {
 ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
 }
+} else if (function == 0xd && index == 0 && reg == R_EAX) {
+/*
+ * We can set the AMX XTILE DATA flag, even if KVM does n

[PATCH 0/7] AMX support in Qemu

2022-01-24 Thread Yang Zhong

Intel introduces the Advanced Matrix Extensions (AMX) [1] feature, which
consists of configurable two-dimensional "TILE" registers and new
accelerator instructions that operate on them. TMUL (Tile matrix
MULtiply) is the first accelerator instruction set to use the new
registers.

Since the AMX KVM patches have been merged into a Linux release, this
series is based on the latest Linux release.

According to the KVM design, the userspace VMM (e.g. Qemu) is expected
to request guest permission for the dynamically-enabled XSAVE features
only once when the first vCPU is created, and KVM checks guest permission
in KVM_SET_CPUID2.

Intel AMX is XSAVE supported and XSAVE enabled. Those extended features
have large state, while the current kvm_xsave only allows 4KB. The AMX KVM
patches extended struct kvm_xsave to meet this requirement and added one
extra KVM_GET_XSAVE2 ioctl to handle extended features. In our tests, AMX
live migration works well.

Note: This version still includes some definitions in the linux-headers.
Once Qemu syncs those linux-headers, I will remove those definitions, so
please ignore those changes.

[1] Intel Architecture Instruction Set Extension Programming Reference
https://software.intel.com/content/dam/develop/external/us/en/documents/\
architecture-instruction-set-extensions-programming-reference.pdf

Thanks,
Yang


change history
--
rfc v1->v1:
   - Patch 1: changed the commit message (Kevin and Paolo).
   - Patch 2: changed the commit message (Kevin and Paolo).
   - Patch 3: addressed the requirements below from Paolo:
     - Called ARCH_REQ_XCOMP_GUEST_PERM from x86_cpu_enable_xsave_components().
       Used kvm_request_xsave_components() to replace x86_xsave_req_perm().
       Replaced syscall(ARCH_GET_XCOMP_GUEST_PERM) with
       kvm_arch_get_supported_cpuid() in kvm_request_xsave_components().
     - Changed kvm_cpu_xsave_init() to use host_cpuid() instead of
       kvm_arch_get_supported_cpuid().
     - Added the "function == 0xd" handling in kvm_arch_get_supported_cpuid().
   - Patch 4: used "uint32_t ecx" to replace "uint32_t need_align, support_xfd".
   - Patch 6: made the changes below:
     - Changed the commit message (Kevin) and used the new function
       kvm_init_xsave() to replace some pieces of code (Wei).
     - Moved the KVM_CAP_XSAVE2 extension check to kvm_arch_init_vcpu() so
       that the permission request happens before the KVM_CAP_XSAVE2
       extension check (Paolo).
   - Removed the RFC prefix.

Jing Liu (5):
  x86: Fix the 64-byte boundary enumeration for extended state
  x86: Add AMX XTILECFG and XTILEDATA components
  x86: Add XFD faulting bit for state components
  x86: Add AMX CPUIDs enumeration
  x86: add support for KVM_CAP_XSAVE2 and AMX state migration

Yang Zhong (1):
  x86: Grant AMX permission for guest

Zeng Guang (1):
  x86: Support XFD and AMX xsave data migration

 linux-headers/asm-x86/kvm.h | 14 ++
 linux-headers/linux/kvm.h   |  2 +
 target/i386/cpu.h   | 46 +-
 target/i386/cpu.c   | 96 +++--
 target/i386/kvm/kvm-cpu.c   | 12 +++--
 target/i386/kvm/kvm.c   | 69 +++---
 target/i386/machine.c   | 42 
 target/i386/xsave_helper.c  | 35 ++
 8 files changed, 291 insertions(+), 25 deletions(-)
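
The "request guest permission only once" flow described in this cover letter
can be sketched roughly as follows. This is a standalone illustration rather
than the code in the series: `request_guest_xtile_perm_once()` is a
hypothetical helper, the constants are copied from the definitions used in
patch 3 and from the state component numbers (17 and 18) mentioned in the
patch 2 discussion, and on kernels or architectures without
ARCH_REQ_XCOMP_GUEST_PERM the helper simply reports that nothing was granted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

#define XSTATE_XTILE_CFG_BIT        17
#define XSTATE_XTILE_DATA_BIT       18
#define XSTATE_XTILE_CFG_MASK       (1ULL << XSTATE_XTILE_CFG_BIT)
#define XSTATE_XTILE_DATA_MASK      (1ULL << XSTATE_XTILE_DATA_BIT)
#define XFEATURE_XTILE_MASK         (XSTATE_XTILE_CFG_MASK \
                                     | XSTATE_XTILE_DATA_MASK)
#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025

/*
 * Hypothetical helper: ask the kernel for guest XTILEDATA permission the
 * first time it is called and cache the outcome for later callers.
 * Returns 1 if the request succeeded, 0 if it failed or is unsupported.
 */
static int request_guest_xtile_perm_once(void)
{
    static int done;
    static int granted;

    assert(XSTATE_XTILE_DATA_BIT < 64);
    if (!done) {
        done = 1;
#if defined(__linux__) && defined(__x86_64__)
        /* The kernel takes the feature bit number, not a mask. */
        granted = syscall(SYS_arch_prctl,
                          (unsigned long)ARCH_REQ_XCOMP_GUEST_PERM,
                          (unsigned long)XSTATE_XTILE_DATA_BIT) == 0;
#else
        granted = 0;
#endif
    }
    return granted;
}
```

A caller would invoke the helper before creating the first vCPU and treat a
return value of 0 as "do not expose the dynamic feature", which mirrors the
early return for older kernels in patch 3.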

Re: [RFC PATCH 2/2] hw/i386/sgx: Attach SGX-EPC to its memory backend

2022-01-23 Thread Yang Zhong

On Mon, Jan 17, 2022 at 12:48:10PM +0100, Paolo Bonzini wrote:
> On 1/17/22 00:53, Philippe Mathieu-Daudé via wrote:
> >We have one SGX-EPC address/size/node per memory backend,
> >make it child of the backend in the QOM composition tree.
> >
> >Cc: Yang Zhong 
> >Signed-off-by: Philippe Mathieu-Daudé 
> >---
> >  hw/i386/sgx.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> >diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> >index 5de5dd08936..6362e5e9d02 100644
> >--- a/hw/i386/sgx.c
> >+++ b/hw/i386/sgx.c
> >@@ -300,6 +300,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
> >  /* set the memdev link with memory backend */
> >  object_property_parse(obj, SGX_EPC_MEMDEV_PROP, 
> > list->value->memdev,
> >&error_fatal);
> >+object_property_add_child(OBJECT(list->value->memdev), "sgx-epc",
> >+  OBJECT(obj));
> >+
> >  /* set the numa node property for sgx epc object */
> >  object_property_set_uint(obj, SGX_EPC_NUMA_NODE_PROP, 
> > list->value->node,
> >   &error_fatal);
> 
> I don't think this is a good idea; only list->value->memdev should
> add something below itself in the tree.
> 
> However, I think obj can be added under the machine itself as
> /machine/sgx-epc-device[*].
> 

  Philippe, sorry, my mutt setup does not receive all Qemu mails.

  https://lists.nongnu.org/archive/html/qemu-devel/2022-01/msg03535.html
  I verified this patch, and it reported the error below:

  Unexpected error in object_property_try_add() at ../qom/object.c:1224:
  qemu-system-x86_64: attempt to add duplicate property 'sgx-epc' to object (type 'pc-q35-7.0-machine')
  Aborted (core dumped)

  Even when I changed it to another name, the same kind of error was still
  reported.

  I tried the patch below, based on my previous patch, and it works:
  diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
  index d60485fc422..66444745b47 100644
  --- a/hw/i386/sgx.c
  +++ b/hw/i386/sgx.c
  @@ -281,6 +281,7 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
   SGXEPCState *sgx_epc = &pcms->sgx_epc;
   X86MachineState *x86ms = X86_MACHINE(pcms);
   SgxEPCList *list = NULL;
  +int sgx_count = 0;
   Object *obj;

   memset(sgx_epc, 0, sizeof(SGXEPCState));
  @@ -297,7 +298,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
   for (list = x86ms->sgx_epc_list; list; list = list->next) {
   obj = object_new("sgx-epc");

  -object_property_add_child(OBJECT(pcms), "sgx-epc", OBJECT(obj));
  +gchar *name = g_strdup_printf("device[%d]", sgx_count++);
  +object_property_add_child(container_get(qdev_get_machine(), 
"/sgx-epc-device"),
  +  name, obj);
  
   /* set the memdev link with memory backend */
   object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,


  From the monitor, 
  (qemu) qom-list /machine/sgx-epc-device
  type (string)
  device[0] (child)
  device[1] (child)
  
  This correctly shows two SGX EPC section devices.
  
  If you have a new patch, I can help verify it, thanks!

  Yang

> Paolo

Re: [RFC PATCH 3/7] x86: Grant AMX permission for guest

2022-01-21 Thread Yang Zhong

On Tue, Jan 18, 2022 at 02:06:55PM +0100, Paolo Bonzini wrote:
> Sorry, hit send on the wrong window.  This is the only patch that
> will require a bit more work.
> 
> On 1/18/22 13:52, Paolo Bonzini wrote:
> >>@@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms,
> >>int default_cpu_version)
> >>  MachineState *ms = MACHINE(x86ms);
> >>  MachineClass *mc = MACHINE_GET_CLASS(x86ms);
> >>+    /* Request AMX pemission for guest */
> >>+    x86_xsave_req_perm();
> >>  x86_cpu_set_default_version(default_cpu_version);
> >
> >This should be done before creating a CPU with support for state
> >component 18.  It happens in kvm_init_vcpu, with the following
> >call stack:
> >
> > kvm_init_vcpu
> > kvm_vcpu_thread_fn
> > kvm_start_vcpu_thread
> > qemu_init_vcpu
> > x86_cpu_realizefn
> >
> >The issue however is that this has to be done before
> >KVM_GET_SUPPORTED_CPUID and KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
> >
> >For the former, you can assume that anything returned by
> >ARCH_GET_XCOMP_GUEST_PERM will be returned by
> >KVM_GET_SUPPORTED_CPUID in CPUID[0xD].EDX:EAX, so you can:
> >
> >- add it to kvm_arch_get_supported_cpuid
> 
> ... together with the other special cases (otherwise
> x86_cpu_get_supported_feature_word complains that XTILEDATA is not
> available)
> 
> - change kvm_cpu_xsave_init to use host_cpuid instead of
> kvm_arch_get_supported_cpuid.
> 
> - call ARCH_REQ_XCOMP_GUEST_PERM from
> x86_cpu_enable_xsave_components, with a conditional like
> 
> if (kvm_enabled()) {
> kvm_request_xsave_components(cpu, mask);
> }
> 
> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) is actually not a problem; the
> ioctl is only called from kvm_arch_init_vcpu and therefore after
> x86_cpu_enable_xsave_components.
>
  
  Paolo, thanks very much for those detailed steps!
  I have completed the new patch according to those steps, and it works well.

  Since this is the only patch with big changes, the next version will drop
  the RFC prefix.

  Thanks!
  Yang  
  
 
> Thanks,
> 
> Paolo

Re: [RFC PATCH 4/7] x86: Add XFD faulting bit for state components

2022-01-21 Thread Yang Zhong

On Tue, Jan 18, 2022 at 01:52:51PM +0100, Paolo Bonzini wrote:
> On 1/7/22 10:31, Yang Zhong wrote:
> >-uint32_t need_align;
> >+uint32_t need_align, support_xfd;
> 
> These can be replaced by a single field "uint32_t ecx".
> 
> You can add also macros like
> 
> #define ESA_FEATURE_ALIGN64_BIT   (1)
> #define ESA_FEATURE_XFD_BIT   (2)
> 
> to simplify access.
  
  Thanks Paolo, this is a simpler solution!

  Yang
 
> Paolo
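
The single "uint32_t ecx" field suggested above could be used roughly as in
the sketch below. `ExtSaveAreaSketch` and the three helpers are hypothetical
names made up for this illustration; only the two bit macros come from the
suggestion itself, and the mask macros follow the pattern already used for
ESA_FEATURE_ALIGN64_MASK in patch 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESA_FEATURE_ALIGN64_BIT   1
#define ESA_FEATURE_XFD_BIT       2

#define ESA_FEATURE_ALIGN64_MASK  (1U << ESA_FEATURE_ALIGN64_BIT)
#define ESA_FEATURE_XFD_MASK      (1U << ESA_FEATURE_XFD_BIT)

/* Hypothetical stand-in for the relevant part of ExtSaveArea. */
typedef struct ExtSaveAreaSketch {
    uint32_t offset, size;
    uint32_t ecx;               /* raw CPUID[0xD, n].ECX */
} ExtSaveAreaSketch;

/* True if the component must start on a 64-byte boundary (ECX[1]). */
static bool esa_needs_align64(const ExtSaveAreaSketch *esa)
{
    return (esa->ecx & ESA_FEATURE_ALIGN64_MASK) != 0;
}

/* True if the component supports XFD faulting (ECX[2]). */
static bool esa_supports_xfd(const ExtSaveAreaSketch *esa)
{
    return (esa->ecx & ESA_FEATURE_XFD_MASK) != 0;
}

/* Value to report in CPUID[0xD, n].ECX: only the bits named above. */
static uint32_t esa_cpuid_ecx(const ExtSaveAreaSketch *esa)
{
    return esa->ecx & (ESA_FEATURE_ALIGN64_MASK | ESA_FEATURE_XFD_MASK);
}
```

Keeping the raw register value and masking on access means a further ECX bit
would only need a new macro, not another struct field.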

Re: [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components

2022-01-21 Thread Yang Zhong

On Tue, Jan 18, 2022 at 01:39:59PM +0100, Paolo Bonzini wrote:
> On 1/10/22 09:23, Tian, Kevin wrote:
> >>
> >>AMX XTILECFG and XTILEDATA are managed by XSAVE feature
> >>set. State component 17 is used for 64-byte TILECFG register
> >>(XTILECFG state) and component 18 is used for 8192 bytes
> >>of tile data (XTILEDATA state).
> >to be consistent, "tile data" -> "TILEDATA"
> >
> 
> Previous sentences use "XTILECFG" / "XTILEDATA", not "TILEDATA".
> 
> So I would say:
> 
> The AMX TILECFG register and the TMMx tile data registers are
> saved/restored via XSAVE, respectively in state component 17 (64
> bytes) and state component 18 (8192 bytes).
>

  Thanks Paolo, I will update this in the new version.
  Yang
 
> Paolo

Re: [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state

2022-01-21 Thread Yang Zhong

On Tue, Jan 18, 2022 at 01:37:20PM +0100, Paolo Bonzini wrote:
> On 1/11/22 03:22, Yang Zhong wrote:
> >   Thanks Kevin, I will update this in next version.
> 
> Also:
> 
> The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
> indicate whether the extended state component locates
> on the next 64-byte boundary following the preceding state
> component when the compacted format of an XSAVE area is
> used.
> 
> Right now, they are all zero because no supported component
> needed the bit to be set, but the upcoming AMX feature will
> use it.  Fix the subleaves value according to KVM's supported
> cpuid.
>
  Thanks Paolo, I will update this in the new version.

  Yang
  
> Paolo

[PATCH v3] qapi: Cleanup SGX related comments and restore @section-size

2022-01-20 Thread Yang Zhong

The SGX NUMA patches were merged for the Qemu 7.0 release, so we need
to clarify the version history information and also adjust some
related comments to make the SGX related comments clearer.

The QMP command schema promises backwards compatibility as standard.
We temporarily restore "@section-size" to avoid an incompatible API
break. The "@section-size" member is marked as deprecated since 7.0.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Yang Zhong 
Reviewed-by: Daniel P. Berrangé 
---
 docs/about/deprecated.rst | 13 +
 qapi/machine.json |  4 ++--
 qapi/misc-target.json | 22 +-
 hw/i386/sgx.c | 11 +--
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index e21e07478f..47a594a3b6 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -264,6 +264,19 @@ accepted incorrect commands will return an error. Users 
should make sure that
 all arguments passed to ``device_add`` are consistent with the documented
 property types.
 
+``query-sgx`` return value member ``section-size`` (since 7.0)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Member ``section-size`` in return value elements with meta-type ``uint64`` is
+deprecated.  Use ``sections`` instead.
+
+
+``query-sgx-capabilities`` return value member ``section-size`` (since 7.0)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Member ``section-size`` in return value elements with meta-type ``uint64`` is
+deprecated.  Use ``sections`` instead.
+
 System accelerators
 ---
 
diff --git a/qapi/machine.json b/qapi/machine.json
index b6a37e17c4..cf47cb63a9 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1207,7 +1207,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
@@ -1288,7 +1288,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 1022aa0184..4bc45d2474 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -344,9 +344,9 @@
 #
 # @node: the numa node
 #
-# @size: the size of epc section
+# @size: the size of EPC section
 #
-# Since: 6.2
+# Since: 7.0
 ##
 { 'struct': 'SGXEPCSection',
   'data': { 'node': 'int',
@@ -365,7 +365,13 @@
 #
 # @flc: true if FLC is supported
 #
-# @sections: The EPC sections info for guest
+# @section-size: The EPC section size for guest
+#Redundant with @sections.  Just for backward compatibility.
+#
+# @sections: The EPC sections info for guest (Since: 7.0)
+#
+# Features:
+# @deprecated: Member @section-size is deprecated.  Use @sections instead.
 #
 # Since: 6.2
 ##
@@ -374,6 +380,8 @@
 'sgx1': 'bool',
 'sgx2': 'bool',
 'flc': 'bool',
+'section-size': { 'type': 'uint64',
+'features': [ 'deprecated' ] },
 'sections': ['SGXEPCSection']},
'if': 'TARGET_I386' }
 
@@ -390,7 +398,9 @@
 #
 # -> { "execute": "query-sgx" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#  "flc": true, "section-size" : 0 } }
+#  "flc": true,  "section-size" : 96468992,
+#  "sections": [{"node": 0, "size": 67108864},
+#  {"node": 1, "size": 29360128}]} }
 #
 ##
 { 'command': 'query-sgx', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }
@@ -408,7 +418,9 @@
 #
 # -> { "execute": "query-sgx-capabilities" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#  "flc": true, "section-size" : 0 }

Re: [PATCH v2] qapi: Cleanup SGX related comments and restore @section-size

2022-01-20 Thread Yang Zhong

On Thu, Jan 20, 2022 at 09:44:34AM +, Daniel P. Berrangé wrote:
> On Thu, Jan 20, 2022 at 05:16:01PM +0800, Yang Zhong wrote:
> > On Thu, Jan 20, 2022 at 09:10:34AM +, Daniel P. Berrangé wrote:
> > > On Wed, Jan 19, 2022 at 06:57:20PM -0500, Yang Zhong wrote:
> > > > The SGX NUMA patches were merged into Qemu 7.0 release, we need
> > > > clarify detailed version history information and also change
> > > > some related comments, which make SGX related comments clearer.
> > > > 
> > > > The QMP command schema promises backwards compatibility as standard.
> > > > We temporarily restore "@section-size", which can avoid incompatible
> > > > API breakage. The "@section-size" will be deprecated in 7.2 version.
> > > > 
> > > > Suggested-by: Daniel P. Berrangé 
> > > > Signed-off-by: Yang Zhong 
> > > > Reviewed-by: Daniel P. Berrangé 
> > > > ---
> > > >  qapi/machine.json |  4 ++--
> > > >  qapi/misc-target.json | 17 -
> > > >  hw/i386/sgx.c | 11 +--
> > > >  3 files changed, 23 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/qapi/machine.json b/qapi/machine.json
> > > > index b6a37e17c4..cf47cb63a9 100644
> > > > --- a/qapi/machine.json
> > > > +++ b/qapi/machine.json
> > > > @@ -1207,7 +1207,7 @@
> > > >  #
> > > >  # @memdev: memory backend linked with device
> > > >  #
> > > > -# @node: the numa node
> > > > +# @node: the numa node (Since: 7.0)
> > > >  #
> > > >  # Since: 6.2
> > > >  ##
> > > > @@ -1288,7 +1288,7 @@
> > > >  #
> > > >  # @memdev: memory backend linked with device
> > > >  #
> > > > -# @node: the numa node
> > > > +# @node: the numa node (Since: 7.0)
> > > >  #
> > > >  # Since: 6.2
> > > >  ##
> > > > diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> > > > index 1022aa0184..a87358ea44 100644
> > > > --- a/qapi/misc-target.json
> > > > +++ b/qapi/misc-target.json
> > > > @@ -344,9 +344,9 @@
> > > >  #
> > > >  # @node: the numa node
> > > >  #
> > > > -# @size: the size of epc section
> > > > +# @size: the size of EPC section
> > > >  #
> > > > -# Since: 6.2
> > > > +# Since: 7.0
> > > >  ##
> > > >  { 'struct': 'SGXEPCSection',
> > > >'data': { 'node': 'int',
> > > > @@ -365,7 +365,9 @@
> > > >  #
> > > >  # @flc: true if FLC is supported
> > > >  #
> > > > -# @sections: The EPC sections info for guest
> > > > +# @section-size: The EPC section size for guest (Will be deprecated in 
> > > > 7.2)
> > > 
> > > I expected deprecation would start now (7.0, and it would be removed
> > > in 7.2.
> > > 
> > > Also needs to be documented in docs/about/deprecated.rst
> > > 
> > > 
> >   
> >Thanks Daniel, Please check if below comments are okay or not? If no
> >problem, I will send v3 today, thanks! 
> >  
> >diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> >index e21e07478f..810542427f 100644
> >--- a/docs/about/deprecated.rst
> >+++ b/docs/about/deprecated.rst
> >@@ -441,3 +441,13 @@ nanoMIPS ISA
> > 
> >The ``nanoMIPS`` ISA has never been upstreamed to any compiler toolchain.
> >As it is hard to generate binaries for it, declare it deprecated.
> >+
> >+
> >+SGX backwards compatibility
> >+---
> >+
> >+@section-size (Since 7.0)
> >+''''''''''''''''''''''''
> >+
> >+The ``@section-size`` will be replaced with ``@sections`` struct and 
> > declare
> >+it deprecated.
> 
> This needs to be higher up in the file - look at the section
> with the heading 'QEMU Machine Protocol (QMP) commands' for
> how we list QMP features we're removing.
>   

   Thanks, I will add this in v3.
 
> 
> >diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> >index a87358ea44..c88fd0c2a2 100644
> >--- a/qapi/misc-target.json
> >+++ b/qapi/misc-target.json
> >@@ -365,7 +365,7 @@
> >#
> ># @flc: true if FLC is supported
> >#
> >   -# @section-size: The EPC section size for guest (Will be deprecated in 
> > 7.2)
> >   +# @section-size: The EPC section size for guest (7.0, and it would be 
> > removed in 7.2)
> 
> We don't need this comment - see the other reply in this thread
> about using an '@deprecated' tag instead, which gets turned into
> a comment in the auto-generated documentation. 
> 

  Thanks Daniel, it is strange that my mailbox did not receive this mail, but
  I checked it from the archive link
  (https://lists.nongnu.org/archive/html/qemu-devel/2022-01/msg04247.html).

  I also thank Philippe, who shared a helpful example with me. Thanks again!

  Yang 


> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2] qapi: Cleanup SGX related comments and restore @section-size

2022-01-20 Thread Yang Zhong

On Thu, Jan 20, 2022 at 09:10:34AM +, Daniel P. Berrangé wrote:
> On Wed, Jan 19, 2022 at 06:57:20PM -0500, Yang Zhong wrote:
> > The SGX NUMA patches were merged into Qemu 7.0 release, we need
> > clarify detailed version history information and also change
> > some related comments, which make SGX related comments clearer.
> > 
> > The QMP command schema promises backwards compatibility as standard.
> > We temporarily restore "@section-size", which can avoid incompatible
> > API breakage. The "@section-size" will be deprecated in 7.2 version.
> > 
> > Suggested-by: Daniel P. Berrangé 
> > Signed-off-by: Yang Zhong 
> > Reviewed-by: Daniel P. Berrangé 
> > ---
> >  qapi/machine.json |  4 ++--
> >  qapi/misc-target.json | 17 -
> >  hw/i386/sgx.c | 11 +--
> >  3 files changed, 23 insertions(+), 9 deletions(-)
> > 
> > diff --git a/qapi/machine.json b/qapi/machine.json
> > index b6a37e17c4..cf47cb63a9 100644
> > --- a/qapi/machine.json
> > +++ b/qapi/machine.json
> > @@ -1207,7 +1207,7 @@
> >  #
> >  # @memdev: memory backend linked with device
> >  #
> > -# @node: the numa node
> > +# @node: the numa node (Since: 7.0)
> >  #
> >  # Since: 6.2
> >  ##
> > @@ -1288,7 +1288,7 @@
> >  #
> >  # @memdev: memory backend linked with device
> >  #
> > -# @node: the numa node
> > +# @node: the numa node (Since: 7.0)
> >  #
> >  # Since: 6.2
> >  ##
> > diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> > index 1022aa0184..a87358ea44 100644
> > --- a/qapi/misc-target.json
> > +++ b/qapi/misc-target.json
> > @@ -344,9 +344,9 @@
> >  #
> >  # @node: the numa node
> >  #
> > -# @size: the size of epc section
> > +# @size: the size of EPC section
> >  #
> > -# Since: 6.2
> > +# Since: 7.0
> >  ##
> >  { 'struct': 'SGXEPCSection',
> >'data': { 'node': 'int',
> > @@ -365,7 +365,9 @@
> >  #
> >  # @flc: true if FLC is supported
> >  #
> > -# @sections: The EPC sections info for guest
> > +# @section-size: The EPC section size for guest (Will be deprecated in 7.2)
> 
> I expected deprecation would start now (7.0, and it would be removed
> in 7.2.
> 
> Also needs to be documented in docs/about/deprecated.rst
> 
> 
  
   Thanks Daniel, please check whether the changes below are okay. If there
   is no problem, I will send v3 today, thanks!
 
   diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
   index e21e07478f..810542427f 100644
   --- a/docs/about/deprecated.rst
   +++ b/docs/about/deprecated.rst
   @@ -441,3 +441,13 @@ nanoMIPS ISA

   The ``nanoMIPS`` ISA has never been upstreamed to any compiler toolchain.
   As it is hard to generate binaries for it, declare it deprecated.
   +
   +
   +SGX backwards compatibility
   +---
   +
   +@section-size (Since 7.0)
   +''''''''''''''''''''''''
   +
   +The ``@section-size`` will be replaced with ``@sections`` struct and declare
   +it deprecated.
   diff --git a/qapi/misc-target.json b/qapi/misc-target.json
   index a87358ea44..c88fd0c2a2 100644
   --- a/qapi/misc-target.json
   +++ b/qapi/misc-target.json
   @@ -365,7 +365,7 @@
   #
   # @flc: true if FLC is supported
   #
  -# @section-size: The EPC section size for guest (Will be deprecated in 7.2)
  +# @section-size: The EPC section size for guest (7.0, and it would be 
removed in 7.2)
   #
   # @sections: The EPC sections info for guest (Since: 7.0) 
   
   Regards,

   Yang


> 
> > +#
> > +# @sections: The EPC sections info for guest (Since: 7.0)
> >  #
> >  # Since: 6.2
> >  ##
> > @@ -374,6 +376,7 @@
> >  'sgx1': 'bool',
> >  'sgx2': 'bool',
> >  'flc': 'bool',
> > +'section-size': 'uint64',
> >  'sections': ['SGXEPCSection']},
> > 'if': 'TARGET_I386' }
> >  
> > @@ -390,7 +393,9 @@
> >  #
> >  # -> { "execute": "query-sgx" }
> >  # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
> > -#  "flc": true, "section-size" : 0 } }
> > +#  "flc": true,  "section-size" : 96468992,
> > +#  "sections":

[PATCH v2] qapi: Cleanup SGX related comments and restore @section-size

2022-01-19 Thread Yang Zhong

The SGX NUMA patches were merged for the Qemu 7.0 release, so we need
to clarify the version history information and also adjust some
related comments to make the SGX related comments clearer.

The QMP command schema promises backwards compatibility as standard.
We temporarily restore "@section-size", which can avoid incompatible
API breakage. The "@section-size" will be deprecated in 7.2 version.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Yang Zhong 
Reviewed-by: Daniel P. Berrangé 
---
 qapi/machine.json |  4 ++--
 qapi/misc-target.json | 17 -
 hw/i386/sgx.c | 11 +--
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index b6a37e17c4..cf47cb63a9 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1207,7 +1207,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
@@ -1288,7 +1288,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 1022aa0184..a87358ea44 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -344,9 +344,9 @@
 #
 # @node: the numa node
 #
-# @size: the size of epc section
+# @size: the size of EPC section
 #
-# Since: 6.2
+# Since: 7.0
 ##
 { 'struct': 'SGXEPCSection',
   'data': { 'node': 'int',
@@ -365,7 +365,9 @@
 #
 # @flc: true if FLC is supported
 #
-# @sections: The EPC sections info for guest
+# @section-size: The EPC section size for guest (Will be deprecated in 7.2)
+#
+# @sections: The EPC sections info for guest (Since: 7.0)
 #
 # Since: 6.2
 ##
@@ -374,6 +376,7 @@
 'sgx1': 'bool',
 'sgx2': 'bool',
 'flc': 'bool',
+'section-size': 'uint64',
 'sections': ['SGXEPCSection']},
'if': 'TARGET_I386' }
 
@@ -390,7 +393,9 @@
 #
 # -> { "execute": "query-sgx" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#  "flc": true, "section-size" : 0 } }
+#  "flc": true,  "section-size" : 96468992,
+#  "sections": [{"node": 0, "size": 67108864},
+#  {"node": 1, "size": 29360128}]} }
 #
 ##
 { 'command': 'query-sgx', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }
@@ -408,7 +413,9 @@
 #
 # -> { "execute": "query-sgx-capabilities" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#  "flc": true, "section-size" : 0 } }
+#  "flc": true, "section-size" : 96468992,
+#  "sections" : [{"node": 0, "size": 67108864},
+#  {"node": 1, "size": 29360128}]} }
 #
 ##
 { 'command': 'query-sgx-capabilities', 'returns': 'SGXInfo', 'if': 
'TARGET_I386' }
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 5de5dd0893..a2b318dd93 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -83,7 +83,7 @@ static uint64_t sgx_calc_section_metric(uint64_t low, 
uint64_t high)
((high & MAKE_64BIT_MASK(0, 20)) << 32);
 }
 
-static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
+static SGXEPCSectionList *sgx_calc_host_epc_sections(uint64_t *size)
 {
 SGXEPCSectionList *head = NULL, **tail = &head;
 SGXEPCSection *section;
@@ -106,6 +106,7 @@ static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
 section = g_new0(SGXEPCSection, 1);
 section->node = j++;
 section->size = sgx_calc_section_metric(ecx, edx);
+*size += section->size;
 QAPI_LIST_APPEND(tail, section);
 }
 
@@ -156,6 +157,7 @@ SGXInfo *qmp_query_sgx_capabilities(Error **errp)
 {
 SGXInfo *info = NULL;
 uint32_t eax, ebx, ecx, edx;
+uint64_t size = 0;
 
 int fd = qemu_open_old("/dev/sgx_vepc", O_RDWR);
 if (fd < 0) {
@@ -173,7 +175,8 @@ SGXInfo *qmp_query_sgx_capabilities(Error **errp)
 info->sgx1 = eax & (1U << 0) ? true : false;
 info->sgx2 = eax & (1U << 1) ? true : false;
 
-info->sections = sgx_calc_host_epc_sections();
+info->sections = sgx_calc_host_epc_sections(&size);
+info->section_size = size;
 
 close(fd);
 
@@ -220,12 +223,14 @@ SGXInfo *qmp_query_sgx(Error **errp)
 return NULL;
 }
 
+SGXEPCState *sgx_epc = &pcms->sgx_epc;
 info = g_new0(SGXInfo, 1);
 
 info->sgx = true;
 info->sgx1 = true;
 info->sgx2 = true;
 info->flc = true;
+info->section_size = sgx_epc->size;
 info->sections = sgx_get_epc_sections_list();
 
 return info;
@@ -249,6 +254,8 @@ void hmp_info_sgx(Monitor *mon, const QDict *qdict)
info->sgx2 ? "enabled" : "disabled");
 monitor_printf(mon, "FLC support: %s\n",
info->flc ? "enabled" : "disabled");
+monitor_printf(mon, "size: %" PRIu64 "\n",
+   info->section_size);
 
 section_list = info->sections;
 for (section = section_list; section; section = section->next) {
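
The relationship between the restored "section-size" value and the per-node
"sections" list in the example output above can be sketched as a plain sum
over a linked list. `EpcSection` and `epc_total_size()` are hypothetical names
used only for this illustration; they stand in for the QAPI-generated list
types and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the EPC section list. */
typedef struct EpcSection {
    int64_t node;               /* NUMA node of the section */
    uint64_t size;              /* section size in bytes */
    struct EpcSection *next;
} EpcSection;

/* Sum the sizes of all sections; an empty list yields 0. */
static uint64_t epc_total_size(const EpcSection *head)
{
    uint64_t total = 0;

    for (const EpcSection *s = head; s != NULL; s = s->next) {
        total += s->size;
    }
    return total;
}
```

For the two sections shown in the schema example (67108864 and 29360128
bytes), the sum is 96468992, which matches the "section-size" value there.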

Re: [PATCH] qapi: Cleanup SGX related comments

2022-01-19 Thread Yang Zhong

On Wed, Jan 19, 2022 at 09:16:46AM +, Daniel P. Berrangé wrote:
> On Wed, Jan 19, 2022 at 07:00:14AM -0500, Yang Zhong wrote:
> > The SGX NUMA patches were merged for the Qemu 7.0 release, so we need to
> > clarify the detailed version history information and also change
> > some related comments to make the SGX related comments clearer.
> > 
> > Signed-off-by: Yang Zhong 
> > ---
> >  qapi/machine.json |  4 ++--
> >  qapi/misc-target.json | 14 +-
> >  2 files changed, 11 insertions(+), 7 deletions(-)
> 
> Reviewed-by: Daniel P. Berrangé 
> 
> > diff --git a/qapi/machine.json b/qapi/machine.json
> > index b6a37e17c4..cf47cb63a9 100644
> > --- a/qapi/machine.json
> > +++ b/qapi/machine.json
> > @@ -1207,7 +1207,7 @@
> >  #
> >  # @memdev: memory backend linked with device
> >  #
> > -# @node: the numa node
> > +# @node: the numa node (Since: 7.0)
> >  #
> >  # Since: 6.2
> >  ##
> > @@ -1288,7 +1288,7 @@
> >  #
> >  # @memdev: memory backend linked with device
> >  #
> > -# @node: the numa node
> > +# @node: the numa node (Since: 7.0)
> >  #
> >  # Since: 6.2
> >  ##
> > diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> > index 1022aa0184..558521bd39 100644
> > --- a/qapi/misc-target.json
> > +++ b/qapi/misc-target.json
> > @@ -344,9 +344,9 @@
> >  #
> >  # @node: the numa node
> >  #
> > -# @size: the size of epc section
> > +# @size: the size of EPC section
> >  #
> > -# Since: 6.2
> > +# Since: 7.0
> >  ##
> >  { 'struct': 'SGXEPCSection',
> >'data': { 'node': 'int',
> > @@ -365,7 +365,7 @@
> >  #
> >  # @flc: true if FLC is supported
> >  #
> > -# @sections: The EPC sections info for guest
> > +# @sections: The EPC sections info for guest(Since: 7.0)
> 
> Minor point - a space is needed before '('
> 
> >  #
> >  # Since: 6.2
> >  ##
> > @@ -390,7 +390,9 @@
> >  #
> >  # -> { "execute": "query-sgx" }
> >  # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
> > -#  "flc": true, "section-size" : 0 } }
> > +#  "flc": true,  "sections":
> > +#  [{"node": 0, "size": 67108864},
> > +#  {"node": 1, "size": 29360128}]} }
> >  #
> >  ##
> >  { 'command': 'query-sgx', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }
> > @@ -408,7 +410,9 @@
> >  #
> >  # -> { "execute": "query-sgx-capabilities" }
> >  # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
> > -#  "flc": true, "section-size" : 0 } }
> > +#  "flc": true, "section" :
> > +#  [{"node": 0, "size": 67108864},
> > +#  {"node": 1, "size": 29360128}]} }
> 
> The 'section-size' shouldn't be removed here - we still need the
> command fixed to bring back the 'section-size' as it should not
> have been deleted.
> 
> Adding the 'section' docs is ok though.


  Thanks Daniel. Do you mean I need to add extra documentation like below, with
  both the 6.2 and 7.0 command descriptions for @query-sgx?

  ##
  # @query-sgx:
  #
  # Returns information about SGX
  #
  # Returns: @SGXInfo
  #
  # Since: 6.2
  #
  # Example:
  #
  # -> { "execute": "query-sgx" }
  # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
  #  "flc": true, "section-size" : 0 } }
  #
  # Since: 7.0
  #
  # Example:
  #
  # -> { "execute": "query-sgx" }
  # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
  #  "flc": true,  "sections":
  #  [{"node": 0, "size": 67108864},
  #  {"node": 1, "size": 29360128}]} }
  #
  ##
  { 'command': 'query-sgx', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }

  If this is okay for you, I can send v2 to you, thanks!

  Yang

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH] qapi: Cleanup SGX related comments

2022-01-18 Thread Yang Zhong

The SGX NUMA patches were merged for the Qemu 7.0 release, so we need to
clarify the detailed version history information and also change
some related comments to make the SGX related comments clearer.

Signed-off-by: Yang Zhong 
---
 qapi/machine.json |  4 ++--
 qapi/misc-target.json | 14 +-
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index b6a37e17c4..cf47cb63a9 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1207,7 +1207,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
@@ -1288,7 +1288,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 1022aa0184..558521bd39 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -344,9 +344,9 @@
 #
 # @node: the numa node
 #
-# @size: the size of epc section
+# @size: the size of EPC section
 #
-# Since: 6.2
+# Since: 7.0
 ##
 { 'struct': 'SGXEPCSection',
   'data': { 'node': 'int',
@@ -365,7 +365,7 @@
 #
 # @flc: true if FLC is supported
 #
-# @sections: The EPC sections info for guest
+# @sections: The EPC sections info for guest(Since: 7.0)
 #
 # Since: 6.2
 ##
@@ -390,7 +390,9 @@
 #
 # -> { "execute": "query-sgx" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#  "flc": true, "section-size" : 0 } }
+#  "flc": true,  "sections":
+#  [{"node": 0, "size": 67108864},
+#  {"node": 1, "size": 29360128}]} }
 #
 ##
 { 'command': 'query-sgx', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }
@@ -408,7 +410,9 @@
 #
 # -> { "execute": "query-sgx-capabilities" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#  "flc": true, "section-size" : 0 } }
+#  "flc": true, "section" :
+#  [{"node": 0, "size": 67108864},
+#  {"node": 1, "size": 29360128}]} }
 #
 ##
 { 'command': 'query-sgx-capabilities', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }

Re: [RFC PATCH 2/2] hw/i386/sgx: Attach SGX-EPC to its memory backend

2022-01-17 Thread Yang Zhong

On Mon, Jan 17, 2022 at 12:48:10PM +0100, Paolo Bonzini wrote:
> On 1/17/22 00:53, Philippe Mathieu-Daudé via wrote:
> >We have one SGX-EPC address/size/node per memory backend,
> >make it child of the backend in the QOM composition tree.
> >
> >Cc: Yang Zhong 
> >Signed-off-by: Philippe Mathieu-Daudé 
> >---
> >  hw/i386/sgx.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> >diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> >index 5de5dd08936..6362e5e9d02 100644
> >--- a/hw/i386/sgx.c
> >+++ b/hw/i386/sgx.c
> >@@ -300,6 +300,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
> >  /* set the memdev link with memory backend */
> >  object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,
> >&error_fatal);
> >+object_property_add_child(OBJECT(list->value->memdev), "sgx-epc",
> >+  OBJECT(obj));
> >+
> >  /* set the numa node property for sgx epc object */
> >  object_property_set_uint(obj, SGX_EPC_NUMA_NODE_PROP, list->value->node,
> >   &error_fatal);
> 
> I don't think this is a good idea; only list->value->memdev should
> add something below itself in the tree.
> 
> However, I think obj can be added under the machine itself as
> /machine/sgx-epc-device[*].
> 

  Thanks Philippe, calling object_property_add_child() in hw/i386/sgx.c is
  more reasonable than doing it in device_set_realized(), thanks!

  Yang

> Paolo

Re: [PATCH 1/2] hw/i386/x86: Attach CPUs to machine

2022-01-17 Thread Yang Zhong

On Mon, Jan 17, 2022 at 01:48:46PM +, Daniel P. Berrangé wrote:
> On Mon, Jan 17, 2022 at 12:53:30AM +0100, Philippe Mathieu-Daudé via wrote:
> > Avoid having CPUs objects dangling as unattached QOM ones,
> > directly attach them to the machine.
> 
> Lets be more explicit here
> 
> [quote]
>   Previously CPUs were exposed in the QOM tree at a path
> 
> /machine/unattached/device[nn]
> 
>   where the 'nn' of the first CPU is usually zero, but can
>   vary depending on what devices were already created.
> 
>   With this change the CPUs are now at
> 
> /machine/cpu[nn]
> 
>   where the 'nn' of the first CPU is always zero
> [/quote]
> 
> to  /machine/unattached/device[0->$SMP-COUNT]
> 
> > 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >  hw/i386/x86.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index b84840a1bb9..50bf249c700 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -108,6 +108,7 @@ void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
> >  {
> >  Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
> >  
> > +object_property_add_child(OBJECT(x86ms), "cpu[*]", OBJECT(cpu));
> >  if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
> >  goto out;
> >  }
> > -- 
> > 2.34.1
> > 
> > 


  Thanks Philippe. If we change /machine/unattached/device[nn] to
  /machine/cpu[nn], the related changes should also be made on the Libvirt
  side, which still checks /machine/unattached/device[0] to get
  unavailable-features. Thanks!

  Yang

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PULL 11/13] numa: Support SGX numa in the monitor and Libvirt interfaces

2022-01-17 Thread Yang Zhong

On Thu, Jan 13, 2022 at 04:15:10PM +, Daniel P. Berrangé wrote:
> On Wed, Dec 15, 2021 at 09:25:13PM +0100, Paolo Bonzini wrote:
> > From: Yang Zhong 
> > 
> > Add the SGXEPCSection list into SGXInfo to show the multiple
> > SGX EPC sections detailed info, not the total size like before.
> > This patch can enable numa support for 'info sgx' command and
> > QMP interfaces. The new interfaces show each EPC section info
> > in one numa node. Libvirt can use QMP interface to get the
> > detailed host SGX EPC capabilities to decide how to allocate
> > host EPC sections to guest.
> 
> 
> 
> > diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> > index 5aa2b95b7d..1022aa0184 100644
> > --- a/qapi/misc-target.json
> > +++ b/qapi/misc-target.json
> > @@ -337,6 +337,21 @@
> >'if': 'TARGET_ARM' }
> >  
> >  
> > +##
> > +# @SGXEPCSection:
> > +#
> > +# Information about intel SGX EPC section info
> > +#
> > +# @node: the numa node
> > +#
> > +# @size: the size of epc section
> > +#
> > +# Since: 6.2
> 
> This is wrong because it was merged for 7.0 not 6.2
> 
> > +##
> > +{ 'struct': 'SGXEPCSection',
> > +  'data': { 'node': 'int',
> > +'size': 'uint64'}}
> > +
> >  ##
> >  # @SGXInfo:
> >  #
> > @@ -350,7 +365,7 @@
> >  #
> >  # @flc: true if FLC is supported
> >  #
> > -# @section-size: The EPC section size for guest
> > +# @sections: The EPC sections info for guest
> 
> This is a non-backwards compatible schema change.
> 
> "@section-size" must not be removed without going
> through a deprecation period, so this needs to be
> re-instated.
> 
> The "@sections" addition needs a "Since 7.0" annotation too.
> 
> 
> Yang, can you submit a followup patch to correct these mistakes?
> 

  Thanks, I will submit a patch to fix this version issue. This series adds
  SGX NUMA support; the background is that the number of SGX EPC sections is
  not fixed (<= 8). I added "@sections" to carry the NUMA node and size of each
  EPC section, which shows how EPC sections are allocated to the different NUMA
  nodes in the VM. The older "@section-size" is only suitable for one EPC
  section on one NUMA node in one VM, so I moved the size into "@sections" here
  for NUMA support.

  Yang
 
  
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: unable to execute QEMU command 'qom-get': Property 'sgx-epc.unavailable-features' not found

2022-01-16 Thread Yang Zhong

On Mon, Jan 17, 2022 at 04:53:45AM +0200, Jarkko Sakkinen wrote:
> On Tue, Nov 30, 2021 at 08:15:36PM +0800, Yang Zhong wrote:
> > On Thu, Nov 25, 2021 at 08:47:22PM +0800, Yang Zhong wrote:
> > > Hello Paolo,
> > > 
> > > Our customer used the Libvirt XML to start a SGX VM, but failed.
> > > 
> > > libvirt.libvirtError: internal error: unable to execute QEMU command 
> > > 'qom-get': Property 'sgx-epc.unavailable-features' not found
> > > 
> > > The XML file,
> > > 
> > > 
> > >  > > value="host,+sgx,+sgx-debug,+sgx-exinfo,+sgx-kss,+sgx-mode64,+sgx-provisionkey,+sgx-tokenkey,+sgx1,+sgx2,+sgxlc"/>
> > > 
> > > 
> > > 
> > > 
> > >   
> > > 
> > > The new compound property command should be located in /machine path,
> > > which are different with old command '-sgx-epc id=epc1,memdev=mem1'.
> > > 
> > > I also tried this from Qemu monitor tool, 
> > > (qemu) qom-list /machine
> > > type (string)
> > > kernel (string)
> > > ..
> > > sgx-epc (SgxEPC)
> > > ..
> > > sgx-epc[0] (child)
> > > ..
> > > 
> > > We can find sgx-epc from /machine list.
> > > 
> > 
> >   This issue is clear now. It is caused by Libvirt getting the CPU's
> > unavailable-features with the command below:
> >
> > {"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"}}
> >
> >   In an SGX VM, however, sgx-epc is initialized before the VCPUs because
> > SGX needs to set the virtual EPC info in the cpuid.
> >
> >   So /machine/unattached/device[0] is occupied by sgx-epc, and the query for
> > unavailable-features at /machine/unattached/device[0] fails.
> > 
> > 
> >   We need to fix this issue; it can be done on either the Qemu or the
> > Libvirt side.
> > 
> >   1) Libvirt side
> >  If libvirt supports SGX EPCs, it can use
> > /machine/unattached/device[n] to check "unavailable-features",
> >  where n is the first index after those taken by the sgx-epc devices.
> > 
> >   2) Qemu side
> > 
> >  A temporary patch that creates /machine/sgx in
> > device_set_realized():
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 84f3019440..4154eef0d8 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -497,7 +497,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> >  NamedClockList *ncl;
> >  Error *local_err = NULL;
> >  bool unattached_parent = false;
> > -static int unattached_count;
> > +static int unattached_count, sgx_count;
> > 
> >  if (dev->hotplugged && !dc->hotpluggable) {
> >  error_setg(errp, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
> > @@ -509,7 +509,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> >  goto fail;
> >  }
> > 
> > -if (!obj->parent) {
> > +if (!obj->parent && !strcmp(object_get_typename(obj), "sgx-epc")) {
> > +gchar *name = g_strdup_printf("device[%d]", sgx_count++);
> > +
> > +object_property_add_child(container_get(qdev_get_machine(),
> > +"/sgx"),
> > +  name, obj);
> > +unattached_parent = true;
> > +g_free(name);
> > +} else if (!obj->parent) {
> >  gchar *name = g_strdup_printf("device[%d]", unattached_count++);
> > 
> >  object_property_add_child(container_get(qdev_get_machine()
> >
> > This patch can make sure vcpu is still /machine/unattached/device[0].
> > 
> > 
> > Which solution is best?  thanks!
> 
> Has either of the fixes reached upstream yet?


  Jarkko, I sent out a patch to fix this issue last week:
  https://lists.nongnu.org/archive/html/qemu-devel/2022-01/msg02502.html

  Daniel regarded this fix as SGX-specific code in the generic object code,
  so the fix should be done on the Libvirt side instead. Did you hit this
  issue? If so, you can use this patch as a temporary fix. Thanks!

  Yang  

> 
> > Yang
> 
> BR, Jarkko
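
The index clash discussed in this thread can be reproduced with a small standalone model. This is only a sketch under assumptions: unattached_name() is a hypothetical helper that imitates how the quoted device_set_realized() hunk numbers parentless devices with a single shared counter; it is not QEMU code.

```c
#include <stddef.h>
#include <stdio.h>

/* Single counter shared by every parentless device, as in the quoted hunk. */
static int unattached_count;

/*
 * Hypothetical helper: write the QOM path a parentless device would get
 * into buf, then advance the shared counter.
 */
static void unattached_name(char *buf, size_t len)
{
    snprintf(buf, len, "/machine/unattached/device[%d]", unattached_count++);
}
```

Calling the helper once for sgx-epc and then once for the first VCPU yields device[0] for sgx-epc and device[1] for the VCPU, which is why a fixed qom-get on device[0] no longer reaches the CPU.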

Re: [PATCH] sgx: Move sgx object from /machine/unattached to /machine

2022-01-12 Thread Yang Zhong

Hi Daniel,

On Wed, Jan 12, 2022 at 10:11:35AM +, Daniel P. Berrangé wrote:
> On Wed, Jan 12, 2022 at 11:55:17AM -0500, Yang Zhong wrote:
> > When Libvirt starts, it gets the vcpu's unavailable-features from the
> > /machine/unattached/device[0] path with the qom-get command. In an SGX
> > guest, however, the sgx-epc virtual device is initialized before VCPU
> > creation (virtual SGX needs to set the virtual EPC info in the cpuid), so
> > /machine/unattached/device[0] is occupied by the sgx-epc device and the
> > query for unavailable-features at /machine/unattached/device[0] fails.
> 
> If libvirt decides to enable SGX in a VM, then surely it knows
> that it should just query /machine/unattached/device[1] to get
> the CPU features instead. Why do we need to do anything in QEMU ?
> 

  I listed two solutions, one on the Qemu side and one on the Libvirt side, before:
  https://lists.nongnu.org/archive/html/qemu-devel/2021-11/msg05670.html

  This time, I posted this patch hoping to start a discussion of the issue.

  If the Libvirt side should handle this, I will drop this patch and ask
  them to do so. Thanks!


> > 
> > This patch creates a new /machine/sgx object to avoid this issue.
> > (qemu) qom-list /machine/unattached/
> > device[0] (child)
> > 
> > (qemu) qom-list /machine/sgx
> > device[0] (child)
> > 
> > Signed-off-by: Yang Zhong 
> > ---
> >  hw/core/qdev.c | 12 ++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 84f3019440..4154eef0d8 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -497,7 +497,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> >  NamedClockList *ncl;
> >  Error *local_err = NULL;
> >  bool unattached_parent = false;
> > -static int unattached_count;
> > +static int unattached_count, sgx_count;
> >  
> >  if (dev->hotplugged && !dc->hotpluggable) {
> >  error_setg(errp, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
> > @@ -509,7 +509,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> >  goto fail;
> >  }
> >  
> > -if (!obj->parent) {
> > +if (!obj->parent && !strcmp(object_get_typename(obj), "sgx-epc")) {
> > +gchar *name = g_strdup_printf("device[%d]", sgx_count++);
> > +
> > +object_property_add_child(container_get(qdev_get_machine(),
> > +"/sgx"),
> > +  name, obj);
> > +unattached_parent = true;
> > +g_free(name);
> 
> The qdev.c file is part of our generic object code. It should not
> contain any code that is tied to very specific object types like
> this.

  Okay, thanks!

  Yang 


> 
> > +} else if (!obj->parent) {
> >  gchar *name = g_strdup_printf("device[%d]", unattached_count++);
> >  
> >  object_property_add_child(container_get(qdev_get_machine(),
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH] sgx: Move sgx object from /machine/unattached to /machine

2022-01-12 Thread Yang Zhong

When Libvirt starts, it gets the vcpu's unavailable-features from the
/machine/unattached/device[0] path with the qom-get command. In an SGX
guest, however, the sgx-epc virtual device is initialized before VCPU
creation (virtual SGX needs to set the virtual EPC info in the cpuid), so
/machine/unattached/device[0] is occupied by the sgx-epc device and the
query for unavailable-features at /machine/unattached/device[0] fails.

This patch creates a new /machine/sgx object to avoid this issue.
(qemu) qom-list /machine/unattached/
device[0] (child)

(qemu) qom-list /machine/sgx
device[0] (child)

Signed-off-by: Yang Zhong 
---
 hw/core/qdev.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 84f3019440..4154eef0d8 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -497,7 +497,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
 NamedClockList *ncl;
 Error *local_err = NULL;
 bool unattached_parent = false;
-static int unattached_count;
+static int unattached_count, sgx_count;
 
 if (dev->hotplugged && !dc->hotpluggable) {
 error_setg(errp, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
@@ -509,7 +509,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
 goto fail;
 }
 
-if (!obj->parent) {
+if (!obj->parent && !strcmp(object_get_typename(obj), "sgx-epc")) {
+gchar *name = g_strdup_printf("device[%d]", sgx_count++);
+
+object_property_add_child(container_get(qdev_get_machine(),
+"/sgx"),
+  name, obj);
+unattached_parent = true;
+g_free(name);
+} else if (!obj->parent) {
 gchar *name = g_strdup_printf("device[%d]", unattached_count++);
 
 object_property_add_child(container_get(qdev_get_machine(),

Re: [RFC PATCH 3/7] x86: Grant AMX permission for guest

2022-01-10 Thread Yang Zhong

On Mon, Jan 10, 2022 at 04:36:13PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang 
> > Sent: Friday, January 7, 2022 5:32 PM
> >
> > Kernel mechanism for dynamically enabled XSAVE features
> 
> there is no definition of "dynamically-enabled XSAVE features".
> 

  Thanks!


> > asks userspace VMM requesting guest permission if it wants
> > to expose the features. Only with the permission, kernel
> > can try to enable the features when detecting the intention
> > from guest in runtime.
> >
> > Qemu should request the permission for guest only once
> > before the first vCPU is created. KVM checks the guest
> > permission when Qemu advertises the features, and the
> > advertising operation fails w/o permission.
> 
> what about below?
> 
> "Kernel allocates 4K xstate buffer by default. For XSAVE features
> which require large state component (e.g. AMX), Linux kernel
> dynamically expands the xstate buffer only after the process has
> acquired the necessary permissions. Those are called dynamically-
> enabled XSAVE features (or dynamic xfeatures).
> 
> There are separate permissions for native tasks and guests.
> 
> Qemu should request the guest permissions for dynamic xfeatures
> which will be exposed to the guest. This only needs to be done
> once before the first vcpu is created."


  This is clearer. Will update this in new version, thanks!


> 
> >
> > Signed-off-by: Yang Zhong 
> > Signed-off-by: Jing Liu 
> > ---
> >  target/i386/cpu.h |  7 +++
> >  hw/i386/x86.c | 28 
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 768a8218be..79023fe723 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -549,6 +549,13 @@ typedef enum X86Seg {
> >  #define XSTATE_ZMM_Hi256_MASK           (1ULL << XSTATE_ZMM_Hi256_BIT)
> >  #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
> >  #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
> > +#define XSTATE_XTILE_CFG_MASK           (1ULL << XSTATE_XTILE_CFG_BIT)
> > +#define XSTATE_XTILE_DATA_MASK          (1ULL << XSTATE_XTILE_DATA_BIT)
> > +#define XFEATURE_XTILE_MASK             (XSTATE_XTILE_CFG_MASK \
> > +                                         | XSTATE_XTILE_DATA_MASK)
> > +
> > +#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
> > +#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
> >
> >  /* CPUID feature words */
> >  typedef enum FeatureWord {
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index b84840a1bb..0a204c375e 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -41,6 +41,8 @@
> >  #include "sysemu/cpu-timers.h"
> >  #include "trace.h"
> >
> > +#include 
> > +
> >  #include "hw/i386/x86.h"
> >  #include "target/i386/cpu.h"
> >  #include "hw/i386/topology.h"
> > @@ -117,6 +119,30 @@ out:
> >  object_unref(cpu);
> >  }
> >
> > +static void x86_xsave_req_perm(void)
> > +{
> > +unsigned long bitmask;
> > +
> > +long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
> > +  XSTATE_XTILE_DATA_BIT);
> 
> Should we do it based on the cpuid for the first vcpu?


  This permission is requested before vcpu init, so I put it in
  x86_cpus_init(). If the host kernel does not include the AMX changes, or
  the latest kernel (with AMX support) is installed on a previous-generation
  x86 platform, this syscall() fails and the function returns directly. I
  once put this permission request in the vcpu create function, but it was
  hard to find a good location for it there. As for cpuid, do you mean
  I need to check the host cpuid info to see whether the host cpu supports
  AMX? Thanks!

  Yang   
   
> 
> > +if (rc) {
> > +/*
> > + * The older kernel version(<5.15) can't support
> > + * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
> > + */
> > +return;
> > +}
> > +
> > +rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
> > +if (rc) {
> > +error_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
> > +} else if (!(bitmask & XFEATURE_XTILE_MASK)) {
> > +error_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
> > + "and bitmask=0x%lx", bitmask);
> > +exit(EXIT_FAILURE);
> > +}
> > +}
> > +
> >  void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
> >  {
> >  int i;
> > @@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
> >  MachineState *ms = MACHINE(x86ms);
> >  MachineClass *mc = MACHINE_GET_CLASS(x86ms);
> >
> > +/* Request AMX permission for guest */
> > +x86_xsave_req_perm();
> >  x86_cpu_set_default_version(default_cpu_version);
> >
> >  /*
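
The bitmask test performed after ARCH_GET_XCOMP_GUEST_PERM can be sketched on its own, without the syscall. This is a minimal sketch under assumptions: xtile_data_permitted() is a hypothetical helper, and it checks only the XTILEDATA bit, which is stricter than the quoted patch (the patch accepts the mask if either the XTILECFG or the XTILEDATA bit is set).

```c
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state-component number for AMX tile data, as defined in this series. */
#define XSTATE_XTILE_DATA_BIT  18
#define XSTATE_XTILE_DATA_MASK (1ULL << XSTATE_XTILE_DATA_BIT)

/*
 * Hypothetical helper: true if the guest permission bitmask includes
 * XTILEDATA, the dynamically enabled component that was requested.
 */
static bool xtile_data_permitted(uint64_t bitmask)
{
    return (bitmask & XSTATE_XTILE_DATA_MASK) != 0;
}
```

Checking the requested bit directly avoids treating a mask that has only XTILECFG set as a successful grant; whether that stricter check is wanted in QEMU is a design question for the patch author.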

Re: [RFC PATCH 4/7] x86: Add XFD faulting bit for state components

2022-01-10 Thread Yang Zhong

On Mon, Jan 10, 2022 at 04:38:18PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang 
> > Sent: Friday, January 7, 2022 5:32 PM
> >
> > From: Jing Liu 
> >
> > Intel introduces XFD faulting mechanism for extended
> > XSAVE features to dynamically enable the features in
> > runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set
> > as 1, it indicates support for XFD faulting of this
> > state component.
> >
> > Signed-off-by: Jing Liu 
> > Signed-off-by: Yang Zhong 
> > ---
> >  target/i386/cpu.h | 2 +-
> >  target/i386/cpu.c | 2 +-
> >  target/i386/kvm/kvm-cpu.c | 1 +
> >  3 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 79023fe723..22f7ff40a6 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1375,7 +1375,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
> >  typedef struct ExtSaveArea {
> >  uint32_t feature, bits;
> >  uint32_t offset, size;
> > -uint32_t need_align;
> > +uint32_t need_align, support_xfd;
> 
> why each flag be a 32-bit field?
>
  
  These flags are defined as uint32_t because they are shifted into the ecx value below:
  *ecx = (esa->need_align << 1) | (esa->support_xfd << 2);

 
> also it's more natural to have them in separate lines, though I'm not
> sure why existing fields are put this way (possibly due to short names?).
> 

  Yes, the support_xfd flag will be defined on its own line, thanks!

  Yang


> >  } ExtSaveArea;
> >
> >  #define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index dd2c919c33..1adc3f0f99 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -5495,7 +5495,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >  const ExtSaveArea *esa = &x86_ext_save_areas[count];
> >  *eax = esa->size;
> >  *ebx = esa->offset;
> > -*ecx = esa->need_align << 1;
> > +*ecx = (esa->need_align << 1) | (esa->support_xfd << 2);
> >  }
> >  }
> >  break;
> > diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> > index 6c4c1c6f9d..3b3c203f11 100644
> > --- a/target/i386/kvm/kvm-cpu.c
> > +++ b/target/i386/kvm/kvm-cpu.c
> > @@ -108,6 +108,7 @@ static void kvm_cpu_xsave_init(void)
> >
> >  uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
> >  esa->need_align = ecx & (1u << 1) ? 1 : 0;
> > +esa->support_xfd = ecx & (1u << 2) ? 1 : 0;
> >  }
> >  }
> >  }
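
The ECX packing and unpacking in the quoted hunks can be sketched as a standalone round trip. The helper and macro names below (esa_encode_ecx, esa_decode_flag, ESA_ECX_ALIGN64, ESA_ECX_XFD) are hypothetical; only the bit positions (bit 1 for 64-byte alignment, bit 2 for XFD faulting) come from the patch.

```c
#include <stdint.h>

/* Bit positions in CPUID.(EAX=0Dh, ECX=n, n>1).ECX, per the patch text. */
#define ESA_ECX_ALIGN64 (1u << 1)  /* 64-byte alignment in compacted format */
#define ESA_ECX_XFD     (1u << 2)  /* XFD faulting supported */

/* Pack the two 0/1 flags the same way cpu_x86_cpuid() does in the hunk. */
static uint32_t esa_encode_ecx(uint32_t need_align, uint32_t support_xfd)
{
    return (need_align << 1) | (support_xfd << 2);
}

/* Extract one flag as 0 or 1, as kvm_cpu_xsave_init() does in the hunk. */
static uint32_t esa_decode_flag(uint32_t ecx, uint32_t mask)
{
    return (ecx & mask) ? 1 : 0;
}
```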

Re: [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state

2022-01-10 Thread Yang Zhong

On Mon, Jan 10, 2022 at 04:20:41PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang 
> > Sent: Friday, January 7, 2022 5:31 PM
> >
> > From: Jing Liu 
> >
> > The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
> > are all zero, while spec actually introduces that bit 01
> > should indicate if the extended state component locates
> > on the next 64-byte boundary following the preceding state
> > component when the compacted format of an XSAVE area is
> > used.
> 
> Above would read clearer if you revise to:
> 
> "The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
> indicate whether the extended state component locates
> on the next 64-byte boundary following the preceding state
> component when the compacted format of an XSAVE area is
> used.
> 
> But ECX[1] is always cleared in current implementation."

  Thanks Kevin, I will update this in next version.

  Yang
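
What ECX[1] means for layout can be shown with a short sketch. next_component_offset() is a hypothetical helper, not QEMU code: in the compacted XSAVE format, a component with the alignment bit set starts at the next 64-byte boundary after the preceding component, otherwise it is packed directly behind it.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the end offset of the preceding component in a
 * compacted XSAVE area, return where the next component starts.
 */
static uint32_t next_component_offset(uint32_t prev_end, int need_align)
{
    if (need_align) {
        return (prev_end + 63u) & ~63u;   /* round up to a 64-byte boundary */
    }
    return prev_end;                      /* packed right after the previous one */
}
```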

Re: [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components

2022-01-10 Thread Yang Zhong

On Mon, Jan 10, 2022 at 04:23:47PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang 
> > Sent: Friday, January 7, 2022 5:31 PM
> >
> > From: Jing Liu 
> >
> > AMX XTILECFG and XTILEDATA are managed by XSAVE feature
> > set. State component 17 is used for 64-byte TILECFG register
> > (XTILECFG state) and component 18 is used for 8192 bytes
> > of tile data (XTILEDATA state).
> 
> to be consistent, "tile data" -> "TILEDATA"
> 
> >
> > Add AMX feature bits to x86_ext_save_areas array to set
> > up AMX components. Add structs that define the layout of
> > AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
> > structs sizes.
> >
> > Signed-off-by: Jing Liu 
> > Signed-off-by: Yang Zhong 
> > ---
> >  target/i386/cpu.h | 16 +++-
> >  target/i386/cpu.c |  8 
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 7f9700544f..768a8218be 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -537,6 +537,8 @@ typedef enum X86Seg {
> >  #define XSTATE_ZMM_Hi256_BIT            6
> >  #define XSTATE_Hi16_ZMM_BIT             7
> >  #define XSTATE_PKRU_BIT                 9
> > +#define XSTATE_XTILE_CFG_BIT            17
> > +#define XSTATE_XTILE_DATA_BIT           18
> >
> >  #define XSTATE_FP_MASK  (1ULL << XSTATE_FP_BIT)
> >  #define XSTATE_SSE_MASK (1ULL << XSTATE_SSE_BIT)
> > @@ -1343,6 +1345,16 @@ typedef struct XSavePKRU {
> >  uint32_t padding;
> >  } XSavePKRU;
> >
> > +/* Ext. save area 17: AMX XTILECFG state */
> > +typedef struct XSaveXTILE_CFG {
> 
> remove "_"?
> 
> > +uint8_t xtilecfg[64];
> > +} XSaveXTILE_CFG;
> > +
> > +/* Ext. save area 18: AMX XTILEDATA state */
> > +typedef struct XSaveXTILE_DATA {
> 
> ditto
>

  Thanks Kevin, I will update this in new version.

  Yang
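
With the underscores dropped as suggested in the review, the two save-area structs and their size checks might look like the sketch below. The struct names and the xtiledata field name are assumptions (the thread only agrees to remove the "_"), and C11 _Static_assert stands in for QEMU_BUILD_BUG_ON so the fragment compiles on its own.

```c
#include <stdint.h>

/* Ext. save area 17: AMX XTILECFG state (64 bytes) */
typedef struct XSaveXTILECFG {
    uint8_t xtilecfg[64];
} XSaveXTILECFG;

/* Ext. save area 18: AMX XTILEDATA state (8192 bytes) */
typedef struct XSaveXTILEDATA {
    uint8_t xtiledata[8192];
} XSaveXTILEDATA;

/* Compile-time size checks, in the spirit of QEMU_BUILD_BUG_ON. */
_Static_assert(sizeof(XSaveXTILECFG) == 64, "XTILECFG must be 64 bytes");
_Static_assert(sizeof(XSaveXTILEDATA) == 0x2000, "XTILEDATA must be 8192 bytes");
```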

[RFC PATCH 7/7] x86: Support XFD and AMX xsave data migration

2022-01-07 Thread Yang Zhong

From: Zeng Guang 

XFD (eXtended Feature Disable) allows a feature to be
enabled in the xsave state while preventing specific
user threads from using the feature.

Support saving and restoring the XFD MSRs if CPUID.D.1.EAX[4]
enumerates them as valid. Likewise, migrate the MSRs and the
related xsave state when necessary.

Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h |  9 +
 target/i386/kvm/kvm.c | 18 ++
 target/i386/machine.c | 42 ++
 3 files changed, 69 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6153c4ab1a..1627988790 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -505,6 +505,9 @@ typedef enum X86Seg {
 
 #define MSR_VM_HSAVE_PA 0xc0010117
 
+#define MSR_IA32_XFD                    0x01c4
+#define MSR_IA32_XFD_ERR                0x01c5
+
 #define MSR_IA32_BNDCFGS                0x0d90
 #define MSR_IA32_XSS                    0x0da0
 #define MSR_IA32_UMWAIT_CONTROL 0xe1
@@ -866,6 +869,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_1_EAX_AVX_VNNI  (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16   (1U << 5)
+/* XFD: eXtended Feature Disable */
+#define CPUID_D_1_EAX_XFD   (1U << 4)
 
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
@@ -1608,6 +1613,10 @@ typedef struct CPUX86State {
 uint64_t msr_rtit_cr3_match;
 uint64_t msr_rtit_addrs[MAX_RTIT_ADDRS];
 
+/* Per-VCPU XFD MSRs */
+uint64_t msr_xfd;
+uint64_t msr_xfd_err;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 97520e9dff..02d5cf1063 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3192,6 +3192,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_ia32_sgxlepubkeyhash[3]);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD,
+  env->msr_xfd);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR,
+  env->msr_xfd_err);
+}
+
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
 }
@@ -3548,6 +3555,11 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, MSR_IA32_SGXLEPUBKEYHASH3, 0);
 }
 
+if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+kvm_msr_entry_add(cpu, MSR_IA32_XFD, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -3844,6 +3856,12 @@ static int kvm_get_msrs(X86CPU *cpu)
 env->msr_ia32_sgxlepubkeyhash[index - MSR_IA32_SGXLEPUBKEYHASH0] =
msrs[i].data;
 break;
+case MSR_IA32_XFD:
+env->msr_xfd = msrs[i].data;
+break;
+case MSR_IA32_XFD_ERR:
+env->msr_xfd_err = msrs[i].data;
+break;
 }
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 83c2b91529..fdeb5bab50 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1455,6 +1455,46 @@ static const VMStateDescription vmstate_msr_intel_sgx = {
 }
 };
 
+static bool xfd_msrs_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD);
+}
+
+static const VMStateDescription vmstate_msr_xfd = {
+.name = "cpu/msr_xfd",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xfd_msrs_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.msr_xfd, X86CPU),
+VMSTATE_UINT64(env.msr_xfd_err, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool amx_xtile_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return !!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE);
+}
+
+static const VMStateDescription vmstate_amx_xtile = {
+.name = "cpu/intel_amx_xtile",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = amx_xtile_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT8_ARRAY(env.xtilecfg, X86CPU, 64),
+VMSTATE_UINT8_ARRAY(env.xtiledata, X86CPU, 8192),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -1593,6 +1633,8 @@ const VMStateDescription vmstate_x86_cpu = {
 #endif
 &vmstate_msr_tsx_ctrl,
 &vmstate_msr_intel_sgx,
+&vmstate_msr_xfd,
+&vmstate_amx_xtile,
 NULL
 }
 };
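
The gating used in this patch, for both the MSR put/get paths and the migration subsection, reduces to one test on CPUID.(EAX=0DH,ECX=1):EAX bit 4. Below is a minimal standalone sketch of that test, assuming a plain uint32_t in place of env->features[FEAT_XSAVE]. All names with the sketch_/SKETCH_ prefix are hypothetical and are not part of QEMU. The second helper assumes the usual IA32_XFD convention that a set bit marks the corresponding state component as currently disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit as CPUID_D_1_EAX_XFD in the patch: CPUID.(EAX=0DH,ECX=1):EAX[4]. */
#define SKETCH_CPUID_D_1_EAX_XFD (1U << 4)

/* Hypothetical stand-in for the two per-vCPU fields added to CPUX86State. */
struct sketch_xfd_state {
    uint64_t msr_xfd;
    uint64_t msr_xfd_err;
};

/* True when the guest CPUID enumerates XFD, so the MSRs need syncing. */
static bool sketch_xfd_enumerated(uint32_t cpuid_d_1_eax)
{
    return (cpuid_d_1_eax & SKETCH_CPUID_D_1_EAX_XFD) != 0;
}

/* True when IA32_XFD has the bit for state component `bit` set (bit < 64). */
static bool sketch_xfd_armed(const struct sketch_xfd_state *s, unsigned bit)
{
    return ((s->msr_xfd >> bit) & 1) != 0;
}
```

With a check like this, a guest without the xfd feature bit never has the two MSRs added to the KVM MSR buffer and never emits the "cpu/msr_xfd" subsection.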

[RFC PATCH 6/7] x86: Use new XSAVE ioctls handling

2022-01-07 Thread Yang Zhong

From: Jing Liu 

Extended features have large state, while the current
kvm_xsave only allows 4KB. Use the new XSAVE ioctls
if the xstate size is larger than kvm_xsave.

Signed-off-by: Jing Liu 
Signed-off-by: Zeng Guang 
Signed-off-by: Wei Wang 
Signed-off-by: Yang Zhong 
---
 linux-headers/asm-x86/kvm.h | 14 ++
 linux-headers/linux/kvm.h   |  2 ++
 target/i386/cpu.h   |  5 +
 target/i386/kvm/kvm.c   | 16 ++--
 target/i386/xsave_helper.c  | 35 +++
 5 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 5a776a08f7..32f2a921e8 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -376,6 +376,20 @@ struct kvm_debugregs {
 /* for KVM_CAP_XSAVE */
 struct kvm_xsave {
__u32 region[1024];
+   /*
+* KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many bytes
+* as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+* respectively, when invoked on the vm file descriptor.
+*
+* The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+* will always be at least 4096. Currently, it is only greater
+* than 4096 if a dynamic feature has been enabled with
+* ``arch_prctl()``, but this may change in the future.
+*
+* The offsets of the state save areas in struct kvm_xsave follow
+* the contents of CPUID leaf 0xD on the host.
+*/
+   __u32 extra[0];
 };
 
 #define KVM_MAX_XCRS   16
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 02c5e7b7bb..97d5b6d81d 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_XSAVE2  207
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1550,6 +1551,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_XSAVE */
 #define KVM_GET_XSAVE    _IOR(KVMIO,  0xa4, struct kvm_xsave)
 #define KVM_SET_XSAVE    _IOW(KVMIO,  0xa5, struct kvm_xsave)
+#define KVM_GET_XSAVE2   _IOR(KVMIO,  0xcf, struct kvm_xsave)
 /* Available with KVM_CAP_XCRS */
 #define KVM_GET_XCRS _IOR(KVMIO,  0xa6, struct kvm_xcrs)
 #define KVM_SET_XCRS _IOW(KVMIO,  0xa7, struct kvm_xcrs)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 245e8b5a1a..6153c4ab1a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1519,6 +1519,11 @@ typedef struct CPUX86State {
 YMMReg zmmh_regs[CPU_NB_REGS];
 ZMMReg hi16_zmm_regs[CPU_NB_REGS];
 
+#ifdef TARGET_X86_64
+uint8_t xtilecfg[64];
+uint8_t xtiledata[8192];
+#endif
+
 /* sysenter registers */
 uint32_t sysenter_cs;
 target_ulong sysenter_esp;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3fb3ddbe2b..97520e9dff 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1983,7 +1983,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 
 if (has_xsave) {
-env->xsave_buf_len = sizeof(struct kvm_xsave);
+uint32_t size = kvm_vm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+if (!size) {
+size = sizeof(struct kvm_xsave);
+}
+
+env->xsave_buf_len = QEMU_ALIGN_UP(size, 4096);
 env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
 memset(env->xsave_buf, 0, env->xsave_buf_len);
 
@@ -2580,6 +2585,7 @@ static int kvm_put_xsave(X86CPU *cpu)
 if (!has_xsave) {
 return kvm_put_fpu(cpu);
 }
+
 x86_cpu_xsave_all_areas(cpu, xsave, env->xsave_buf_len);
 
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_XSAVE, xsave);
@@ -3247,10 +3253,16 @@ static int kvm_get_xsave(X86CPU *cpu)
 return kvm_get_fpu(cpu);
 }
 
-ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+if (env->xsave_buf_len <= sizeof(struct kvm_xsave)) {
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+} else {
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE2, xsave);
+}
+
 if (ret < 0) {
 return ret;
 }
+
 x86_cpu_xrstor_all_areas(cpu, xsave, env->xsave_buf_len);
 
 return 0;
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index ac61a96344..090424e820 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -5,6 +5,7 @@
 #include "qemu/osdep.h"
 
 #include "cpu.h"
+#include 
 
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 {
@@ -126,6 +127,23 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 
 memcpy(pkru, &env->pkru, sizeof(env->pkru));
 }
+
+e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
+if (e->size && e->offset) {
+XSaveXTILE_CFG *tilecfg = buf + e->offset;
+
+memcpy(ti
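
The buffer sizing in kvm_arch_init_vcpu() and the ioctl selection in kvm_get_xsave() from the kvm.c hunks of this patch can be sketched as two small pure functions. This is only an illustration under stated assumptions: cap_xsave2_size stands for the value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) (0 when the capability is absent), and the sketch_/SKETCH_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* sizeof(struct kvm_xsave): 1024 x __u32; the extra[0] member adds nothing. */
#define SKETCH_LEGACY_XSAVE_SIZE 4096u

/* Pick the xsave buffer length: fall back to 4KB, then round up to 4KB. */
static uint32_t sketch_xsave_buf_len(uint32_t cap_xsave2_size)
{
    uint32_t size = cap_xsave2_size ? cap_xsave2_size
                                    : SKETCH_LEGACY_XSAVE_SIZE;

    return (size + 4095u) & ~4095u;
}

/* KVM_GET_XSAVE2 is only needed when the buffer exceeds the legacy 4KB. */
static bool sketch_use_get_xsave2(uint32_t buf_len)
{
    return buf_len > SKETCH_LEGACY_XSAVE_SIZE;
}
```

Keeping the legacy ioctl for buffers of 4KB or less means the same code still works on hosts whose kernel lacks KVM_CAP_XSAVE2.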

[RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components

2022-01-07 Thread Yang Zhong

From: Jing Liu 

AMX XTILECFG and XTILEDATA are managed by XSAVE feature
set. State component 17 is used for 64-byte TILECFG register
(XTILECFG state) and component 18 is used for 8192 bytes
of tile data (XTILEDATA state).

Add AMX feature bits to x86_ext_save_areas array to set
up AMX components. Add structs that define the layout of
AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
structs sizes.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 16 +++-
 target/i386/cpu.c |  8 
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7f9700544f..768a8218be 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -537,6 +537,8 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT            6
 #define XSTATE_Hi16_ZMM_BIT             7
 #define XSTATE_PKRU_BIT                 9
+#define XSTATE_XTILE_CFG_BIT            17
+#define XSTATE_XTILE_DATA_BIT           18
 
 #define XSTATE_FP_MASK  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK (1ULL << XSTATE_SSE_BIT)
@@ -1343,6 +1345,16 @@ typedef struct XSavePKRU {
 uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 17: AMX XTILECFG state */
+typedef struct XSaveXTILE_CFG {
+uint8_t xtilecfg[64];
+} XSaveXTILE_CFG;
+
+/* Ext. save area 18: AMX XTILEDATA state */
+typedef struct XSaveXTILE_DATA {
+uint8_t xtiledata[8][1024];
+} XSaveXTILE_DATA;
+
 QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
@@ -1350,6 +1362,8 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveOpmask) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveZMM_Hi256) != 0x200);
 QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
 QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_CFG) != 0x40);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
 
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
@@ -1357,7 +1371,7 @@ typedef struct ExtSaveArea {
 uint32_t need_align;
 } ExtSaveArea;
 
-#define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
+#define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
 
 extern ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT];
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 47bc4d5c1a..dd2c919c33 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1401,6 +1401,14 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
 [XSTATE_PKRU_BIT] =
   { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
 .size = sizeof(XSavePKRU) },
+[XSTATE_XTILE_CFG_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILE_CFG),
+},
+[XSTATE_XTILE_DATA_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+.size = sizeof(XSaveXTILE_DATA),
+},
 };
 
 static uint32_t xsave_area_size(uint64_t mask)
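
As a standalone illustration of the layout structs and size checks added above, the sketch below uses C11 _Static_assert in place of QEMU_BUILD_BUG_ON and a reduced size table indexed by state component number. The Sketch*/sketch_* names are hypothetical, and the table only fills in the two AMX components, so it is not the real x86_ext_save_areas array.

```c
#include <stddef.h>
#include <stdint.h>

/* State component 17: 64-byte TILECFG register. */
typedef struct SketchXTileCfg {
    uint8_t xtilecfg[64];
} SketchXTileCfg;

/* State component 18: 8192 bytes of tile data (8 tiles x 1024 bytes). */
typedef struct SketchXTileData {
    uint8_t xtiledata[8][1024];
} SketchXTileData;

_Static_assert(sizeof(SketchXTileCfg) == 0x40, "XTILECFG must be 64 bytes");
_Static_assert(sizeof(SketchXTileData) == 0x2000, "XTILEDATA must be 8KB");

enum {
    SKETCH_XTILE_CFG_BIT = 17,
    SKETCH_XTILE_DATA_BIT = 18,
    SKETCH_AREA_COUNT = SKETCH_XTILE_DATA_BIT + 1,
};

/* Reduced size table: only the AMX entries are populated in this sketch. */
static const uint32_t sketch_area_size[SKETCH_AREA_COUNT] = {
    [SKETCH_XTILE_CFG_BIT] = sizeof(SketchXTileCfg),
    [SKETCH_XTILE_DATA_BIT] = sizeof(SketchXTileData),
};

/* Sum the sizes of all components selected by `mask`. */
static uint32_t sketch_total_size(uint64_t mask)
{
    uint32_t total = 0;

    for (size_t i = 0; i < SKETCH_AREA_COUNT; i++) {
        if (mask & (1ULL << i)) {
            total += sketch_area_size[i];
        }
    }
    return total;
}
```

The array bound has to grow with the highest component number, which is why the patch changes XSAVE_STATE_AREA_COUNT to XSTATE_XTILE_DATA_BIT + 1.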

[RFC PATCH 4/7] x86: Add XFD faulting bit for state components

2022-01-07 Thread Yang Zhong

From: Jing Liu 

Intel introduces the XFD faulting mechanism so that extended
XSAVE features can be enabled dynamically at runtime. If
CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set to 1, the state
component supports XFD faulting.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 2 +-
 target/i386/cpu.c | 2 +-
 target/i386/kvm/kvm-cpu.c | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 79023fe723..22f7ff40a6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1375,7 +1375,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
 uint32_t offset, size;
-uint32_t need_align;
+uint32_t need_align, support_xfd;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index dd2c919c33..1adc3f0f99 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5495,7 +5495,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
-*ecx = esa->need_align << 1;
+*ecx = (esa->need_align << 1) | (esa->support_xfd << 2);
 }
 }
 break;
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 6c4c1c6f9d..3b3c203f11 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -108,6 +108,7 @@ static void kvm_cpu_xsave_init(void)
 
 uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
 esa->need_align = ecx & (1u << 1) ? 1 : 0;
+esa->support_xfd = ecx & (1u << 2) ? 1 : 0;
 }
 }
 }
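
The two directions of the ECX handling in this patch (reading the host bits in kvm_cpu_xsave_init() and re-encoding them in cpu_x86_cpuid()) are inverses of each other. The sketch below shows that round trip in isolation, assuming a cut-down struct that holds only the two flags. The sketch_ names are hypothetical.

```c
#include <stdint.h>

/* Cut-down stand-in for the two flag fields of ExtSaveArea. */
struct sketch_esa {
    uint32_t need_align;   /* CPUID.(EAX=0DH,ECX=n):ECX[1] */
    uint32_t support_xfd;  /* CPUID.(EAX=0DH,ECX=n):ECX[2] */
};

/* Encode the flags the way cpu_x86_cpuid() does in the patch. */
static uint32_t sketch_esa_to_ecx(const struct sketch_esa *esa)
{
    return (esa->need_align << 1) | (esa->support_xfd << 2);
}

/* Decode host ECX the way kvm_cpu_xsave_init() does in the patch. */
static struct sketch_esa sketch_ecx_to_esa(uint32_t ecx)
{
    struct sketch_esa esa = {
        .need_align = (ecx >> 1) & 1u,
        .support_xfd = (ecx >> 2) & 1u,
    };

    return esa;
}
```

Storing each flag as 0 or 1 keeps the encode step a plain shift. Other ECX bits reported by the host are not carried over.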

[RFC PATCH 3/7] x86: Grant AMX permission for guest

2022-01-07 Thread Yang Zhong

The kernel mechanism for dynamically enabled XSAVE features
requires the userspace VMM to request guest permission if it
wants to expose those features. Only with that permission can
the kernel enable the features when it detects at runtime that
the guest intends to use them.

Qemu should request the permission for the guest only once,
before the first vCPU is created. KVM checks the guest
permission when Qemu advertises the features, and advertising
fails without permission.

Signed-off-by: Yang Zhong 
Signed-off-by: Jing Liu 
---
 target/i386/cpu.h |  7 +++
 hw/i386/x86.c | 28 
 2 files changed, 35 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 768a8218be..79023fe723 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -549,6 +549,13 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK   (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK    (1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK        (1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_XTILE_CFG_MASK   (1ULL << XSTATE_XTILE_CFG_BIT)
+#define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
+#define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK \
+ | XSTATE_XTILE_DATA_MASK)
+
+#define ARCH_GET_XCOMP_GUEST_PERM   0x1024
+#define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
 
 /* CPUID feature words */
 typedef enum FeatureWord {
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b84840a1bb..0a204c375e 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -41,6 +41,8 @@
 #include "sysemu/cpu-timers.h"
 #include "trace.h"
 
+#include 
+
 #include "hw/i386/x86.h"
 #include "target/i386/cpu.h"
 #include "hw/i386/topology.h"
@@ -117,6 +119,30 @@ out:
 object_unref(cpu);
 }
 
+static void x86_xsave_req_perm(void)
+{
+unsigned long bitmask;
+
+long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
+  XSTATE_XTILE_DATA_BIT);
+if (rc) {
+/*
+ * The older kernel version(<5.15) can't support
+ * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
+ */
+return;
+}
+
+rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
+if (rc) {
+error_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
+} else if (!(bitmask & XFEATURE_XTILE_MASK)) {
+error_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
+ "and bitmask=0x%lx", bitmask);
+exit(EXIT_FAILURE);
+}
+}
+
 void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 {
 int i;
@@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 MachineState *ms = MACHINE(x86ms);
 MachineClass *mc = MACHINE_GET_CLASS(x86ms);
 
+/* Request AMX permission for guest */
+x86_xsave_req_perm();
 x86_cpu_set_default_version(default_cpu_version);
 
 /*
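
The decision flow of x86_xsave_req_perm() can be separated from the syscalls themselves, which makes the four possible outcomes easier to see. The sketch below is a pure function over the two return codes and the returned bitmask. It mirrors the checks in the patch, and the sketch_/SKETCH_ names and the result enum are hypothetical.

```c
#include <stdint.h>

#define SKETCH_XTILE_CFG_MASK  (1ULL << 17)
#define SKETCH_XTILE_DATA_MASK (1ULL << 18)
#define SKETCH_XTILE_MASK      (SKETCH_XTILE_CFG_MASK | SKETCH_XTILE_DATA_MASK)

enum sketch_perm {
    SKETCH_PERM_SKIPPED,       /* request failed: patch returns silently */
    SKETCH_PERM_QUERY_FAILED,  /* query failed: patch reports and continues */
    SKETCH_PERM_DENIED,        /* no XTILE bit granted: patch exits */
    SKETCH_PERM_GRANTED,
};

/*
 * req_rc:  result of arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, 18)
 * get_rc:  result of arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &bitmask)
 */
static enum sketch_perm sketch_classify_perm(long req_rc, long get_rc,
                                             uint64_t bitmask)
{
    if (req_rc) {
        return SKETCH_PERM_SKIPPED;
    }
    if (get_rc) {
        return SKETCH_PERM_QUERY_FAILED;
    }
    if (!(bitmask & SKETCH_XTILE_MASK)) {
        return SKETCH_PERM_DENIED;
    }
    return SKETCH_PERM_GRANTED;
}
```

As in the patch, the granted case only requires one of the two XTILE bits to be present in the bitmask. A stricter variant could test SKETCH_XTILE_DATA_MASK alone, since that is the dynamically enabled component being requested.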

[RFC PATCH 0/7] AMX support in Qemu

2022-01-07 Thread Yang Zhong

Intel introduces Advanced Matrix Extensions (AMX) [1] feature that
consists of configurable two-dimensional "TILE" registers and new
accelerator instructions that operate on them. TMUL (Tile matrix
MULtiply) is the first accelerator instruction set to use the new
registers.

This series is based on the AMX KVM series [2] and exposes AMX feature
to guest (The detailed design discussions can be found in [3]).

According to the KVM design, the userspace VMM (e.g. Qemu) is expected
to request guest permission for the dynamically-enabled XSAVE features
only once when the first vCPU is created, and KVM checks guest permission
in KVM_SET_CPUID2.

Intel AMX is XSAVE supported and XSAVE enabled. These extended features
have large state, while the current kvm_xsave only allows 4KB. The AMX KVM
series has extended struct kvm_xsave to meet this requirement and added one
extra KVM_GET_XSAVE2 ioctl to handle extended features. In our tests, AMX
live migration works well.

Note: This version still includes some definitions in the linux-headers.
Once AMX KVM is merged and Qemu syncs those linux-headers, I will remove
those definitions, so please ignore those changes.

[1] Intel Architecture Instruction Set Extension Programming Reference
https://software.intel.com/content/dam/develop/external/us/en/documents/\
architecture-instruction-set-extensions-programming-reference.pdf
[2] https://www.spinics.net/lists/kvm/msg263577.html
[3] https://www.spinics.net/lists/kvm/msg259015.html

Thanks,
Yang


Jing Liu (5):
  x86: Fix the 64-byte boundary enumeration for extended state
  x86: Add AMX XTILECFG and XTILEDATA components
  x86: Add XFD faulting bit for state components
  x86: Add AMX CPUIDs enumeration
  x86: Use new XSAVE ioctls handling

Yang Zhong (1):
  x86: Grant AMX permission for guest

Zeng Guang (1):
  x86: Support XFD and AMX xsave data migration

 linux-headers/asm-x86/kvm.h | 14 
 linux-headers/linux/kvm.h   |  2 ++
 target/i386/cpu.h   | 40 ++-
 hw/i386/x86.c   | 28 
 target/i386/cpu.c   | 64 +++--
 target/i386/kvm/kvm-cpu.c   |  4 +++
 target/i386/kvm/kvm.c   | 37 +++--
 target/i386/machine.c   | 42 
 target/i386/xsave_helper.c  | 35 
 9 files changed, 259 insertions(+), 7 deletions(-)

[RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state

2022-01-07 Thread Yang Zhong

From: Jing Liu 

The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
are all zero, while the spec defines that bit 1 indicates
whether the extended state component is located on the
next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

Fix the subleaf values according to the host-supported
cpuid. The upcoming AMX feature will be the first one
to use it.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h | 1 +
 target/i386/cpu.c | 1 +
 target/i386/kvm/kvm-cpu.c | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 04f2b790c9..7f9700544f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1354,6 +1354,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
 uint32_t offset, size;
+uint32_t need_align;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa9e636800..47bc4d5c1a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5487,6 +5487,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
+*ecx = esa->need_align << 1;
 }
 }
 break;
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index d95028018e..6c4c1c6f9d 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -105,6 +105,9 @@ static void kvm_cpu_xsave_init(void)
 assert(esa->size == sz);
 esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
 }
+
+uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
+esa->need_align = ecx & (1u << 1) ? 1 : 0;
 }
 }
 }
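
To show what the ECX[1] alignment bit means for a compacted XSAVE area, the sketch below lays out a list of components one after another and rounds the position up to the next 64-byte boundary whenever a component has need_align set. It assumes the extended region starts at byte 576 (512-byte legacy area plus 64-byte XSAVE header). The component sizes in the usage note are illustrative only, and the sketch_/SKETCH_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Legacy region (512 bytes) + XSAVE header (64 bytes). */
#define SKETCH_XSAVE_EXT_START 576u

/*
 * Compute compacted-format offsets for `n` components laid out in order.
 * Writes each component's offset to offsets[i] and returns the total size.
 */
static uint32_t sketch_compacted_layout(const uint32_t *sizes,
                                        const uint8_t *need_align,
                                        size_t n, uint32_t *offsets)
{
    uint32_t pos = SKETCH_XSAVE_EXT_START;

    for (size_t i = 0; i < n; i++) {
        if (need_align[i]) {
            /* Round up to the next 64-byte boundary. */
            pos = (pos + 63u) & ~63u;
        }
        offsets[i] = pos;
        pos += sizes[i];
    }
    return pos;
}
```

For example, with sizes {8, 64, 8192} and need_align {0, 1, 1}, the second component cannot start at byte 584 and is moved up to byte 640. If the bit were reported as zero, the guest would compute 584 and disagree with the hardware layout.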

[RFC PATCH 5/7] x86: Add AMX CPUIDs enumeration

2022-01-07 Thread Yang Zhong

From: Jing Liu 

Add the AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Also add the AMX
TILE and TMUL CPUID leaves and subleaves, which exist
when AMX TILE is present, to report the maximum
capability of TILE and TMUL.

Signed-off-by: Jing Liu 
Signed-off-by: Yang Zhong 
---
 target/i386/cpu.h |  2 ++
 target/i386/cpu.c | 55 ---
 target/i386/kvm/kvm.c |  3 ++-
 3 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 22f7ff40a6..245e8b5a1a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -849,6 +849,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_TSX_LDTRK (1U << 16)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16   (1U << 23)
+/* AMX tile (two-dimensional register)*/
+#define CPUID_7_0_EDX_AMX_TILE  (1U << 24)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26)
 /* Single Thread Indirect Branch Predictors */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1adc3f0f99..025e35471f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -574,6 +574,18 @@ static CPUCacheInfo legacy_l3_cache = {
 #define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
 #define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
 
+/* CPUID Leaf 0x1D constants: */
+#define INTEL_AMX_TILE_MAX_SUBLEAF 0x1
+#define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
+#define INTEL_AMX_BYTES_PER_TILE   0x400
+#define INTEL_AMX_BYTES_PER_ROW    0x40
+#define INTEL_AMX_TILE_MAX_NAMES   0x8
+#define INTEL_AMX_TILE_MAX_ROWS    0x10
+
+/* CPUID Leaf 0x1E constants: */
+#define INTEL_AMX_TMUL_MAX_K   0x10
+#define INTEL_AMX_TMUL_MAX_N   0x40
+
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
   uint32_t vendor2, uint32_t vendor3)
 {
@@ -843,8 +855,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "avx512-vp2intersect", NULL, "md-clear", NULL,
 NULL, NULL, "serialize", NULL,
 "tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
-NULL, NULL, NULL, "avx512-fp16",
-NULL, NULL, "spec-ctrl", "stibp",
+NULL, NULL, "amx-bf16", "avx512-fp16",
+"amx-tile", "amx-int8", "spec-ctrl", "stibp",
 NULL, "arch-capabilities", "core-capability", "ssbd",
 },
 .cpuid = {
@@ -909,7 +921,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 "xsaveopt", "xsavec", "xgetbv1", "xsaves",
-NULL, NULL, NULL, NULL,
+"xfd", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -5584,6 +5596,43 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 }
 break;
 }
+case 0x1D: {
+/* AMX TILE */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* Highest numbered palette subleaf */
+*eax = INTEL_AMX_TILE_MAX_SUBLEAF;
+} else if (count == 1) {
+*eax = INTEL_AMX_TOTAL_TILE_BYTES |
+   (INTEL_AMX_BYTES_PER_TILE << 16);
+*ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
+*ecx = INTEL_AMX_TILE_MAX_ROWS;
+}
+break;
+}
+case 0x1E: {
+/* AMX TMUL */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+break;
+}
+
+if (count == 0) {
+/* Highest numbered palette subleaf */
+*ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
+}
+break;
+}
 case 0x4000:
 /*
  * CPUID code in kvm_arch_init_vcpu() ignores stuff
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 13f8e30c2a..3fb3ddbe2b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1758,7 +1758,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c = &cpuid_data.entries[cpuid_i++];
 }
 break;
-case 0x14: {
+case 0x14:
+case 0x1d: {
 uint32_t times;
 
 c->function = i;

Re: unable to execute QEMU command 'qom-get': Property 'sgx-epc.unavailable-features' not found

2021-11-30 Thread Yang Zhong

On Thu, Nov 25, 2021 at 08:47:22PM +0800, Yang Zhong wrote:
> Hello Paolo,
> 
> Our customer used the Libvirt XML to start a SGX VM, but failed.
> 
> libvirt.libvirtError: internal error: unable to execute QEMU command 
> 'qom-get': Property 'sgx-epc.unavailable-features' not found
> 
> The XML file,
> 
> 
>  value="host,+sgx,+sgx-debug,+sgx-exinfo,+sgx-kss,+sgx-mode64,+sgx-provisionkey,+sgx-tokenkey,+sgx1,+sgx2,+sgxlc"/>
> 
> 
> 
> 
>   
> 
> The new compound property command should be located in /machine path,
> which are different with old command '-sgx-epc id=epc1,memdev=mem1'.
> 
> I also tried this from Qemu monitor tool, 
> (qemu) qom-list /machine
> type (string)
> kernel (string)
> ..
> sgx-epc (SgxEPC)
> ..
> sgx-epc[0] (child)
> ..
> 
> We can find sgx-epc from /machine list.
> 

  This issue is clear now. Libvirt gets the CPU's unavailable-features with
  the command below:

  {"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"}}

  In an SGX VM, however, sgx is initialized before the VCPU, because sgx needs
  to set the virtual EPC info in the cpuid.

  So /machine/unattached/device[0] is occupied by sgx, and reading
  unavailable-features from /machine/unattached/device[0] fails.


  We need to fix this issue, and it can be done on either the Qemu or the
  Libvirt side.

  1) Libvirt side
     If libvirt supports SGX EPCs, it can use /machine/unattached/device[n]
     to check "unavailable-features", where n is the next number after sgx's
     unattached_count.

  2) Qemu side
     A temporary patch creates a /sgx container under /machine in
     device_set_realized():
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 84f3019440..4154eef0d8 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -497,7 +497,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
 NamedClockList *ncl;
 Error *local_err = NULL;
 bool unattached_parent = false;
-static int unattached_count;
+static int unattached_count, sgx_count;

 if (dev->hotplugged && !dc->hotpluggable) {
 error_setg(errp, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
@@ -509,7 +509,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
 goto fail;
 }

-if (!obj->parent) {
+if (!obj->parent && !strcmp(object_get_typename(obj), "sgx-epc")) {
+gchar *name = g_strdup_printf("device[%d]", sgx_count++);
+
+object_property_add_child(container_get(qdev_get_machine(),
+"/sgx"),
+  name, obj);
+unattached_parent = true;
+g_free(name);
+} else if (!obj->parent) {
 gchar *name = g_strdup_printf("device[%d]", unattached_count++);

 object_property_add_child(container_get(qdev_get_machine()
   
This patch makes sure the vcpu is still /machine/unattached/device[0].


Which solution is best?  thanks!

Yang




> I am not familiar with Libvirt side, would you please suggest how to implement
> this compound command in the XML file?  thanks a lot!
> 
> Regards,
> 
> Yang  
>

unable to execute QEMU command 'qom-get': Property 'sgx-epc.unavailable-features' not found

2021-11-25 Thread Yang Zhong

Hello Paolo,

Our customer used the Libvirt XML to start a SGX VM, but failed.

libvirt.libvirtError: internal error: unable to execute QEMU command 'qom-get': 
Property 'sgx-epc.unavailable-features' not found

The XML file,







  

The new compound property command should be located in /machine path,
which are different with old command '-sgx-epc id=epc1,memdev=mem1'.

I also tried this from Qemu monitor tool, 
(qemu) qom-list /machine
type (string)
kernel (string)
..
sgx-epc (SgxEPC)
..
sgx-epc[0] (child)
..

We can find sgx-epc from /machine list.

I am not familiar with Libvirt side, would you please suggest how to implement
this compound command in the XML file?  thanks a lot!

Regards,

Yang

Re: [PATCH v3 3/5] numa: Support SGX numa in the monitor and Libvirt interfaces

2021-11-11 Thread Yang Zhong

On Thu, Nov 11, 2021 at 08:55:35AM +0100, Philippe Mathieu-Daudé wrote:
> On 11/11/21 07:18, Yang Zhong wrote:
> > On Wed, Nov 10, 2021 at 10:55:40AM -0600, Eric Blake wrote:
> >> On Mon, Nov 01, 2021 at 12:20:07PM -0400, Yang Zhong wrote:
> >>> Add the SGXEPCSection list into SGXInfo to show the multiple
> >>> SGX EPC sections detailed info, not the total size like before.
> >>> This patch can enable numa support for 'info sgx' command and
> >>> QMP interfaces. The new interfaces show each EPC section info
> >>> in one numa node. Libvirt can use QMP interface to get the
> >>> detailed host SGX EPC capabilities to decide how to allocate
> >>> host EPC sections to guest.
> >>>
> >>> (qemu) info sgx
> >>>  SGX support: enabled
> >>>  SGX1 support: enabled
> >>>  SGX2 support: enabled
> >>>  FLC support: enabled
> >>>  NUMA node #0: size=67108864
> >>>  NUMA node #1: size=29360128
> >>>
> >>> The QMP interface show:
> >>> (QEMU) query-sgx
> >>> {"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
> >>> [{"node": 0, "size": 67108864}, {"node": 1, "size": 29360128}], "flc": 
> >>> true}}
> >>>
> >>> (QEMU) query-sgx-capabilities
> >>> {"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
> >>> [{"node": 0, "size": 17070817280}, {"node": 1, "size": 17079205888}], 
> >>> "flc": true}}
> >>
> >> Other than the different "size" values, how do these commands differ?
> > 
> > 
> >   As for QMP interfaces,
> >   The 'query-sgx' to get VM sgx detailed info, and 'query-sgx-capabilities' 
> > to get
> >   the host sgx capabilities and Libvirt can use this info to decide how to 
> > allocate
> >   virtual EPC sections to VMs.
> 
> What about renaming/aliasing as 'query-host-sgx' / 'query-guest-sgx'?


  Libvirt and Qemu's QMP interface currently name all their interfaces following
  this naming rule. If we change those names, there is a lot of work on both the
  Qemu and Libvirt sides.

  Thanks!

  Yang

Re: [PATCH v3 0/5] SGX NUMA support plus vepc reset

2021-11-10 Thread Yang Zhong

On Wed, Nov 10, 2021 at 05:07:40PM +0100, Paolo Bonzini wrote:
> On 11/10/21 13:56, Yang Zhong wrote:
> >   Paolo, thanks!
> >
> >   No other maintainers to review numa patches, so i will send the numa
> >   patches again? thanks!
> 
> The patch look good, but they were sent too close to the 6.2 release
> freeze.  I'll take a look at them again a month.
>

  Thanks Paolo. Once the new Qemu version is ready, I will send V4. Thanks!

  Regards,

  Yang
 
> Paolo

Re: [PATCH v3 3/5] numa: Support SGX numa in the monitor and Libvirt interfaces

2021-11-10 Thread Yang Zhong

On Wed, Nov 10, 2021 at 10:55:40AM -0600, Eric Blake wrote:
> On Mon, Nov 01, 2021 at 12:20:07PM -0400, Yang Zhong wrote:
> > Add the SGXEPCSection list into SGXInfo to show the multiple
> > SGX EPC sections detailed info, not the total size like before.
> > This patch can enable numa support for 'info sgx' command and
> > QMP interfaces. The new interfaces show each EPC section info
> > in one numa node. Libvirt can use QMP interface to get the
> > detailed host SGX EPC capabilities to decide how to allocate
> > host EPC sections to guest.
> > 
> > (qemu) info sgx
> >  SGX support: enabled
> >  SGX1 support: enabled
> >  SGX2 support: enabled
> >  FLC support: enabled
> >  NUMA node #0: size=67108864
> >  NUMA node #1: size=29360128
> > 
> > The QMP interface show:
> > (QEMU) query-sgx
> > {"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
> > [{"node": 0, "size": 67108864}, {"node": 1, "size": 29360128}], "flc": 
> > true}}
> > 
> > (QEMU) query-sgx-capabilities
> > {"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
> > [{"node": 0, "size": 17070817280}, {"node": 1, "size": 17079205888}], 
> > "flc": true}}
> 
> Other than the different "size" values, how do these commands differ?


  As for the QMP interfaces, 'query-sgx' gets the VM's detailed sgx info, and
  'query-sgx-capabilities' gets the host sgx capabilities. Libvirt can use the
  latter to decide how to allocate virtual EPC sections to VMs.

  'info sgx' and 'query-sgx' provide the same function through different
  interfaces, HMP and QMP respectively.

  Yang


> 
> > 
> > Signed-off-by: Yang Zhong 
> > ---
> >  qapi/misc-target.json | 19 ++--
> >  hw/i386/sgx.c | 51 +++
> >  2 files changed, 59 insertions(+), 11 deletions(-)
> > 
> > diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> > index 5aa2b95b7d..1022aa0184 100644
> > --- a/qapi/misc-target.json
> > +++ b/qapi/misc-target.json
> > @@ -337,6 +337,21 @@
> >'if': 'TARGET_ARM' }
> >  
> >  
> > +##
> > +# @SGXEPCSection:
> > +#
> > +# Information about intel SGX EPC section info
> > +#
> > +# @node: the numa node
> > +#
> > +# @size: the size of epc section
> > +#
> > +# Since: 6.2
> 
> Are we still trying to cram this into 6.2, or is now slipping into 7.0?


  The numa patches will be merged into the next Qemu version. Once the new
  version is ready, I will change this and send V4. Thanks!

  Yang 


> 
> > +##
> > +{ 'struct': 'SGXEPCSection',
> > +  'data': { 'node': 'int',
> > +'size': 'uint64'}}
> > +
> >  ##
> >  # @SGXInfo:
> >  #
> > @@ -350,7 +365,7 @@
> >  #
> >  # @flc: true if FLC is supported
> >  #
> > -# @section-size: The EPC section size for guest
> > +# @sections: The EPC sections info for guest
> >  #
> >  # Since: 6.2
> >  ##
> > @@ -359,7 +374,7 @@
> >  'sgx1': 'bool',
> >  'sgx2': 'bool',
> >  'flc': 'bool',
> > -'section-size': 'uint64'},
> > +'sections': ['SGXEPCSection']},
> 
> This would be an incompatible change.  As long as 6.2 isn't released,
> we can do that; but once it is, we need to be more careful about
> changing the QMP spec.


  Thanks for the reminder! I had to use an SGXEPCSection list to show the epc
  info because the maximum number of SGX epc sections is 8, and the actual
  number may differ on each server rather than being one fixed number. Once
  the new QMP spec is released, I will check how to adjust this. Thanks!

  Regards,

  Yang





> 
> > 'if': 'TARGET_I386' }
> >  
> >  ##
> > diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> > index 9a77519609..b5b710a556 100644
> > --- a/hw/i386/sgx.c
> > +++ b/hw/i386/sgx.c
> > @@ -76,11 +76,13 @@ static uint64_t sgx_calc_section_metric(uint64_t low, 
> > uint64_t high)
> > ((high & MAKE_64BIT_MASK(0, 20)) << 32);
> >  }
> >  
> > -static uint64_t sgx_calc_host_epc_section_size(void)
> > +static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
> ...
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
