Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh



On 9/6/18 1:50 PM, Brijesh Singh wrote:
...

>>
>> #define HVC_DECRYPTED_ARRAY_SIZE  \
>> PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
>>    sizeof(struct pvclock_vsyscall_time_info))
>>
>

Since the hv_clock_aux array will have NR_CPUS elements hence the
following definition is all we need.

#ifdef CONFIG_AMD_MEM_ENCRYPT
static struct pvclock_vsyscall_time_info
                            hv_clock_aux[NR_CPUS] __decrypted_aux;
#endif


> Sure works for me.
>
>>> +static struct pvclock_vsyscall_time_info
>>> +    hv_clock_dec[HVC_DECRYPTED_ARRAY_SIZE]
>>> __decrypted_hvclock;
>>> +



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh



On 9/6/18 1:50 PM, Brijesh Singh wrote:
...

>>
>> #define HVC_DECRYPTED_ARRAY_SIZE  \
>> PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
>>    sizeof(struct pvclock_vsyscall_time_info))
>>
>

Since the hv_clock_aux array will have NR_CPUS elements hence the
following definition is all we need.

#ifdef CONFIG_AMD_MEM_ENCRYPT
static struct pvclock_vsyscall_time_info
                            hv_clock_aux[NR_CPUS] __decrypted_aux;
#endif


> Sure works for me.
>
>>> +static struct pvclock_vsyscall_time_info
>>> +    hv_clock_dec[HVC_DECRYPTED_ARRAY_SIZE]
>>> __decrypted_hvclock;
>>> +



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 03:39 PM, Sean Christopherson wrote:

On Thu, Sep 06, 2018 at 03:20:46PM -0500, Brijesh Singh wrote:



On 09/06/2018 02:47 PM, Sean Christopherson wrote:
...



Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
the sizeof operators are not allowed in '#if'. Anyway, I will try to see
if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
check.


Hmm, we'll need something otherwise 'NR_CPUS - HVC_BOOT_ARRAY_SIZE'
will wrap and cause build errors.




Right.

One option is we can hard-code the check for > 64, something like this:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > 64
...
...
#endif

But this assumption will break if we ever add a new field in
struct pvclock_vsyscall_time_info. Hence I am not in favor of this.

Second option is, use KVM_MAX_VCPUS or NR_CPUS, something like this:

#ifdef CONFIG_AMD_MEM_ENCRYPT
 #define HVC_AUX_ARRAY_SIZE \
 PAGE_ALIGN(NR_CPUS * sizeof(struct pvclock_vsyscall_time_info))
...
#endif

In this case we will allocate few extra bytes which will get freed for
the non-SEV case anyways.





Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 03:39 PM, Sean Christopherson wrote:

On Thu, Sep 06, 2018 at 03:20:46PM -0500, Brijesh Singh wrote:



On 09/06/2018 02:47 PM, Sean Christopherson wrote:
...



Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
the sizeof operators are not allowed in '#if'. Anyway, I will try to see
if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
check.


Hmm, we'll need something otherwise 'NR_CPUS - HVC_BOOT_ARRAY_SIZE'
will wrap and cause build errors.




Right.

One option is we can hard-code the check for > 64, something like this:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > 64
...
...
#endif

But this assumption will break if we ever add a new field in
struct pvclock_vsyscall_time_info. Hence I am not in favor of this.

Second option is, use KVM_MAX_VCPUS or NR_CPUS, something like this:

#ifdef CONFIG_AMD_MEM_ENCRYPT
 #define HVC_AUX_ARRAY_SIZE \
 PAGE_ALIGN(NR_CPUS * sizeof(struct pvclock_vsyscall_time_info))
...
#endif

In this case we will allocate few extra bytes which will get freed for
the non-SEV case anyways.





Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 03:20:46PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 02:47 PM, Sean Christopherson wrote:
> ...
> 
> >>
> >>Yes, the auxiliary array will dumped into the regular .bss when
> >>CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> >>sure if its worth complicating the code to save those extra memory.
> >>Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.
> >
> >I just realized that we'll try to create a bogus array if 'NR_CPUS <=
> >HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
> >and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:
> >
> >#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
> >#define HVC_AUX_ARRAY_SIZE  \
> > PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
> >sizeof(struct pvclock_vsyscall_time_info))
> >static struct pvclock_vsyscall_time_info
> > hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
> >#endif
> >
> 
> The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
> the sizeof operators are not allowed in '#if'. Anyway, I will try to see
> if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
> check.

Hmm, we'll need something otherwise 'NR_CPUS - HVC_BOOT_ARRAY_SIZE'
will wrap and cause build errors.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 03:20:46PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 02:47 PM, Sean Christopherson wrote:
> ...
> 
> >>
> >>Yes, the auxiliary array will dumped into the regular .bss when
> >>CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> >>sure if its worth complicating the code to save those extra memory.
> >>Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.
> >
> >I just realized that we'll try to create a bogus array if 'NR_CPUS <=
> >HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
> >and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:
> >
> >#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
> >#define HVC_AUX_ARRAY_SIZE  \
> > PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
> >sizeof(struct pvclock_vsyscall_time_info))
> >static struct pvclock_vsyscall_time_info
> > hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
> >#endif
> >
> 
> The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
> the sizeof operators are not allowed in '#if'. Anyway, I will try to see
> if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
> check.

Hmm, we'll need something otherwise 'NR_CPUS - HVC_BOOT_ARRAY_SIZE'
will wrap and cause build errors.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:47 PM, Sean Christopherson wrote:
...



Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
the sizeof operators are not allowed in '#if'. Anyway, I will try to see
if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
check.

-Brijesh


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:47 PM, Sean Christopherson wrote:
...



Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
the sizeof operators are not allowed in '#if'. Anyway, I will try to see
if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
check.

-Brijesh


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:24 PM, Brijesh Singh wrote:

...



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


We can use #ifdef CONFIG_AMD_MEM_ENCRYPT around hv_clock_aux definition
so that it gets defined

Something like this:

#ifdef CONFIG_AMD_MEM_ENCRYPT
/* The auxilary array will be used when SEV is active */
#define HVC_AUX_ARRAY_SIZE \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted_aux;
#endif



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:24 PM, Brijesh Singh wrote:

...



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


We can use #ifdef CONFIG_AMD_MEM_ENCRYPT around hv_clock_aux definition
so that it gets defined

Something like this:

#ifdef CONFIG_AMD_MEM_ENCRYPT
/* The auxilary array will be used when SEV is active */
#define HVC_AUX_ARRAY_SIZE \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted_aux;
#endif



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 02:24:32PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 01:47 PM, Sean Christopherson wrote:
> >On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:
> >>
> >>
> >>On 09/06/2018 09:18 AM, Sean Christopherson wrote:
> >>
> >>
> >
> >So are we going to be defining a decrypted section for every piece of
> >machinery now?
> >
> >That's a bit too much in my book.
> >
> >Why can't you simply free everything in .data..decrypted on !SVE guests?
> 
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark
> data that is used if and only if SEV is active, and then free the
> SEV-only data when SEV is disabled.
> >>>
> >>>Oh, and we'd need to make sure __decrypted_exclusive is freed when
> >>>!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> >>>array is used only if SEV is active.  This patch unconditionally
> >>>defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> >>>!mem_encrypt_active().
> >>>
> >>
> >>Again we have to consider the bare metal scenario while doing this. The
> >>aux array you proposed will be added in decrypted section only when
> >>CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
> >>gets put in .data.decrypted section. At the runtime, if memory
> >>encryption is active then .data.decrypted_hvclock will contains useful
> >>data.
> >>
> >>The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.
> >
> >Right, but won't the data get dumped into the regular .bss in that
> >case, i.e. needs to be freed?
> >
> 
> 
> Yes, the auxiliary array will dumped into the regular .bss when
> CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> sure if its worth complicating the code to save those extra memory.
> Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.

I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 02:24:32PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 01:47 PM, Sean Christopherson wrote:
> >On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:
> >>
> >>
> >>On 09/06/2018 09:18 AM, Sean Christopherson wrote:
> >>
> >>
> >
> >So are we going to be defining a decrypted section for every piece of
> >machinery now?
> >
> >That's a bit too much in my book.
> >
> >Why can't you simply free everything in .data..decrypted on !SVE guests?
> 
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark
> data that is used if and only if SEV is active, and then free the
> SEV-only data when SEV is disabled.
> >>>
> >>>Oh, and we'd need to make sure __decrypted_exclusive is freed when
> >>>!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> >>>array is used only if SEV is active.  This patch unconditionally
> >>>defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> >>>!mem_encrypt_active().
> >>>
> >>
> >>Again we have to consider the bare metal scenario while doing this. The
> >>aux array you proposed will be added in decrypted section only when
> >>CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
> >>gets put in .data.decrypted section. At the runtime, if memory
> >>encryption is active then .data.decrypted_hvclock will contains useful
> >>data.
> >>
> >>The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.
> >
> >Right, but won't the data get dumped into the regular .bss in that
> >case, i.e. needs to be freed?
> >
> 
> 
> Yes, the auxiliary array will dumped into the regular .bss when
> CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> sure if its worth complicating the code to save those extra memory.
> Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.

I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 01:47 PM, Sean Christopherson wrote:

On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:



On 09/06/2018 09:18 AM, Sean Christopherson wrote:




So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.


Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 01:47 PM, Sean Christopherson wrote:

On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:



On 09/06/2018 09:18 AM, Sean Christopherson wrote:




So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.


Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 11:45:03AM -0700, Sean Christopherson wrote:
> Yep, though because the 4k chunk in __decrypted is @hv_clock_boot 
> that's used for cpus 0-127, __decrypted_XXX would effectively be:
> 
>(((num_possible_cpus() * 32) / 4k) - 1) pages

Ok, sounds like a nice compromise to me.

Also, I wonder if using subsections would be even better when adding
other things to the decrypted section. I.e.,

.data..decrypted:

...

.data..decrypted.aux:

...

.data..decrypted.something_else:

and this way keep it still conceptually together by keeping the section
namespace clean because we're putting it all under .decrypted's
namespace.

Hmmm.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 11:45:03AM -0700, Sean Christopherson wrote:
> Yep, though because the 4k chunk in __decrypted is @hv_clock_boot 
> that's used for cpus 0-127, __decrypted_XXX would effectively be:
> 
>(((num_possible_cpus() * 32) / 4k) - 1) pages

Ok, sounds like a nice compromise to me.

Also, I wonder if using subsections would be even better when adding
other things to the decrypted section. I.e.,

.data..decrypted:

...

.data..decrypted.aux:

...

.data..decrypted.something_else:

and this way keep it still conceptually together by keeping the section
namespace clean because we're putting it all under .decrypted's
namespace.

Hmmm.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 09:07 AM, Sean Christopherson wrote:
...


+
+/* This should cover upto 512 VCPUS (first 64 are covered by hv_clock_boot[]). 
*/
+#define HVC_DECRYPTED_ARRAY_SIZE \
+   ((PAGE_SIZE * 7)  / sizeof(struct pvclock_vsyscall_time_info))


I think we can define the size relative to NR_CPUS rather than picking
an arbitrary number of pages, maybe with a BUILD_BUG_ON to make sure
the total size won't require a second 2mb page for __decrpyted.

#define HVC_DECRYPTED_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))



Sure works for me.


+static struct pvclock_vsyscall_time_info
+   hv_clock_dec[HVC_DECRYPTED_ARRAY_SIZE] 
__decrypted_hvclock;
+
  static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
  {
return _cpu_read(hv_clock_per_cpu)->pvti;
@@ -267,10 +274,19 @@ static int kvmclock_setup_percpu(unsigned int cpu)
return 0;
  
  	/* Use the static page for the first CPUs, allocate otherwise */

-   if (cpu < HVC_BOOT_ARRAY_SIZE)
+   if (cpu < HVC_BOOT_ARRAY_SIZE) {
p = _clock_boot[cpu];
-   else
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   } else {
+   /*
+* When SEV is active, use the static pages from
+* .data..decrypted_hvclock section. The pages are already
+* mapped with C=0.
+*/
+   if (sev_active())
+   p = _clock_dec[cpu - HVC_BOOT_ARRAY_SIZE];
+   else
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   }


Personal preference, but I think an if-elif-else with a single block
comment would be easier to read.



I can do with that. thanks for your feedback.




/*
 * Blah blah blah
 */
if (cpu < HVC_BOOT_ARRAY_SIZE)
p = _clock_boot[cpu];
else if (sev_active())
p = _clock_dec[cpu - HVC_BOOT_ARRAY_SIZE];
else
p = kzalloc(sizeof(*p), GFP_KERNEL);



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 09:07 AM, Sean Christopherson wrote:
...


+
+/* This should cover upto 512 VCPUS (first 64 are covered by hv_clock_boot[]). 
*/
+#define HVC_DECRYPTED_ARRAY_SIZE \
+   ((PAGE_SIZE * 7)  / sizeof(struct pvclock_vsyscall_time_info))


I think we can define the size relative to NR_CPUS rather than picking
an arbitrary number of pages, maybe with a BUILD_BUG_ON to make sure
the total size won't require a second 2mb page for __decrpyted.

#define HVC_DECRYPTED_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))



Sure works for me.


+static struct pvclock_vsyscall_time_info
+   hv_clock_dec[HVC_DECRYPTED_ARRAY_SIZE] 
__decrypted_hvclock;
+
  static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
  {
return _cpu_read(hv_clock_per_cpu)->pvti;
@@ -267,10 +274,19 @@ static int kvmclock_setup_percpu(unsigned int cpu)
return 0;
  
  	/* Use the static page for the first CPUs, allocate otherwise */

-   if (cpu < HVC_BOOT_ARRAY_SIZE)
+   if (cpu < HVC_BOOT_ARRAY_SIZE) {
p = _clock_boot[cpu];
-   else
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   } else {
+   /*
+* When SEV is active, use the static pages from
+* .data..decrypted_hvclock section. The pages are already
+* mapped with C=0.
+*/
+   if (sev_active())
+   p = _clock_dec[cpu - HVC_BOOT_ARRAY_SIZE];
+   else
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   }


Personal preference, but I think an if-elif-else with a single block
comment would be easier to read.



I can do with that. thanks for your feedback.




/*
 * Blah blah blah
 */
if (cpu < HVC_BOOT_ARRAY_SIZE)
p = _clock_boot[cpu];
else if (sev_active())
p = _clock_dec[cpu - HVC_BOOT_ARRAY_SIZE];
else
p = kzalloc(sizeof(*p), GFP_KERNEL);



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 09:18 AM, Sean Christopherson wrote:
> 
> 
> >>>
> >>>So are we going to be defining a decrypted section for every piece of
> >>>machinery now?
> >>>
> >>>That's a bit too much in my book.
> >>>
> >>>Why can't you simply free everything in .data..decrypted on !SVE guests?
> >>
> >>That would prevent adding __decrypted to existing declarations, e.g.
> >>hv_clock_boot, which would be ugly in its own right.  A more generic
> >>solution would be to add something like __decrypted_exclusive to mark
> >>data that is used if and only if SEV is active, and then free the
> >>SEV-only data when SEV is disabled.
> >
> >Oh, and we'd need to make sure __decrypted_exclusive is freed when
> >!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> >array is used only if SEV is active.  This patch unconditionally
> >defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> >!mem_encrypt_active().
> >
> 
> Again we have to consider the bare metal scenario while doing this. The
> aux array you proposed will be added in decrypted section only when
> CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
> gets put in .data.decrypted section. At the runtime, if memory
> encryption is active then .data.decrypted_hvclock will contains useful
> data.
> 
> The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.

Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 09:18 AM, Sean Christopherson wrote:
> 
> 
> >>>
> >>>So are we going to be defining a decrypted section for every piece of
> >>>machinery now?
> >>>
> >>>That's a bit too much in my book.
> >>>
> >>>Why can't you simply free everything in .data..decrypted on !SVE guests?
> >>
> >>That would prevent adding __decrypted to existing declarations, e.g.
> >>hv_clock_boot, which would be ugly in its own right.  A more generic
> >>solution would be to add something like __decrypted_exclusive to mark
> >>data that is used if and only if SEV is active, and then free the
> >>SEV-only data when SEV is disabled.
> >
> >Oh, and we'd need to make sure __decrypted_exclusive is freed when
> >!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> >array is used only if SEV is active.  This patch unconditionally
> >defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> >!mem_encrypt_active().
> >
> 
> Again we have to consider the bare metal scenario while doing this. The
> aux array you proposed will be added in decrypted section only when
> CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
> gets put in .data.decrypted section. At the runtime, if memory
> encryption is active then .data.decrypted_hvclock will contains useful
> data.
> 
> The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.

Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 08:33:34PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 08:54:52AM -0700, Sean Christopherson wrote:
> > My thought was that we could simply define a second array for the SEV
> > case to statically allocate for NR_CPUS since __decrypted has a big
> > chunk of memory that would be ununsed anyways[1]. And since the second
> > array is only used for SEV it can be freed if !SEV.
> 
> Lemme see if I get it straight:
> 
> __decrypted:
> 
>  4K
> 
> __decrypted_XXX:
> 
>  ((num_possible_cpus() * 32) / 4K) pages
> 
> __decrypted_end:
> 
> Am I close?

Yep, though because the 4k chunk in __decrypted is @hv_clock_boot 
that's used for cpus 0-127, __decrypted_XXX would effectively be:

   (((num_possible_cpus() * 32) / 4k) - 1) pages


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 08:33:34PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 08:54:52AM -0700, Sean Christopherson wrote:
> > My thought was that we could simply define a second array for the SEV
> > case to statically allocate for NR_CPUS since __decrypted has a big
> > chunk of memory that would be ununsed anyways[1]. And since the second
> > array is only used for SEV it can be freed if !SEV.
> 
> Lemme see if I get it straight:
> 
> __decrypted:
> 
>  4K
> 
> __decrypted_XXX:
> 
>  ((num_possible_cpus() * 32) / 4K) pages
> 
> __decrypted_end:
> 
> Am I close?

Yep, though because the 4k chunk in __decrypted is @hv_clock_boot 
that's used for cpus 0-127, __decrypted_XXX would effectively be:

   (((num_possible_cpus() * 32) / 4k) - 1) pages


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 01:33 PM, Borislav Petkov wrote:

On Thu, Sep 06, 2018 at 08:54:52AM -0700, Sean Christopherson wrote:

My thought was that we could simply define a second array for the SEV
case to statically allocate for NR_CPUS since __decrypted has a big
chunk of memory that would be ununsed anyways[1]. And since the second
array is only used for SEV it can be freed if !SEV.


Lemme see if I get it straight:

__decrypted:

  4K

__decrypted_XXX:

  ((num_possible_cpus() * 32) / 4K) pages

__decrypted_end:

Am I close?



Yes.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 01:33 PM, Borislav Petkov wrote:

On Thu, Sep 06, 2018 at 08:54:52AM -0700, Sean Christopherson wrote:

My thought was that we could simply define a second array for the SEV
case to statically allocate for NR_CPUS since __decrypted has a big
chunk of memory that would be ununsed anyways[1]. And since the second
array is only used for SEV it can be freed if !SEV.


Lemme see if I get it straight:

__decrypted:

  4K

__decrypted_XXX:

  ((num_possible_cpus() * 32) / 4K) pages

__decrypted_end:

Am I close?



Yes.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 09:18 AM, Sean Christopherson wrote:




So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.


Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


-Brijesh


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 09:18 AM, Sean Christopherson wrote:




So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.


Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


-Brijesh


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 08:54:52AM -0700, Sean Christopherson wrote:
> My thought was that we could simply define a second array for the SEV
> case to statically allocate for NR_CPUS since __decrypted has a big
> chunk of memory that would be ununsed anyways[1]. And since the second
> array is only used for SEV it can be freed if !SEV.

Lemme see if I get it straight:

__decrypted:

 4K

__decrypted_XXX:

 ((num_possible_cpus() * 32) / 4K) pages

__decrypted_end:

Am I close?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 08:54:52AM -0700, Sean Christopherson wrote:
> My thought was that we could simply define a second array for the SEV
> case to statically allocate for NR_CPUS since __decrypted has a big
> chunk of memory that would be ununsed anyways[1]. And since the second
> array is only used for SEV it can be freed if !SEV.

Lemme see if I get it straight:

__decrypted:

 4K

__decrypted_XXX:

 ((num_possible_cpus() * 32) / 4K) pages

__decrypted_end:

Am I close?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 08:50 AM, Sean Christopherson wrote:
...



So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.

Originally, my thought was that this would be a one-off case and the
array could be freed directly in kvmclock_init(), e.g.:




Please note that kvmclock_init() is called very early during the boot
process. We will not be able to use free_init_pages(...) so early.
Additionally, we also need to consider the bare-metal case -- which
will never call the kvmclock_init().




static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);

...

void __init kvmclock_init(void)
{
u8 flags;

if (!sev_active())
free_init_pages("unused decrypted",
(unsigned long)hv_clock_aux,
(unsigned long)hv_clock_aux + sizeof(hv_clock_aux));





Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 08:50 AM, Sean Christopherson wrote:
...



So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.

Originally, my thought was that this would be a one-off case and the
array could be freed directly in kvmclock_init(), e.g.:




Please note that kvmclock_init() is called very early during the boot
process. We will not be able to use free_init_pages(...) so early.
Additionally, we also need to consider the bare-metal case -- which
will never call the kvmclock_init().




static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);

...

void __init kvmclock_init(void)
{
u8 flags;

if (!sev_active())
free_init_pages("unused decrypted",
(unsigned long)hv_clock_aux,
(unsigned long)hv_clock_aux + sizeof(hv_clock_aux));





Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 05:19:38PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 07:56:40AM -0700, Sean Christopherson wrote:
> > Wouldn't that result in @hv_clock_boot being incorrectly freed in the
> > !SEV case?
> 
> Ok, maybe I'm missing something but why do we need 4K per CPU? Why can't
> we map all those pages which contain the clock variable, decrypted in
> all guests' page tables?
> 
> Basically
> 
> (NR_CPUS * sizeof(struct pvclock_vsyscall_time_info)) / 4096
> 
> pages.
> 
> For the !SEV case then nothing changes.

The 4k per CPU refers to the dynamic allocation in Brijesh's original
patch.   Currently, @hv_clock_boot is a single 4k page to limit the
amount of unused memory when 'nr_cpu_ids < NR_CPUS'.  In the SEV case,
dynamically allocating for 'cpu > HVC_BOOT_ARRAY_SIZE' one at a time
means that each CPU allocates a full 4k page to store a single 32-byte
variable.  My thought was that we could simply define a second array
for the SEV case to statically allocate for NR_CPUS since __decrypted
has a big chunk of memory that would be ununsed anyways[1].  And since
the second array is only used for SEV it can be freed if !SEV.

If we free the array explicitly then we don't need a second section or
attribute.  My comments about __decrypted_exclusive were that if we
did want to go with a second section/attribute, e.g. to have a generic
solution that can be used for other stuff, then we'd have more corner
cases to deal with.  I agree that simpler is better, i.e. I'd vote for
explicitly freeing the second array.  Apologies for not making that
clear from the get-go. 

[1] An alternative solution would be to batch the dynamic allocations,
but that would probably require locking and be more complex.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 05:19:38PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 07:56:40AM -0700, Sean Christopherson wrote:
> > Wouldn't that result in @hv_clock_boot being incorrectly freed in the
> > !SEV case?
> 
> Ok, maybe I'm missing something but why do we need 4K per CPU? Why can't
> we map all those pages which contain the clock variable, decrypted in
> all guests' page tables?
> 
> Basically
> 
> (NR_CPUS * sizeof(struct pvclock_vsyscall_time_info)) / 4096
> 
> pages.
> 
> For the !SEV case then nothing changes.

The 4k per CPU refers to the dynamic allocation in Brijesh's original
patch.   Currently, @hv_clock_boot is a single 4k page to limit the
amount of unused memory when 'nr_cpu_ids < NR_CPUS'.  In the SEV case,
dynamically allocating for 'cpu > HVC_BOOT_ARRAY_SIZE' one at a time
means that each CPU allocates a full 4k page to store a single 32-byte
variable.  My thought was that we could simply define a second array
for the SEV case to statically allocate for NR_CPUS since __decrypted
has a big chunk of memory that would be ununsed anyways[1].  And since
the second array is only used for SEV it can be freed if !SEV.

If we free the array explicitly then we don't need a second section or
attribute.  My comments about __decrypted_exclusive were that if we
did want to go with a second section/attribute, e.g. to have a generic
solution that can be used for other stuff, then we'd have more corner
cases to deal with.  I agree that simpler is better, i.e. I'd vote for
explicitly freeing the second array.  Apologies for not making that
clear from the get-go. 

[1] An alternative solution would be to batch the dynamic allocations,
but that would probably require locking and be more complex.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 07:56:40AM -0700, Sean Christopherson wrote:
> Wouldn't that result in @hv_clock_boot being incorrectly freed in the
> !SEV case?

Ok, maybe I'm missing something but why do we need 4K per CPU? Why can't
we map all those pages which contain the clock variable, decrypted in
all guests' page tables?

Basically

(NR_CPUS * sizeof(struct pvclock_vsyscall_time_info)) / 4096

pages.

For the !SEV case then nothing changes.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 07:56:40AM -0700, Sean Christopherson wrote:
> Wouldn't that result in @hv_clock_boot being incorrectly freed in the
> !SEV case?

Ok, maybe I'm missing something but why do we need 4K per CPU? Why can't
we map all those pages which contain the clock variable, decrypted in
all guests' page tables?

Basically

(NR_CPUS * sizeof(struct pvclock_vsyscall_time_info)) / 4096

pages.

For the !SEV case then nothing changes.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 04:43:42PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 06:50:41AM -0700, Sean Christopherson wrote:
> > That would prevent adding __decrypted to existing declarations, e.g.
> > hv_clock_boot, which would be ugly in its own right.  A more generic
> > solution would be to add something like __decrypted_exclusive to mark
> 
> I still don't understand why can't there be only a single __decrypted
> section and to free that whole section on !SEV.

Wouldn't that result in @hv_clock_boot being incorrectly freed in the
!SEV case?


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 04:43:42PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 06:50:41AM -0700, Sean Christopherson wrote:
> > That would prevent adding __decrypted to existing declarations, e.g.
> > hv_clock_boot, which would be ugly in its own right.  A more generic
> > solution would be to add something like __decrypted_exclusive to mark
> 
> I still don't understand why can't there be only a single __decrypted
> section and to free that whole section on !SEV.

Wouldn't that result in @hv_clock_boot being incorrectly freed in the
!SEV case?


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 07:18:25AM -0700, Sean Christopherson wrote:
> Oh, and we'd need to make sure __decrypted_exclusive is freed when
> !CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> array is used only if SEV is active.  This patch unconditionally
> defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> !mem_encrypt_active().

We should not go nuts and complicate the code only to save us a couple
of KBs.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 07:18:25AM -0700, Sean Christopherson wrote:
> Oh, and we'd need to make sure __decrypted_exclusive is freed when
> !CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> array is used only if SEV is active.  This patch unconditionally
> defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> !mem_encrypt_active().

We should not go nuts and complicate the code only to save us a couple
of KBs.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 06:50:41AM -0700, Sean Christopherson wrote:
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark

I still don't understand why can't there be only a single __decrypted
section and to free that whole section on !SEV.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 06:50:41AM -0700, Sean Christopherson wrote:
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark

I still don't understand why can't there be only a single __decrypted
section and to free that whole section on !SEV.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 06:50:41AM -0700, Sean Christopherson wrote:
> On Thu, Sep 06, 2018 at 02:24:23PM +0200, Borislav Petkov wrote:
> > On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> > > Currently, the per-cpu pvclock data is allocated dynamically when
> > > cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> > > shared between the guest and the hypervisor hence it must be mapped as
> > > unencrypted (ie. C=0) when SEV is active.
> > > 
> > > When SEV is active, we will be wasting fairly sizeable amount of memory
> > > since each CPU will be doing a separate 4k allocation so that it can clear
> > > C-bit. Let's define few extra static page sized array of pvclock data.
> > > In the preparatory stage of CPU hotplug, use the element of this static
> > > array to avoid the dynamic allocation. This array will be put in
> > > the .data..decrypted section so that its mapped with C=0 during the boot.
> > > 
> > > In non-SEV case, this static page will unused and free'd by the
> > > free_decrypted_mem().
> > > 
> > > diff --git a/arch/x86/include/asm/mem_encrypt.h 
> > > b/arch/x86/include/asm/mem_encrypt.h
> > > index 802b2eb..aa204af 100644
> > > --- a/arch/x86/include/asm/mem_encrypt.h
> > > +++ b/arch/x86/include/asm/mem_encrypt.h
> > > @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> > > vaddr, unsigned long size);
> > >  
> > >  /* Architecture __weak replacement functions */
> > >  void __init mem_encrypt_init(void);
> > > +void __init free_decrypted_mem(void);
> > >  
> > >  bool sme_active(void);
> > >  bool sev_active(void);
> > >  
> > >  #define __decrypted __attribute__((__section__(".data..decrypted")))
> > > +#define __decrypted_hvclock 
> > > __attribute__((__section__(".data..decrypted_hvclock")))
> > 
> > So are we going to be defining a decrypted section for every piece of
> > machinery now?
> > 
> > That's a bit too much in my book.
> > 
> > Why can't you simply free everything in .data..decrypted on !SVE guests?
> 
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark
> data that is used if and only if SEV is active, and then free the
> SEV-only data when SEV is disabled.

Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 06:50:41AM -0700, Sean Christopherson wrote:
> On Thu, Sep 06, 2018 at 02:24:23PM +0200, Borislav Petkov wrote:
> > On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> > > Currently, the per-cpu pvclock data is allocated dynamically when
> > > cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> > > shared between the guest and the hypervisor hence it must be mapped as
> > > unencrypted (ie. C=0) when SEV is active.
> > > 
> > > When SEV is active, we will be wasting fairly sizeable amount of memory
> > > since each CPU will be doing a separate 4k allocation so that it can clear
> > > C-bit. Let's define few extra static page sized array of pvclock data.
> > > In the preparatory stage of CPU hotplug, use the element of this static
> > > array to avoid the dynamic allocation. This array will be put in
> > > the .data..decrypted section so that its mapped with C=0 during the boot.
> > > 
> > > In non-SEV case, this static page will unused and free'd by the
> > > free_decrypted_mem().
> > > 
> > > diff --git a/arch/x86/include/asm/mem_encrypt.h 
> > > b/arch/x86/include/asm/mem_encrypt.h
> > > index 802b2eb..aa204af 100644
> > > --- a/arch/x86/include/asm/mem_encrypt.h
> > > +++ b/arch/x86/include/asm/mem_encrypt.h
> > > @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> > > vaddr, unsigned long size);
> > >  
> > >  /* Architecture __weak replacement functions */
> > >  void __init mem_encrypt_init(void);
> > > +void __init free_decrypted_mem(void);
> > >  
> > >  bool sme_active(void);
> > >  bool sev_active(void);
> > >  
> > >  #define __decrypted __attribute__((__section__(".data..decrypted")))
> > > +#define __decrypted_hvclock 
> > > __attribute__((__section__(".data..decrypted_hvclock")))
> > 
> > So are we going to be defining a decrypted section for every piece of
> > machinery now?
> > 
> > That's a bit too much in my book.
> > 
> > Why can't you simply free everything in .data..decrypted on !SVE guests?
> 
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark
> data that is used if and only if SEV is active, and then free the
> SEV-only data when SEV is disabled.

Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> Currently, the per-cpu pvclock data is allocated dynamically when
> cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> shared between the guest and the hypervisor hence it must be mapped as
> unencrypted (ie. C=0) when SEV is active.
> 
> When SEV is active, we will be wasting fairly sizeable amount of memory
> since each CPU will be doing a separate 4k allocation so that it can clear
> C-bit. Let's define few extra static page sized array of pvclock data.
> In the preparatory stage of CPU hotplug, use the element of this static
> array to avoid the dynamic allocation. This array will be put in
> the .data..decrypted section so that its mapped with C=0 during the boot.
> 
> In non-SEV case, this static page will unused and free'd by the
> free_decrypted_mem().
> 
> Signed-off-by: Brijesh Singh 
> Suggested-by: Sean Christopherson 
> Cc: Tom Lendacky 
> Cc: k...@vger.kernel.org
> Cc: Thomas Gleixner 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Cc: k...@vger.kernel.org
> Cc: "Radim Krčmář" 
> ---
>  arch/x86/include/asm/mem_encrypt.h |  4 
>  arch/x86/kernel/kvmclock.c | 22 +++---
>  arch/x86/kernel/vmlinux.lds.S  |  3 +++
>  arch/x86/mm/init.c |  3 +++
>  arch/x86/mm/mem_encrypt.c  | 10 ++
>  5 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 802b2eb..aa204af 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size);
>  
>  /* Architecture __weak replacement functions */
>  void __init mem_encrypt_init(void);
> +void __init free_decrypted_mem(void);
>  
>  bool sme_active(void);
>  bool sev_active(void);
>  
>  #define __decrypted __attribute__((__section__(".data..decrypted")))
> +#define __decrypted_hvclock 
> __attribute__((__section__(".data..decrypted_hvclock")))
>  
>  #else/* !CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -80,6 +82,7 @@ static inline int __init
>  early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 
> 0; }
>  
>  #define __decrypted
> +#define __decrypted_hvclock
>  
>  #endif   /* CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -93,6 +96,7 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned 
> long size) { return 0;
>  #define __sme_pa_nodebug(x)  (__pa_nodebug(x) | sme_me_mask)
>  
>  extern char __start_data_decrypted[], __end_data_decrypted[];
> +extern char __start_data_decrypted_hvclock[];
>  
>  #endif   /* __ASSEMBLY__ */
>  
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 376fd3a..5b88773 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -65,6 +65,13 @@ static struct pvclock_vsyscall_time_info
>  static struct pvclock_wall_clock wall_clock __decrypted;
>  static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>  
> +
> +/* This should cover upto 512 VCPUS (first 64 are covered by 
> hv_clock_boot[]). */
> +#define HVC_DECRYPTED_ARRAY_SIZE \
> + ((PAGE_SIZE * 7)  / sizeof(struct pvclock_vsyscall_time_info))

I think we can define the size relative to NR_CPUS rather than picking
an arbitrary number of pages, maybe with a BUILD_BUG_ON to make sure
the total size won't require a second 2mb page for __decrpyted.

#define HVC_DECRYPTED_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))

> +static struct pvclock_vsyscall_time_info
> + hv_clock_dec[HVC_DECRYPTED_ARRAY_SIZE] 
> __decrypted_hvclock;
> +
>  static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
>  {
>   return _cpu_read(hv_clock_per_cpu)->pvti;
> @@ -267,10 +274,19 @@ static int kvmclock_setup_percpu(unsigned int cpu)
>   return 0;
>  
>   /* Use the static page for the first CPUs, allocate otherwise */
> - if (cpu < HVC_BOOT_ARRAY_SIZE)
> + if (cpu < HVC_BOOT_ARRAY_SIZE) {
>   p = _clock_boot[cpu];
> - else
> - p = kzalloc(sizeof(*p), GFP_KERNEL);
> + } else {
> + /*
> +  * When SEV is active, use the static pages from
> +  * .data..decrypted_hvclock section. The pages are already
> +  * mapped with C=0.
> +  */
> + if (sev_active())
> + p = _clock_dec[cpu - HVC_BOOT_ARRAY_SIZE];
> + else
> + p = kzalloc(sizeof(*p), GFP_KERNEL);
> + }

Personal preference, but I think an if-elif-else with a single block
comment would be easier to read.

/*
 * Blah blah blah
 */
if (cpu < HVC_BOOT_ARRAY_SIZE)
p = 

Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> Currently, the per-cpu pvclock data is allocated dynamically when
> cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> shared between the guest and the hypervisor hence it must be mapped as
> unencrypted (ie. C=0) when SEV is active.
> 
> When SEV is active, we will be wasting fairly sizeable amount of memory
> since each CPU will be doing a separate 4k allocation so that it can clear
> C-bit. Let's define few extra static page sized array of pvclock data.
> In the preparatory stage of CPU hotplug, use the element of this static
> array to avoid the dynamic allocation. This array will be put in
> the .data..decrypted section so that its mapped with C=0 during the boot.
> 
> In non-SEV case, this static page will unused and free'd by the
> free_decrypted_mem().
> 
> Signed-off-by: Brijesh Singh 
> Suggested-by: Sean Christopherson 
> Cc: Tom Lendacky 
> Cc: k...@vger.kernel.org
> Cc: Thomas Gleixner 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Cc: k...@vger.kernel.org
> Cc: "Radim Krčmář" 
> ---
>  arch/x86/include/asm/mem_encrypt.h |  4 
>  arch/x86/kernel/kvmclock.c | 22 +++---
>  arch/x86/kernel/vmlinux.lds.S  |  3 +++
>  arch/x86/mm/init.c |  3 +++
>  arch/x86/mm/mem_encrypt.c  | 10 ++
>  5 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 802b2eb..aa204af 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size);
>  
>  /* Architecture __weak replacement functions */
>  void __init mem_encrypt_init(void);
> +void __init free_decrypted_mem(void);
>  
>  bool sme_active(void);
>  bool sev_active(void);
>  
>  #define __decrypted __attribute__((__section__(".data..decrypted")))
> +#define __decrypted_hvclock 
> __attribute__((__section__(".data..decrypted_hvclock")))
>  
>  #else/* !CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -80,6 +82,7 @@ static inline int __init
>  early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 
> 0; }
>  
>  #define __decrypted
> +#define __decrypted_hvclock
>  
>  #endif   /* CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -93,6 +96,7 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned 
> long size) { return 0;
>  #define __sme_pa_nodebug(x)  (__pa_nodebug(x) | sme_me_mask)
>  
>  extern char __start_data_decrypted[], __end_data_decrypted[];
> +extern char __start_data_decrypted_hvclock[];
>  
>  #endif   /* __ASSEMBLY__ */
>  
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 376fd3a..5b88773 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -65,6 +65,13 @@ static struct pvclock_vsyscall_time_info
>  static struct pvclock_wall_clock wall_clock __decrypted;
>  static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>  
> +
> +/* This should cover upto 512 VCPUS (first 64 are covered by 
> hv_clock_boot[]). */
> +#define HVC_DECRYPTED_ARRAY_SIZE \
> + ((PAGE_SIZE * 7)  / sizeof(struct pvclock_vsyscall_time_info))

I think we can define the size relative to NR_CPUS rather than picking
an arbitrary number of pages, maybe with a BUILD_BUG_ON to make sure
the total size won't require a second 2mb page for __decrpyted.

#define HVC_DECRYPTED_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))

> +static struct pvclock_vsyscall_time_info
> + hv_clock_dec[HVC_DECRYPTED_ARRAY_SIZE] 
> __decrypted_hvclock;
> +
>  static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
>  {
>   return _cpu_read(hv_clock_per_cpu)->pvti;
> @@ -267,10 +274,19 @@ static int kvmclock_setup_percpu(unsigned int cpu)
>   return 0;
>  
>   /* Use the static page for the first CPUs, allocate otherwise */
> - if (cpu < HVC_BOOT_ARRAY_SIZE)
> + if (cpu < HVC_BOOT_ARRAY_SIZE) {
>   p = _clock_boot[cpu];
> - else
> - p = kzalloc(sizeof(*p), GFP_KERNEL);
> + } else {
> + /*
> +  * When SEV is active, use the static pages from
> +  * .data..decrypted_hvclock section. The pages are already
> +  * mapped with C=0.
> +  */
> + if (sev_active())
> + p = _clock_dec[cpu - HVC_BOOT_ARRAY_SIZE];
> + else
> + p = kzalloc(sizeof(*p), GFP_KERNEL);
> + }

Personal preference, but I think an if-elif-else with a single block
comment would be easier to read.

/*
 * Blah blah blah
 */
if (cpu < HVC_BOOT_ARRAY_SIZE)
p = 

Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 02:24:23PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> > Currently, the per-cpu pvclock data is allocated dynamically when
> > cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> > shared between the guest and the hypervisor hence it must be mapped as
> > unencrypted (ie. C=0) when SEV is active.
> > 
> > When SEV is active, we will be wasting fairly sizeable amount of memory
> > since each CPU will be doing a separate 4k allocation so that it can clear
> > C-bit. Let's define few extra static page sized array of pvclock data.
> > In the preparatory stage of CPU hotplug, use the element of this static
> > array to avoid the dynamic allocation. This array will be put in
> > the .data..decrypted section so that its mapped with C=0 during the boot.
> > 
> > In non-SEV case, this static page will unused and free'd by the
> > free_decrypted_mem().
> > 
> > Signed-off-by: Brijesh Singh 
> > Suggested-by: Sean Christopherson 
> > Cc: Tom Lendacky 
> > Cc: k...@vger.kernel.org
> > Cc: Thomas Gleixner 
> > Cc: Borislav Petkov 
> > Cc: "H. Peter Anvin" 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Paolo Bonzini 
> > Cc: Sean Christopherson 
> > Cc: k...@vger.kernel.org
> > Cc: "Radim Krčmář" 
> > ---
> >  arch/x86/include/asm/mem_encrypt.h |  4 
> >  arch/x86/kernel/kvmclock.c | 22 +++---
> >  arch/x86/kernel/vmlinux.lds.S  |  3 +++
> >  arch/x86/mm/init.c |  3 +++
> >  arch/x86/mm/mem_encrypt.c  | 10 ++
> >  5 files changed, 39 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/mem_encrypt.h 
> > b/arch/x86/include/asm/mem_encrypt.h
> > index 802b2eb..aa204af 100644
> > --- a/arch/x86/include/asm/mem_encrypt.h
> > +++ b/arch/x86/include/asm/mem_encrypt.h
> > @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> > vaddr, unsigned long size);
> >  
> >  /* Architecture __weak replacement functions */
> >  void __init mem_encrypt_init(void);
> > +void __init free_decrypted_mem(void);
> >  
> >  bool sme_active(void);
> >  bool sev_active(void);
> >  
> >  #define __decrypted __attribute__((__section__(".data..decrypted")))
> > +#define __decrypted_hvclock 
> > __attribute__((__section__(".data..decrypted_hvclock")))
> 
> So are we going to be defining a decrypted section for every piece of
> machinery now?
> 
> That's a bit too much in my book.
> 
> Why can't you simply free everything in .data..decrypted on !SVE guests?

That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.

Originally, my thought was that this would be a one-off case and the
array could be freed directly in kvmclock_init(), e.g.:

static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);

...

void __init kvmclock_init(void)
{
u8 flags;

if (!sev_active())
free_init_pages("unused decrypted",
(unsigned long)hv_clock_aux,
(unsigned long)hv_clock_aux + sizeof(hv_clock_aux));

> 
> -- 
> Regards/Gruss,
> Boris.
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
> -- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 02:24:23PM +0200, Borislav Petkov wrote:
> On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> > Currently, the per-cpu pvclock data is allocated dynamically when
> > cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> > shared between the guest and the hypervisor hence it must be mapped as
> > unencrypted (ie. C=0) when SEV is active.
> > 
> > When SEV is active, we will be wasting fairly sizeable amount of memory
> > since each CPU will be doing a separate 4k allocation so that it can clear
> > C-bit. Let's define few extra static page sized array of pvclock data.
> > In the preparatory stage of CPU hotplug, use the element of this static
> > array to avoid the dynamic allocation. This array will be put in
> > the .data..decrypted section so that its mapped with C=0 during the boot.
> > 
> > In non-SEV case, this static page will unused and free'd by the
> > free_decrypted_mem().
> > 
> > Signed-off-by: Brijesh Singh 
> > Suggested-by: Sean Christopherson 
> > Cc: Tom Lendacky 
> > Cc: k...@vger.kernel.org
> > Cc: Thomas Gleixner 
> > Cc: Borislav Petkov 
> > Cc: "H. Peter Anvin" 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Paolo Bonzini 
> > Cc: Sean Christopherson 
> > Cc: k...@vger.kernel.org
> > Cc: "Radim Krčmář" 
> > ---
> >  arch/x86/include/asm/mem_encrypt.h |  4 
> >  arch/x86/kernel/kvmclock.c | 22 +++---
> >  arch/x86/kernel/vmlinux.lds.S  |  3 +++
> >  arch/x86/mm/init.c |  3 +++
> >  arch/x86/mm/mem_encrypt.c  | 10 ++
> >  5 files changed, 39 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/mem_encrypt.h 
> > b/arch/x86/include/asm/mem_encrypt.h
> > index 802b2eb..aa204af 100644
> > --- a/arch/x86/include/asm/mem_encrypt.h
> > +++ b/arch/x86/include/asm/mem_encrypt.h
> > @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> > vaddr, unsigned long size);
> >  
> >  /* Architecture __weak replacement functions */
> >  void __init mem_encrypt_init(void);
> > +void __init free_decrypted_mem(void);
> >  
> >  bool sme_active(void);
> >  bool sev_active(void);
> >  
> >  #define __decrypted __attribute__((__section__(".data..decrypted")))
> > +#define __decrypted_hvclock 
> > __attribute__((__section__(".data..decrypted_hvclock")))
> 
> So are we going to be defining a decrypted section for every piece of
> machinery now?
> 
> That's a bit too much in my book.
> 
> Why can't you simply free everything in .data..decrypted on !SVE guests?

That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.

Originally, my thought was that this would be a one-off case and the
array could be freed directly in kvmclock_init(), e.g.:

static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);

...

void __init kvmclock_init(void)
{
u8 flags;

if (!sev_active())
free_init_pages("unused decrypted",
(unsigned long)hv_clock_aux,
(unsigned long)hv_clock_aux + sizeof(hv_clock_aux));

> 
> -- 
> Regards/Gruss,
> Boris.
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
> -- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> Currently, the per-cpu pvclock data is allocated dynamically when
> cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> shared between the guest and the hypervisor hence it must be mapped as
> unencrypted (ie. C=0) when SEV is active.
> 
> When SEV is active, we will be wasting fairly sizeable amount of memory
> since each CPU will be doing a separate 4k allocation so that it can clear
> C-bit. Let's define few extra static page sized array of pvclock data.
> In the preparatory stage of CPU hotplug, use the element of this static
> array to avoid the dynamic allocation. This array will be put in
> the .data..decrypted section so that its mapped with C=0 during the boot.
> 
> In non-SEV case, this static page will unused and free'd by the
> free_decrypted_mem().
> 
> Signed-off-by: Brijesh Singh 
> Suggested-by: Sean Christopherson 
> Cc: Tom Lendacky 
> Cc: k...@vger.kernel.org
> Cc: Thomas Gleixner 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Cc: k...@vger.kernel.org
> Cc: "Radim Krčmář" 
> ---
>  arch/x86/include/asm/mem_encrypt.h |  4 
>  arch/x86/kernel/kvmclock.c | 22 +++---
>  arch/x86/kernel/vmlinux.lds.S  |  3 +++
>  arch/x86/mm/init.c |  3 +++
>  arch/x86/mm/mem_encrypt.c  | 10 ++
>  5 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 802b2eb..aa204af 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size);
>  
>  /* Architecture __weak replacement functions */
>  void __init mem_encrypt_init(void);
> +void __init free_decrypted_mem(void);
>  
>  bool sme_active(void);
>  bool sev_active(void);
>  
>  #define __decrypted __attribute__((__section__(".data..decrypted")))
> +#define __decrypted_hvclock 
> __attribute__((__section__(".data..decrypted_hvclock")))

So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Borislav Petkov
On Thu, Sep 06, 2018 at 06:43:02AM -0500, Brijesh Singh wrote:
> Currently, the per-cpu pvclock data is allocated dynamically when
> cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> shared between the guest and the hypervisor hence it must be mapped as
> unencrypted (ie. C=0) when SEV is active.
> 
> When SEV is active, we will be wasting fairly sizeable amount of memory
> since each CPU will be doing a separate 4k allocation so that it can clear
> C-bit. Let's define few extra static page sized array of pvclock data.
> In the preparatory stage of CPU hotplug, use the element of this static
> array to avoid the dynamic allocation. This array will be put in
> the .data..decrypted section so that its mapped with C=0 during the boot.
> 
> In non-SEV case, this static page will unused and free'd by the
> free_decrypted_mem().
> 
> Signed-off-by: Brijesh Singh 
> Suggested-by: Sean Christopherson 
> Cc: Tom Lendacky 
> Cc: k...@vger.kernel.org
> Cc: Thomas Gleixner 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Cc: k...@vger.kernel.org
> Cc: "Radim Krčmář" 
> ---
>  arch/x86/include/asm/mem_encrypt.h |  4 
>  arch/x86/kernel/kvmclock.c | 22 +++---
>  arch/x86/kernel/vmlinux.lds.S  |  3 +++
>  arch/x86/mm/init.c |  3 +++
>  arch/x86/mm/mem_encrypt.c  | 10 ++
>  5 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 802b2eb..aa204af 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size);
>  
>  /* Architecture __weak replacement functions */
>  void __init mem_encrypt_init(void);
> +void __init free_decrypted_mem(void);
>  
>  bool sme_active(void);
>  bool sev_active(void);
>  
>  #define __decrypted __attribute__((__section__(".data..decrypted")))
> +#define __decrypted_hvclock 
> __attribute__((__section__(".data..decrypted_hvclock")))

So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--