Re: [PATCH v2 00/24] x86/resctrl: Merge the CDP resources

2021-04-06 Thread Babu Moger



On 4/6/21 12:19 PM, James Morse wrote:
> Hi Babu,
> 
> On 30/03/2021 21:36, Babu Moger wrote:
>> On 3/12/21 11:58 AM, James Morse wrote:
>>> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
>>> behaviour is all contained in the filesystem parts, with a minimum amount
>>> of arch specific code.
>>>
>>> Arm have some CPU support for dividing caches into portions, and
>>> applying bandwidth limits at various points in the SoC. The collective term
>>> for these features is MPAM: Memory Partitioning and Monitoring.
>>>
>>> MPAM is similar enough to Intel RDT that it should use the de facto
>>> Linux interface: resctrl. This filesystem currently lives under arch/x86,
>>> and is tightly coupled to the architecture.
>>> Ultimately, my plan is to split the existing resctrl code up to have an
>>> arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
>>> MPAM can be wired up.
>>>
>>> x86 might have two resources with cache controls (L2 and L3), but has
>>> extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
>>> if CDP is enabled for the corresponding cache.
>>>
>>> MPAM has an equivalent feature to CDP, but it's a property of the CPU,
>>> not the cache. Resctrl needs to have x86's odd/even behaviour, as that
>>> is the ABI, but this isn't how the MPAM hardware works. It is entirely
>>> possible that an in-kernel user of MPAM would not be using CDP, whereas
>>> resctrl is.
>>> Pretending L3CODE and L3DATA are entirely separate resources is a neat
>>> trick, but doing this is specific to x86.
>>> Doing this leaves the arch code in control of various parts of the
>>> filesystem ABI: the resource names, and the way the schemata are parsed.
>>> Allowing this stuff to vary between architectures is bad for user space.
>>>
>>> This series collapses the CODE/DATA resources, moving all the user-visible
>>> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
>>> configuration being applied to a cache. This is done by adding a
>>> struct resctrl_schema to the parts of resctrl that will move to fs. This
>>> holds the arch-code resource that is in use for this schema, along with
>>> other properties like the name, and whether the configuration being applied
>>> is CODE/DATA/BOTH.
> 
> 
>> I applied your patches on my AMD box.
> 
> Great! Thanks for taking a look,
> 
> 
>> Seeing some difference in the behavior.
> 
> Ooer,
> 
> 
>> Before these patches.
>>
>> # dmesg |grep -i resctrl
>> [   13.076973] resctrl: L3 allocation detected
>> [   13.087835] resctrl: L3DATA allocation detected
>> [   13.092886] resctrl: L3CODE allocation detected
>> [   13.097936] resctrl: MB allocation detected
>> [   13.102599] resctrl: L3 monitoring detected
>>
>>
>> After the patches.
>>
>> # dmesg |grep -i resctrl
>> [   13.076973] resctrl: L3 allocation detected
>> [   13.097936] resctrl: MB allocation detected
>> [   13.102599] resctrl: L3 monitoring detected
>>
>> You can see that L3DATA and L3CODE disappeared. I think we should keep the
>> behavior the same for x86 (at least).
> 
> This is the kernel log ... what user-space software is parsing that for an 
> expected value?
> What happens if the resctrl strings have been overwritten by more kernel log?
> 
> I don't think user-space should be relying on this. I'd argue any user-space
> doing this is already broken. Is it just the kernel selftest's
> filter_dmesg()? It doesn't seem to do anything useful.
> 
> Whether resctrl is supported can be read from /proc/filesystems. CDP is
> probably a try-it-and-see. User-space could parse /proc/cpuinfo, but it's
> probably not a good idea.
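
(As a concrete illustration of that suggestion: a minimal userspace sketch
that reads /proc/filesystems rather than grepping dmesg. The function name
is made up for this example, not taken from any tool in this thread.)

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Returns true if the running kernel lists resctrl as a filesystem. */
static bool resctrl_supported(void)
{
	FILE *fp = fopen("/proc/filesystems", "r");
	char line[256];
	bool found = false;

	if (!fp)
		return false;
	while (fgets(line, sizeof(line), fp)) {
		if (strstr(line, "resctrl")) {
			found = true;
			break;
		}
	}
	fclose(fp);
	return found;
}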

Yes, agreed. Looking at dmesg may not be the right way to figure out all the
support details. As a normal practice, I searched for these strings and
noticed the difference. That is why I felt it is best to keep those messages
the same as before.
> 
> 
> It's easy to fix, but it seems odd that the kernel has to print things for
> user-space to try and parse. (I'd like to point at the user-space software
> that depends on this.)

I don't think there is any software that parses dmesg for these details.
These are info messages for developers.

> 
> 
>> I am still not clear why we needed resctrl_conf_type
>>
>> enum resctrl_conf_type {
>> 	CDP_BOTH,
>> 	CDP_CODE,
>> 	CDP_DATA,
>> };
>>
>> Right now, I see all the resources are initialized as CDP_BOTH.

Re: [PATCH v2 00/24] x86/resctrl: Merge the CDP resources

2021-03-30 Thread Babu Moger
Hi James,
Thanks for the patches. A few comments below.

On 3/12/21 11:58 AM, James Morse wrote:
> Hi folks,
> 
> Thanks to Reinette and Jamie for the comments on v1. Major changes in v2 are
> to keep the closid in resctrl_arch_update_domains(), eliminating one patch,
> splitting another that was making two sorts of change, and to re-order the
> first few patches. See each patches changelog for more.
> 
> 
> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
> behaviour is all contained in the filesystem parts, with a minimum amount
> of arch specific code.
> 
> Arm have some CPU support for dividing caches into portions, and
> applying bandwidth limits at various points in the SoC. The collective term
> for these features is MPAM: Memory Partitioning and Monitoring.
> 
> MPAM is similar enough to Intel RDT that it should use the de facto
> Linux interface: resctrl. This filesystem currently lives under arch/x86,
> and is tightly coupled to the architecture.
> Ultimately, my plan is to split the existing resctrl code up to have an
> arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
> MPAM can be wired up.
> 
> x86 might have two resources with cache controls (L2 and L3), but has
> extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
> if CDP is enabled for the corresponding cache.
> 
> MPAM has an equivalent feature to CDP, but it's a property of the CPU,
> not the cache. Resctrl needs to have x86's odd/even behaviour, as that
> is the ABI, but this isn't how the MPAM hardware works. It is entirely
> possible that an in-kernel user of MPAM would not be using CDP, whereas
> resctrl is.
> Pretending L3CODE and L3DATA are entirely separate resources is a neat
> trick, but doing this is specific to x86.
> Doing this leaves the arch code in control of various parts of the
> filesystem ABI: the resource names, and the way the schemata are parsed.
> Allowing this stuff to vary between architectures is bad for user space.
> 
> This series collapses the CODE/DATA resources, moving all the user-visible
> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
> configuration being applied to a cache. This is done by adding a
> struct resctrl_schema to the parts of resctrl that will move to fs. This
> holds the arch-code resource that is in use for this schema, along with
> other properties like the name, and whether the configuration being applied
> is CODE/DATA/BOTH.

I applied your patches on my AMD box. I am seeing some difference in the behavior.

Before these patches.

# dmesg |grep -i resctrl
[   13.076973] resctrl: L3 allocation detected
[   13.087835] resctrl: L3DATA allocation detected
[   13.092886] resctrl: L3CODE allocation detected
[   13.097936] resctrl: MB allocation detected
[   13.102599] resctrl: L3 monitoring detected


After the patches.

# dmesg |grep -i resctrl
[   13.076973] resctrl: L3 allocation detected
[   13.097936] resctrl: MB allocation detected
[   13.102599] resctrl: L3 monitoring detected

You can see that L3DATA and L3CODE disappeared. I think we should keep the
behavior the same for x86 (at least).



I am still not clear on why we need resctrl_conf_type:

enum resctrl_conf_type {
	CDP_BOTH,
	CDP_CODE,
	CDP_DATA,
};

Right now, I see all the resources are initialized as CDP_BOTH.

	[RDT_RESOURCE_L3] = {
		.conf_type	= CDP_BOTH,
		...
	[RDT_RESOURCE_L2] = {
		.conf_type	= CDP_BOTH,
		...
	[RDT_RESOURCE_MBA] = {
		.conf_type	= CDP_BOTH,
		...

If all the resources are CDP_BOTH, then why do we need separate CDP_CODE and
CDP_DATA? Are these going to be different for Arm?

Also, initializing RDT_RESOURCE_MBA as CDP_BOTH does not seem right. I don't
think there will be CDP support in MBA in the future.
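
(For illustration only: the cover letter's description suggests the fs code
could key the visible schemata off conf_type, roughly as sketched below.
add_schema(), schemata_create() and the struct layout are hypothetical, not
the series' actual code.)

#include <stdbool.h>

struct rdt_resource;			/* arch-side resource, opaque here */

enum resctrl_conf_type { CDP_BOTH, CDP_CODE, CDP_DATA };

struct resctrl_schema {
	enum resctrl_conf_type	conf_type;	/* what writes to it configure */
	const char		*name;		/* "L3", "L3CODE", "L3DATA", "MB" */
	struct rdt_resource	*res;		/* arch resource backing the schema */
};

int add_schema(struct rdt_resource *r, enum resctrl_conf_type type,
	       const char *name);

/* One schema normally; two (CODE + DATA) sharing the same arch resource
 * when CDP is enabled on a cache. MB would only ever be CDP_BOTH. */
static int schemata_create(struct rdt_resource *r, const char *name,
			   bool cdp_enabled)
{
	if (!cdp_enabled)
		return add_schema(r, CDP_BOTH, name);

	return add_schema(r, CDP_CODE, "L3CODE") ?:
	       add_schema(r, CDP_DATA, "L3DATA");
}

Under a scheme like this, the CODE/DATA values are only ever used when a
CDP-capable cache is instantiated, which would explain why every entry in
the static array above starts out as CDP_BOTH.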



> 
> This lets us fold the extra resources out of the arch code so that they
> don't need to be duplicated if the equivalent feature to CDP is missing, or
> implemented in a different way.
> 
> 
> The first two patches split the resource and domain structs to have an
> arch specific 'hw' portion, and the rest that is visible to resctrl.
> Future series massage the resctrl code so there are no accesses to 'hw'
> structures in the parts of resctrl that will move to fs, providing helpers
> where necessary.
> 
> This series adds temporary scaffolding, which it removes a few patches
> later. This is to allow things like the ctrlval arrays and resources to be
> merged separately, which should make it easier to bisect. These things
> are marked temporary, and should all be gone by the end of the series.
> 
> This series is a little rough around the monitors; would a fake
> struct resctrl_schema for the monitors simplify things, or be a source
> of bugs?
> 
> This series is based on v5.12-rc2, and can be retrieved from:
> git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git 
> 

Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled

2021-03-25 Thread Babu Moger



On 3/25/21 5:29 AM, Borislav Petkov wrote:
> Ok,
> 
> I tried to be as specific as possible in the commit message so that we
> don't forget. Please lemme know if I've missed something.
> 
> Babu, Jim, I'd appreciate it if you ran this to confirm.
> 
> Thx.
> 
> ---
> From: Borislav Petkov 
> Date: Thu, 25 Mar 2021 11:02:31 +0100
> 
> Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
> are exploding during alternatives patching:
> 
>   kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
>   invalid opcode:  [#1] SMP
>   Modules linked in:
>   CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>   Call Trace:
>    swap_entry_free
>    swap_entry_free
>    text_poke_bp
>    swap_entry_free
>    arch_jump_label_transform
>    set_debug_rodata
>    __jump_label_update
>    static_key_slow_inc
>    frontswap_register_ops
>    init_zswap
>    init_frontswap
>    do_one_initcall
>    set_debug_rodata
>    kernel_init_freeable
>    rest_init
>    kernel_init
>    ret_from_fork
> 
> triggering the BUG_ON in text_poke() which verifies whether patched
> instruction bytes have actually landed at the destination.
> 
> Further debugging showed that the TLB flush before that check is
> insufficient because there could be global mappings left in the TLB,
> leading to a stale mapping getting used.
> 
> I say "global mappings" because the hardware configuration is a new one:
> machine is an AMD, which means, KAISER/PTI doesn't need to be enabled
> there, which also means there's no user/kernel pagetables split and
> therefore the TLB can have global mappings.
> 
> And the configuration is a new one for a second reason: because that AMD
> machine supports PCID and INVPCID, which leads the CPU detection code to
> set the synthetic X86_FEATURE_INVPCID_SINGLE flag.
> 
> Now, __native_flush_tlb_single() does invalidate global mappings when
> X86_FEATURE_INVPCID_SINGLE is *not* set and returns.
> 
> When X86_FEATURE_INVPCID_SINGLE is set, however, it invalidates the
> requested address from both PCIDs in the KAISER-enabled case. But if
> KAISER is not enabled and the machine has global mappings in the TLB,
> then those global mappings do not get invalidated, which would lead to
> the above mismatch from using a stale TLB entry.
> 
> So make sure to flush those global mappings in the KAISER disabled case.
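> 
> (A simplified reconstruction of the pre-fix flow being described, to make
> the stale-TLB case easier to follow. The externs stand in for the
> 4.9-stable helpers and feature flags; treat this as a sketch, not the
> tree's code.)
> 
> extern int kaiser_enabled;		/* 0 on this AMD machine: no PTI */
> extern int has_invpcid_single;		/* X86_FEATURE_INVPCID_SINGLE set? */
> extern unsigned long X86_CR3_PCID_ASID_KERN, X86_CR3_PCID_ASID_USER;
> void invpcid_flush_one(unsigned long pcid, unsigned long addr);
> 
> static inline void flush_tlb_single_before_fix(unsigned long addr)
> {
> 	if (!has_invpcid_single) {
> 		/* INVLPG invalidates the entry even if it is global. */
> 		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> 		return;
> 	}
> 	if (kaiser_enabled)
> 		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> 	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> 	/*
> 	 * Nothing on this path touches *global* TLB entries, so with
> 	 * kaiser_enabled == 0 a stale global mapping of addr survives;
> 	 * that is the mismatch text_poke()'s BUG_ON catches.
> 	 */
> }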
> 
> Co-debugged by Babu Moger .
> 
> Reported-by: Jim Mattson 
> Signed-off-by: Borislav Petkov 
> Link: https://lkml.kernel.org/r/CALMp9eRDSW66+XvbHVF4ohL7XhThoPoT0BrB0TcS0cgk=dkcBg@mail.gmail.com
> ---
>  arch/x86/include/asm/tlbflush.h | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..2bfa4deb8cae 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>  	 * ASID.  But, userspace flushes are probably much more
>  	 * important performance-wise.
>  	 *
> -	 * Make sure to do only a single invpcid when KAISER is
> -	 * disabled and we have only a single ASID.
> +	 * In the KAISER disabled case, do an INVLPG to make sure
> +	 * the mapping is flushed in case it is a global one.
>  	 */
> -	if (kaiser_enabled)
> +	if (kaiser_enabled) {
>  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> -	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +	} else {
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +	}
>  }
>  
>  static inline void __flush_tlb_all(void)
> 

Thanks Boris. As you updated the patch a little since yesterday, I retested
it again on both the host and guest kernels. It is looking good.

Tested-by: Babu Moger 


RE: [PATCH v5 08/21] selftests/resctrl: Call kselftest APIs to log test results

2021-03-12 Thread Babu Moger



> -Original Message-
> From: Fenghua Yu 
> Sent: Sunday, March 7, 2021 8:55 AM
> To: Shuah Khan ; Tony Luck ;
> Reinette Chatre ; Moger, Babu
> 
> Cc: linux-kselftest ; linux-kernel <linux-ker...@vger.kernel.org>; Fenghua Yu 
> Subject: [PATCH v5 08/21] selftests/resctrl: Call kselftest APIs to log test 
> results
> 
> Call kselftest APIs instead of using printf() to log test results
> for cleaner code and better future extension.
> 
> Suggested-by: Shuah Khan 
> Signed-off-by: Fenghua Yu 
> ---
> Change Log:
> v5:
> - Add this patch (Shuah)
> 
>  tools/testing/selftests/resctrl/cat_test.c| 37 +++
>  tools/testing/selftests/resctrl/cmt_test.c| 42 -
>  tools/testing/selftests/resctrl/mba_test.c| 24 +-
>  tools/testing/selftests/resctrl/mbm_test.c| 28 ++--
>  tools/testing/selftests/resctrl/resctrl.h |  2 +-
>  .../testing/selftests/resctrl/resctrl_tests.c | 40 +
>  tools/testing/selftests/resctrl/resctrl_val.c |  4 +-
>  tools/testing/selftests/resctrl/resctrlfs.c   | 45 +++
>  8 files changed, 105 insertions(+), 117 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cat_test.c
> b/tools/testing/selftests/resctrl/cat_test.c
> index 20823725daca..0deb38ed971b 100644
> --- a/tools/testing/selftests/resctrl/cat_test.c
> +++ b/tools/testing/selftests/resctrl/cat_test.c
> @@ -52,25 +52,28 @@ static int cat_setup(int num, ...)
>   return ret;
>  }
> 
> -static void show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
> -			    unsigned long span)
> +static int show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
> +			   unsigned long span)
>  {
>   unsigned long allocated_cache_lines = span / 64;
>   unsigned long avg_llc_perf_miss = 0;
>   float diff_percent;
> + int ret;
> 
>   avg_llc_perf_miss = sum_llc_perf_miss / (NUM_OF_RUNS - 1);
>   diff_percent = ((float)allocated_cache_lines - avg_llc_perf_miss) /
>   allocated_cache_lines * 100;
> 
> - printf("%sok CAT: cache miss rate within %d%%\n",
> -!is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT ?
> -"not " : "", MAX_DIFF_PERCENT);
> - tests_run++;
> - printf("# Percent diff=%d\n", abs((int)diff_percent));
> - printf("# Number of bits: %d\n", no_of_bits);
> - printf("# Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);
> - printf("# Allocated cache lines: %lu\n", allocated_cache_lines);
> + ret = !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT;
> + ksft_print_msg("cache miss rate %swithin %d%%\n",
> +ret ? "not " : "", MAX_DIFF_PERCENT);
> +
> + ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
> + ksft_print_msg("Number of bits: %d\n", no_of_bits);
> + ksft_print_msg("Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);
> + ksft_print_msg("Allocated cache lines: %lu\n", allocated_cache_lines);
> +
> + return ret;
>  }
> 
>  static int check_results(struct resctrl_val_param *param)
> @@ -80,7 +83,7 @@ static int check_results(struct resctrl_val_param *param)
>   int runs = 0, no_of_bits = 0;
>   FILE *fp;
> 
> - printf("# Checking for pass/fail\n");
> + ksft_print_msg("Checking for pass/fail\n");
>   fp = fopen(param->filename, "r");
>   if (!fp) {
>   perror("# Cannot open file");
> @@ -108,9 +111,7 @@ static int check_results(struct resctrl_val_param
> *param)
>   fclose(fp);
>   no_of_bits = count_bits(param->mask);
> 
> - show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);
> -
> - return 0;
> + return show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);
>  }
> 
>  void cat_test_cleanup(void)
> @@ -146,15 +147,15 @@ int cat_perf_miss_val(int cpu_no, int n, char
> *cache_type)
>   ret = get_cache_size(cpu_no, cache_type, &cache_size);
>   if (ret)
>   return ret;
> - printf("cache size :%lu\n", cache_size);
> + ksft_print_msg("cache size :%lu\n", cache_size);
> 
>   /* Get max number of bits from default-cabm mask */
>   count_of_bits = count_bits(long_mask);
> 
>   if (n < 1 || n > count_of_bits - 1) {
> - printf("Invalid input value for no_of_bits n!\n");
> - printf("Please Enter value in range 1 to %d\n",
> -count_of_bits - 1);
> + ksft_print_msg("Invalid input value for no_of_bits n!\n");
> + ksft_print_msg("Please Enter value in range 1 to %d\n",
> +count_of_bits - 1);
>   return -1;
>   }
> 
> diff --git a/tools/testing/selftests/resctrl/cmt_test.c
> b/tools/testing/selftests/resctrl/cmt_test.c
> index ca82db37c1f7..e5af19335115 100644
> --- a/tools/testing/selftests/resctrl/cmt_test.c
> +++ b/tools/testing/selftests/resctrl/cmt_test.c
> @@ -39,36 +39,33 @@ static int cmt_setup(int num, ...)
>   return 

RE: [PATCH v5 07/21] selftests/resctrl: Rename CQM test as CMT test

2021-03-12 Thread Babu Moger



> -Original Message-
> From: Fenghua Yu 
> Sent: Sunday, March 7, 2021 8:55 AM
> To: Shuah Khan ; Tony Luck ;
> Reinette Chatre ; Moger, Babu
> 
> Cc: linux-kselftest ; linux-kernel <linux-ker...@vger.kernel.org>; Fenghua Yu 
> Subject: [PATCH v5 07/21] selftests/resctrl: Rename CQM test as CMT test
> 
> CMT (Cache Monitoring Technology) [1] is a H/W feature that reports cache
> occupancy of a process. resctrl selftest suite has a unit test to test CMT
> for LLC but the test is named as CQM (Cache Quality Monitoring).
> Furthermore, the unit test source file is named as cqm_test.c and several
> functions, variables, comments, preprocessors and statements widely use
> "cqm" as either suffix or prefix. This rampant misusage of CQM for CMT
> might confuse someone who is newly looking at resctrl selftests because
> this feature is named CMT in the Intel Software Developer's Manual.
> 
> Hence, rename all the occurrences (unit test source file name, functions,
> variables, comments and preprocessors) of cqm with cmt.
> 
> [1] Please see Intel SDM, Volume 3, chapter 17 and section 18 for more
> information on CMT:
> https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html
> 
> Suggested-by: Reinette Chatre 
> Signed-off-by: Fenghua Yu 
> ---
>  tools/testing/selftests/resctrl/README|  4 +--
>  tools/testing/selftests/resctrl/cache.c   |  4 +--
>  .../resctrl/{cqm_test.c => cmt_test.c}| 20 +++---
>  tools/testing/selftests/resctrl/resctrl.h |  6 ++---
>  .../testing/selftests/resctrl/resctrl_tests.c | 26 +--
>  tools/testing/selftests/resctrl/resctrl_val.c | 12 -
>  tools/testing/selftests/resctrl/resctrlfs.c   | 10 +++
>  7 files changed, 41 insertions(+), 41 deletions(-)
>  rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (89%)
> 
> diff --git a/tools/testing/selftests/resctrl/README
> b/tools/testing/selftests/resctrl/README
> index 6e5a0ffa18e8..4b36b25b6ac0 100644
> --- a/tools/testing/selftests/resctrl/README
> +++ b/tools/testing/selftests/resctrl/README
> @@ -46,8 +46,8 @@ ARGUMENTS
>  Parameter '-h' shows usage information.
> 
>  usage: resctrl_tests [-h] [-b "benchmark_cmd [options]"] [-t test list] [-n
> no_of_bits]
> --b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CQM
> -   default benchmark is builtin fill_buf
> --t test list: run tests specified in the test list, e.g. -t mbm, mba, cqm, cat
> +-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CMT
> +   default benchmark is builtin fill_buf
> +-t test list: run tests specified in the test list, e.g. -t mbm, mba, cmt, cat
>  -n no_of_bits: run cache tests using specified no of bits in cache bit mask
>  -p cpu_no: specify CPU number to run the test. 1 is default
>  -h: help
> diff --git a/tools/testing/selftests/resctrl/cache.c
> b/tools/testing/selftests/resctrl/cache.c
> index 5922cc1b0386..2aa1b5c7d9e1 100644
> --- a/tools/testing/selftests/resctrl/cache.c
> +++ b/tools/testing/selftests/resctrl/cache.c
> @@ -111,7 +111,7 @@ static int get_llc_perf(unsigned long *llc_perf_miss)
> 
>  /*
>   * Get LLC Occupancy as reported by RESCTRL FS
> - * For CQM,
> + * For CMT,
>   * 1. If con_mon grp and mon grp given, then read from mon grp in
>   * con_mon grp
>   * 2. If only con_mon grp given, then read from con_mon grp
> @@ -192,7 +192,7 @@ int measure_cache_vals(struct resctrl_val_param
> *param, int bm_pid)
>   /*
>* Measure llc occupancy from resctrl.
>*/
> - if (!strncmp(param->resctrl_val, CQM_STR, sizeof(CQM_STR))) {
> + if (!strncmp(param->resctrl_val, CMT_STR, sizeof(CMT_STR))) {
>   ret = get_llc_occu_resctrl(&llc_occu_resc);
>   if (ret < 0)
>   return ret;
> diff --git a/tools/testing/selftests/resctrl/cqm_test.c
> b/tools/testing/selftests/resctrl/cmt_test.c
> similarity index 89%
> rename from tools/testing/selftests/resctrl/cqm_test.c
> rename to tools/testing/selftests/resctrl/cmt_test.c
> index 271752e9ef5b..ca82db37c1f7 100644
> --- a/tools/testing/selftests/resctrl/cqm_test.c
> +++ b/tools/testing/selftests/resctrl/cmt_test.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
> - * Cache Monitoring Technology (CQM) test
> + * Cache Monitoring Technology (CMT) test
>   *
>   * Copyright (C) 2018 Intel Corporation
>   *
> @@ -11,7 +11,7 @@
>  #include "resctrl.h"
>  #include 
> 
> -#define RESULT_FILE_NAME "result_cqm"
> +#define RESULT_FILE_NAME "result_cmt"
>  #define 

RE: [PATCH v5 04/21] selftests/resctrl: Clean up resctrl features check

2021-03-12 Thread Babu Moger



> -Original Message-
> From: Fenghua Yu 
> Sent: Sunday, March 7, 2021 8:55 AM
> To: Shuah Khan ; Tony Luck ;
> Reinette Chatre ; Moger, Babu
> 
> Cc: linux-kselftest ; linux-kernel <linux-ker...@vger.kernel.org>; Fenghua Yu 
> Subject: [PATCH v5 04/21] selftests/resctrl: Clean up resctrl features check
> 
> Checking resctrl features calls strcmp() to compare feature strings (e.g.
> "mba", "cat", etc.). The checks are error prone and don't have good coding
> style. Define the constant strings in macros and call strncmp() to solve
> the potential issues.
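> 
> (The hunks below compare against CAT_STR and CQM_STR; judging by the
> string literals they replace, the patch presumably adds defines along
> these lines to resctrl.h. The MBA/MBM ones are inferred, not visible in
> this excerpt.)
> 
> #define CAT_STR		"cat"
> #define CQM_STR		"cqm"
> #define MBA_STR		"mba"
> #define MBM_STR		"mbm"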
> 
> Suggested-by: Shuah Khan 
> Signed-off-by: Fenghua Yu 
> ---
> Change Log:
> v5:
> - Remove is_cat() etc functions and directly call strncmp() to check
>   the features (Shuah).
> 
>  tools/testing/selftests/resctrl/cache.c   |  8 +++
>  tools/testing/selftests/resctrl/cat_test.c|  2 +-
>  tools/testing/selftests/resctrl/cqm_test.c|  2 +-
>  tools/testing/selftests/resctrl/fill_buf.c|  4 ++--
>  tools/testing/selftests/resctrl/mba_test.c|  2 +-
>  tools/testing/selftests/resctrl/mbm_test.c|  2 +-
>  tools/testing/selftests/resctrl/resctrl.h |  5 +
>  .../testing/selftests/resctrl/resctrl_tests.c | 12 +-
> tools/testing/selftests/resctrl/resctrl_val.c | 22 +--
>  tools/testing/selftests/resctrl/resctrlfs.c   | 17 +++---
>  10 files changed, 41 insertions(+), 35 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cache.c
> b/tools/testing/selftests/resctrl/cache.c
> index 38dbf4962e33..5922cc1b0386 100644
> --- a/tools/testing/selftests/resctrl/cache.c
> +++ b/tools/testing/selftests/resctrl/cache.c
> @@ -182,7 +182,7 @@ int measure_cache_vals(struct resctrl_val_param
> *param, int bm_pid)
>   /*
>* Measure cache miss from perf.
>*/
> - if (!strcmp(param->resctrl_val, "cat")) {
> + if (!strncmp(param->resctrl_val, CAT_STR, sizeof(CAT_STR))) {
>   ret = get_llc_perf(&llc_perf_miss);
>   if (ret < 0)
>   return ret;
> @@ -192,7 +192,7 @@ int measure_cache_vals(struct resctrl_val_param
> *param, int bm_pid)
>   /*
>* Measure llc occupancy from resctrl.
>*/
> - if (!strcmp(param->resctrl_val, "cqm")) {
> + if (!strncmp(param->resctrl_val, CQM_STR, sizeof(CQM_STR))) {
>   ret = get_llc_occu_resctrl(&llc_occu_resc);
>   if (ret < 0)
>   return ret;
> @@ -234,7 +234,7 @@ int cat_val(struct resctrl_val_param *param)
>   if (ret)
>   return ret;
> 
> - if ((strcmp(resctrl_val, "cat") == 0)) {
> + if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
>   ret = initialize_llc_perf();
>   if (ret)
>   return ret;
> @@ -242,7 +242,7 @@ int cat_val(struct resctrl_val_param *param)
> 
>   /* Test runs until the callback setup() tells the test to stop. */
>   while (1) {
> - if (strcmp(resctrl_val, "cat") == 0) {
> + if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
>   ret = param->setup(1, param);
>   if (ret) {
>   ret = 0;
> diff --git a/tools/testing/selftests/resctrl/cat_test.c
> b/tools/testing/selftests/resctrl/cat_test.c
> index bdeeb5772592..20823725daca 100644
> --- a/tools/testing/selftests/resctrl/cat_test.c
> +++ b/tools/testing/selftests/resctrl/cat_test.c
> @@ -164,7 +164,7 @@ int cat_perf_miss_val(int cpu_no, int n, char
> *cache_type)
>   return -1;
> 
>   struct resctrl_val_param param = {
> - .resctrl_val= "cat",
> + .resctrl_val= CAT_STR,
>   .cpu_no = cpu_no,
>   .mum_resctrlfs  = 0,
>   .setup  = cat_setup,
> diff --git a/tools/testing/selftests/resctrl/cqm_test.c
> b/tools/testing/selftests/resctrl/cqm_test.c
> index de33d1c0466e..271752e9ef5b 100644
> --- a/tools/testing/selftests/resctrl/cqm_test.c
> +++ b/tools/testing/selftests/resctrl/cqm_test.c
> @@ -145,7 +145,7 @@ int cqm_resctrl_val(int cpu_no, int n, char
> **benchmark_cmd)
>   }
> 
>   struct resctrl_val_param param = {
> - .resctrl_val= "cqm",
> + .resctrl_val= CQM_STR,
>   .ctrlgrp= "c1",
>   .mongrp = "m1",
>   .cpu_no = cpu_no,
> diff --git a/tools/testing/selftests/resctrl/fill_buf.c
> b/tools/testing/selftests/resctrl/fill_buf.c
> index 79c611c99a3d..51e5cf22632f 100644
> --- a/tools/testing/selftests/resctrl/fill_buf.c
> +++ b/tools/testing/selftests/resctrl/fill_buf.c
> @@ -115,7 +115,7 @@ static int fill_cache_read(unsigned char *start_ptr,
> unsigned char *end_ptr,
> 
>   while (1) {
>   ret = fill_one_span_read(start_ptr, end_ptr);
> - if (!strcmp(resctrl_val, "cat"))
> + if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR)))
>   

RE: [PATCH v5 02/21] selftests/resctrl: Fix compilation issues for global variables

2021-03-12 Thread Babu Moger



> -Original Message-
> From: Fenghua Yu 
> Sent: Sunday, March 7, 2021 8:55 AM
> To: Shuah Khan ; Tony Luck ;
> Reinette Chatre ; Moger, Babu
> 
> Cc: linux-kselftest ; linux-kernel <linux-ker...@vger.kernel.org>; Fenghua Yu 
> Subject: [PATCH v5 02/21] selftests/resctrl: Fix compilation issues for global
> variables
> 
> Reinette reported following compilation issue on Fedora 32, gcc version
> 10.1.1
> 
> /usr/bin/ld: cqm_test.o:/cqm_test.c:22: multiple definition of
> `cache_size'; cat_test.o:/cat_test.c:23: first defined here
> 
> The same issue is reported for the long_mask, cbm_mask, count_of_bits, etc.
> variables as well. The compiler isn't happy because these variables are
> defined globally in two .c files, namely cqm_test.c and cat_test.c, and
> the compiler during compilation finds that the variable is already
> defined (multiple definition error).
> 
> Taking a closer look at the usage of these variables reveals that these
> variables are used only locally to functions such as cqm_resctrl_val()

%s/ locally to functions/locally in two functions

> (defined in cqm_test.c) and cat_perf_miss_val() (defined in cat_test.c).
> These variables are not shared between those functions. So, there is no
> need for these variables to be global. Hence, fix this issue by making
> them static variables.
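> 
> (A contained illustration of the failure mode, not taken from the patch:
> two translation units defining the same file-scope object collide at link
> time once gcc 10 defaults to -fno-common, while static gives each file
> its own private copy.)
> 
> /* cat_test.c (illustrative) */
> unsigned long cache_size;	/* external linkage: one program-wide definition */
> 
> /* cqm_test.c (illustrative) */
> unsigned long cache_size;	/* second definition of the same symbol:
> 				 * "multiple definition of `cache_size'"
> 				 * at link time under -fno-common */
> 
> /* The fix, applied in each file: internal linkage, no exported symbol. */
> static unsigned long cache_size;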
> 
> Reported-by: Reinette Chatre 
> Signed-off-by: Fenghua Yu 
> ---
> Change Log:
> v5:
> - Define long_mask, cbm_mask, count_of_bits etc as static variables
>   (Shuah).
> - Split this patch into patch 2 and 3 (Shuah).
> 
>  tools/testing/selftests/resctrl/cat_test.c  | 10 +-
>  tools/testing/selftests/resctrl/cqm_test.c  | 10 +-
>  tools/testing/selftests/resctrl/resctrl.h   |  2 +-
>  tools/testing/selftests/resctrl/resctrlfs.c | 10 +-
>  4 files changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cat_test.c
> b/tools/testing/selftests/resctrl/cat_test.c
> index 5da43767b973..bdeeb5772592 100644
> --- a/tools/testing/selftests/resctrl/cat_test.c
> +++ b/tools/testing/selftests/resctrl/cat_test.c
> @@ -17,10 +17,10 @@
>  #define MAX_DIFF_PERCENT 4
>  #define MAX_DIFF 100
> 
> -int count_of_bits;
> -char cbm_mask[256];
> -unsigned long long_mask;
> -unsigned long cache_size;
> +static int count_of_bits;
> +static char cbm_mask[256];
> +static unsigned long long_mask;
> +static unsigned long cache_size;
> 
>  /*
>   * Change schemata. Write schemata to specified
> @@ -136,7 +136,7 @@ int cat_perf_miss_val(int cpu_no, int n, char
> *cache_type)
>   return -1;
> 
>   /* Get default cbm mask for L3/L2 cache */
> - ret = get_cbm_mask(cache_type);
> + ret = get_cbm_mask(cache_type, cbm_mask);
>   if (ret)
>   return ret;
> 
> diff --git a/tools/testing/selftests/resctrl/cqm_test.c
> b/tools/testing/selftests/resctrl/cqm_test.c
> index 5e7308ac63be..de33d1c0466e 100644
> --- a/tools/testing/selftests/resctrl/cqm_test.c
> +++ b/tools/testing/selftests/resctrl/cqm_test.c
> @@ -16,10 +16,10 @@
>  #define MAX_DIFF 200
>  #define MAX_DIFF_PERCENT 15
> 
> -int count_of_bits;
> -char cbm_mask[256];
> -unsigned long long_mask;
> -unsigned long cache_size;
> +static int count_of_bits;
> +static char cbm_mask[256];
> +static unsigned long long_mask;
> +static unsigned long cache_size;
> 
>  static int cqm_setup(int num, ...)
>  {
> @@ -125,7 +125,7 @@ int cqm_resctrl_val(int cpu_no, int n, char
> **benchmark_cmd)
>   if (!validate_resctrl_feature_request("cqm"))
>   return -1;
> 
> - ret = get_cbm_mask("L3");
> + ret = get_cbm_mask("L3", cbm_mask);
>   if (ret)
>   return ret;
> 
> diff --git a/tools/testing/selftests/resctrl/resctrl.h
> b/tools/testing/selftests/resctrl/resctrl.h
> index 39bf59c6b9c5..959c71e39bdc 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -92,7 +92,7 @@ void tests_cleanup(void);
>  void mbm_test_cleanup(void);
>  int mba_schemata_change(int cpu_no, char *bw_report, char
> **benchmark_cmd);
>  void mba_test_cleanup(void);
> -int get_cbm_mask(char *cache_type);
> +int get_cbm_mask(char *cache_type, char *cbm_mask);
>  int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size);
>  void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
>  int cat_val(struct resctrl_val_param *param);
> diff --git a/tools/testing/selftests/resctrl/resctrlfs.c
> b/tools/testing/selftests/resctrl/resctrlfs.c
> index 19c0ec4045a4..2a16100c9c3f 100644
> --- a/tools/testing/selftests/resctrl/resctrlfs.c
> +++ b/tools/testing/selftests/resctrl/resctrlfs.c
> @@ -49,8 +49,6 @@ static int find_resctrl_mount(char *buffer)
>   return -ENOENT;
>  }
> 
> -char cbm_mask[256];
> -
>  /*
>   * remount_resctrlfs - Remount resctrl FS at /sys/fs/resctrl
>   * @mum_resctrlfs:   Should the resctrl FS be remounted?
> @@ -205,16 +203,18 @@ int get_cache_size(int cpu_no, char 

RE: [PATCH v5 00/21] Miscellaneous fixes for resctrl selftests

2021-03-12 Thread Babu Moger
Hi Fenghua, thanks for the patches.
Sanity tested them on AMD systems; they appear to work fine.
A few minor comments in a few patches.
Tested-by: Babu Moger 
Thanks
Babu

> -Original Message-
> From: Fenghua Yu 
> Sent: Sunday, March 7, 2021 8:55 AM
> To: Shuah Khan ; Tony Luck ;
> Reinette Chatre ; Moger, Babu
> 
> Cc: linux-kselftest ; linux-kernel <linux-ker...@vger.kernel.org>; Fenghua Yu 
> Subject: [PATCH v5 00/21] Miscellaneous fixes for resctrl selftests
> 
> This patch set has several miscellaneous fixes to the resctrl selftest tool
> that are easily visible to users. V1 had fixes to the CAT test and CMT test,
> but they were dropped in V2 because having them here made the patchset
> humongous. So, changes to the CAT test and CMT test will be posted in
> another patchset.
> 
> Change Log:
> v5:
> - Address various comments from Shuah Khan:
>   1. Move a few fixing patches before cleaning patches.
>   2. Call kselftest APIs to log test results instead of printf().
>   3. Add .gitignore to ignore resctrl_tests.
>   4. Share show_cache_info() in CAT and CMT tests.
>   5. Define long_mask, cbm_mask, count_of_bits etc as static variables.
> 
> v4:
> - Address various comments from Shuah Khan:
>   1. Combine a few patches e.g. a couple of fixing typos patches into one
>  and a couple of unmounting patches into one etc.
>   2. Add config file.
>   3. Remove "Fixes" tags.
>   4. Change strcmp() to strncmp().
>   5. Move the global variable fixing patch to the patch 1 so that the
>  compilation issue is fixed first.
> 
> Please note:
> - I didn't move the patch of renaming CQM to CMT to the end of the series
>   because code and commit messages in a few other patches depend on the
>   new term "CMT". If the renaming patch moved to the end, the previous
>   patches would use the old "CQM" term and code, which would then be
>   changed at the end of the series, causing more code churn and explanation.
> [v3: https://lkml.org/lkml/2020/10/28/137]
> 
> v3:
> Address various comments (commit messages, return value on test failure, print
> failure info on test failure etc) from Reinette and Tony.
> [v2: https://lore.kernel.org/linux-kselftest/cover.1589835155.git.sai.praneeth.prakhya@intel.com/]
> 
> v2:
> 1. Dropped changes to CAT test and CMT test as they will be posted in a later
>series.
> 2. Added several other fixes
> [v1: https://lore.kernel.org/linux-kselftest/cover.1583657204.git.sai.praneeth.prakhya@intel.com/]
> 
> Fenghua Yu (19):
>   selftests/resctrl: Enable gcc checks to detect buffer overflows
>   selftests/resctrl: Fix compilation issues for global variables
>   selftests/resctrl: Fix compilation issues for other global variables
>   selftests/resctrl: Clean up resctrl features check
>   selftests/resctrl: Fix missing options "-n" and "-p"
>   selftests/resctrl: Rename CQM test as CMT test
>   selftests/resctrl: Call kselftest APIs to log test results
>   selftests/resctrl: Share show_cache_info() by CAT and CMT tests
>   selftests/resctrl: Add config dependencies
>   selftests/resctrl: Check for resctrl mount point only if resctrl FS is
> supported
>   selftests/resctrl: Use resctrl/info for feature detection
>   selftests/resctrl: Fix MBA/MBM results reporting format
>   selftests/resctrl: Don't hard code value of "no_of_bits" variable
>   selftests/resctrl: Modularize resctrl test suite main() function
>   selftests/resctrl: Skip the test if requested resctrl feature is not
> supported
>   selftests/resctrl: Fix un

RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-12 Thread Babu Moger



> -Original Message-
> From: Borislav Petkov 
> Sent: Thursday, March 11, 2021 5:52 PM
> To: Moger, Babu 
> Cc: Paolo Bonzini ; Jim Mattson
> ; Vitaly Kuznetsov ; Wanpeng
> Li ; kvm list ; Joerg Roedel
> ; the arch/x86 maintainers ; LKML <linux-ker...@vger.kernel.org>; Ingo Molnar ; H . Peter Anvin
> ; Thomas Gleixner ; Makarand Sonare
> ; Sean Christopherson 
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> On Thu, Mar 11, 2021 at 04:15:37PM -0600, Babu Moger wrote:
> > My host is
> > # cat /etc/redhat-release
> > Red Hat Enterprise Linux release 8.3 (Ootpa)
> > # uname -r
> > 5.12.0-rc2+
> 
> Please upload host and guest .config.

Host config.

https://pastebin.com/wuLzEqZr

Guest config

https://pastebin.com/mvzEEq6R


Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-11 Thread Babu Moger



On 3/11/21 4:04 PM, Babu Moger wrote:
> 
> 
> On 3/11/21 3:40 PM, Borislav Petkov wrote:
>> On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>>>  It is related PCID and INVPCID combination. Few more details.
>>>  1. System comes up fine with "noinvpid". So, it happens when invpcid is
>>> enabled.
>>
>> Which system, host or guest?
>>
>>>  2. Host is coming up fine. Problem is with the guest.
>>
>> Aha, guest.
>>
>>>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>>>  4. Debian 10 is fine.
>>>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>>>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
>>>
>>>  Let me know if want me to try something else.
>>
>> Yes, I assume host has the patches which belong to this thread?
> 
> Yes. Host has all these patches. Right now I am on 5.12.0-rc2. I just
> updated yesterday. I was able to reproduce 5.11 also.
> 
> 
>>
>> So please describe:
>>
>> 1. host has these patches, cmdline params, etc.

My host is
# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 (Ootpa)
# uname -r
5.12.0-rc2+


> 
> # cat /proc/cmdline
> BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.12.0-rc2+ root=/dev/mapper/rhel-root ro
> crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root
> rd.lvm.lv=rhel/swap ras=cec_disable nmi_watchdog=0 warn_ud2=on selinux=0
> earlyprintk=serial,ttyS1,115200n8 console=ttyS1,115200n8
> 
> 
>> 2. guest is a 4.9 kernel, cmdline params, etc.
> 
> I use qemu command line to bring up the guest. Make sure to use "-cpu host".
> 
> qemu-system-x86_64 -name deb9 -m 16384 -smp cores=16,threads=1,sockets=1
> -hda vdisk-deb.qcow2 -enable-kvm -net nic  -net
> bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper -cpu host,+svm
> -nographic
> 
> 
> The grub command line looks like this on the guest.
> 
> cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-4.9.0-14-amd64
> root=UUID=a0069240-cd60-4795-a391-273266dbae29 ro console=ttyS0,112500n8
> earlyprintk
> 


Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-11 Thread Babu Moger



On 3/11/21 3:40 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>>  It is related PCID and INVPCID combination. Few more details.
>>  1. System comes up fine with "noinvpid". So, it happens when invpcid is
>> enabled.
> 
> Which system, host or guest?
> 
>>  2. Host is coming up fine. Problem is with the guest.
> 
> Aha, guest.
> 
>>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>>  4. Debian 10 is fine.
>>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
>>
>>  Let me know if want me to try something else.
> 
> Yes, I assume host has the patches which belong to this thread?

Yes. Host has all these patches. Right now I am on 5.12.0-rc2; I just
updated yesterday. I was able to reproduce it on 5.11 also.


> 
> So please describe:
> 
> 1. host has these patches, cmdline params, etc.

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.12.0-rc2+ root=/dev/mapper/rhel-root ro
crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root
rd.lvm.lv=rhel/swap ras=cec_disable nmi_watchdog=0 warn_ud2=on selinux=0
earlyprintk=serial,ttyS1,115200n8 console=ttyS1,115200n8


> 2. guest is a 4.9 kernel, cmdline params, etc.

I use the qemu command line to bring up the guest. Make sure to use "-cpu host".

qemu-system-x86_64 -name deb9 -m 16384 -smp cores=16,threads=1,sockets=1
-hda vdisk-deb.qcow2 -enable-kvm -net nic  -net
bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper -cpu host,+svm
-nographic


The grub command line looks like this on the guest.

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.9.0-14-amd64
root=UUID=a0069240-cd60-4795-a391-273266dbae29 ro console=ttyS0,112500n8
earlyprintk



Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-11 Thread Babu Moger



On 3/11/21 3:36 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 01:23:47PM -0800, Jim Mattson wrote:
>> I would expect kaiser_enabled to be false (and PCIDs not to be used),
>> since AMD CPUs are not vulnerable to Meltdown.
> 
> Ah, of course. The guest dmesg should have
> 
> "Kernel/User page tables isolation: disabled."

Yes.
# dmesg | grep isolation
[0.00] Kernel/User page tables isolation: disabled

> 
> Lemme see if I can reproduce.
> 


Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-11 Thread Babu Moger



On 3/11/21 2:32 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 09:07:55PM +0100, Borislav Petkov wrote:
>> On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
>>> # git bisect good
>>> 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
>>> commit 59094faf3f618b2d2b2a45acb916437d611cede6
>>> Author: Borislav Petkov 
>>> Date:   Mon Dec 25 13:57:16 2017 +0100
>>>
>>> x86/kaiser: Move feature detection up
>>
>> What is the reproducer?
>>
>> Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
>> .config?
>>
>> Upthread is talking about PCID, so I'm guessing host needs to be Zen3
>> with PCID. Anything else?
> 
> That oops points to:
> 
> [1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
> 
> which is:
> 
> 	local_flush_tlb();
> 	sync_core();
> 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> 	   that causes hangs on some VIA CPUs. */
> 	for (i = 0; i < len; i++)
> 		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]); <---
> 	local_irq_restore(flags);
> 	return addr;
> 
> in text_poke() which basically says that the patching verification
> fails. And you have a local_flush_tlb() before that. And with PCID maybe
> it is not flushing properly or whatnot.
> 
> And deep down in the TLB flushing code, it does:
> 
> 	if (kaiser_enabled)
> 		kaiser_flush_tlb_on_return_to_user();
> 
> and that uses PCID...
> 
> Anyway, needs more info.

Boris,
 It is related to the PCID and INVPCID combination. A few more details:
 1. The system comes up fine with "noinvpcid". So, it happens when invpcid
is enabled.
 2. The host is coming up fine. The problem is with the guest.
 3. The problem happens with Debian 9. The Debian kernel version is 4.9.0-14.
 4. Debian 10 is fine.
 5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
 6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.

 Let me know if you want me to try something else.
thanks
Babu
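
(For anyone reproducing this, a quick guest-side check of the two features
involved. Illustrative program; bit positions per the SDM/APM: CPUID leaf 7
EBX[10] = INVPCID, leaf 1 ECX[17] = PCID.)

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
	printf("invpcid: %s\n", (ebx & (1u << 10)) ? "yes" : "no");

	__get_cpuid(1, &eax, &ebx, &ecx, &edx);
	printf("pcid:    %s\n", (ecx & (1u << 17)) ? "yes" : "no");
	return 0;
}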




Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-10 Thread Babu Moger



On 3/10/21 9:31 AM, Paolo Bonzini wrote:
> On 10/03/21 15:58, Babu Moger wrote:
>> There is no upstream version 4.9.258.
> 
> Sure there is, check out
> https://cdn.kernel.org/pub/linux/kernel/v4.x/
> 
> 
> The easiest way to do it is to bisect on the linux-4.9.y branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git.
> 
Paolo, thanks for pointing that out. I bisected the linux-4.9.y branch.
It is pointing at

# git bisect good
59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
commit 59094faf3f618b2d2b2a45acb916437d611cede6
Author: Borislav Petkov 
Date:   Mon Dec 25 13:57:16 2017 +0100

x86/kaiser: Move feature detection up


... before the first use of kaiser_enabled as otherwise funky
things happen:

  about to get started...
  (XEN) d0v0 Unhandled page fault fault/trap [#14, ec=]
  (XEN) Pagetable walk from 88022a449090:
  (XEN)  L4[0x110] = 000229e0e067 1e0e
  (XEN)  L3[0x008] =  
  (XEN) domain_crash_sync called from entry.S: fault at 82d08033fd08
  entry.o#create_bounce_frame+0x135/0x14d
  (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
  (XEN) [ Xen-4.9.1_02-3.21  x86_64  debug=n   Not tainted ]
  (XEN) CPU:0
  (XEN) RIP:e033:[]
  (XEN) RFLAGS: 0286   EM: 1   CONTEXT: pv guest (d0v0)

Signed-off-by: Borislav Petkov 
Signed-off-by: Greg Kroah-Hartman 

:040000 040000 e56bbc975c3fd1a774b6cc0d6699c0c24e66be1c e06231dccc8589b4baa0cd5759a37899b7ec71c1 M	arch

Not sure what is going on with this commit. Still looking.


Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-10 Thread Babu Moger



On 3/10/21 8:55 AM, Babu Moger wrote:
> 
> 
>> -Original Message-
>> From: Paolo Bonzini 
>> Sent: Wednesday, March 10, 2021 3:09 AM
>> To: Moger, Babu ; Jim Mattson
>> 
>> Cc: Vitaly Kuznetsov ; Wanpeng Li
>> ; kvm list ; Joerg Roedel
>> ; the arch/x86 maintainers ; LKML <linux-ker...@vger.kernel.org>; Ingo Molnar ; Borislav Petkov
>> ; H . Peter Anvin ; Thomas Gleixner
>> ; Makarand Sonare ; Sean
>> Christopherson 
>> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
>>
>> On 10/03/21 02:04, Babu Moger wrote:
>>> Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears
>>> the problem is on Debian 4.9 kernel. I am not sure how to run git
>>> bisect on Debian kernel. Tried anyway. It is pointing to
>>>
>>> 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
>>> commit 47811c66356d875e76a6ca637a9d384779a659bb
>>> Author: Ben Hutchings
>>> Date:   Mon Mar 8 01:17:32 2021 +0100
>>>
>>>  Prepare to release linux (4.9.258-1).
>>>
>>> It does not appear to be the right commit. I am out of ideas now.
>>> hanks
>>
>> Have you tried bisecting the upstream stable kernels (from 4.9.0 to 4.9.258)?

I couldn't reproduce the issue on any of the upstream versions. I have
tried v4.9, v4.8, and even the latest v5.11. No issues there. There is no
upstream version 4.9.258.

Jim mentioned that Debian 10, which is based on kernel version 4.19, is
also fine. The issue appears to affect only Debian 9 (kernel v4.9.0-14).



RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-10 Thread Babu Moger



> -Original Message-
> From: Paolo Bonzini 
> Sent: Wednesday, March 10, 2021 3:09 AM
> To: Moger, Babu ; Jim Mattson
> 
> Cc: Vitaly Kuznetsov ; Wanpeng Li
> ; kvm list ; Joerg Roedel
> ; the arch/x86 maintainers ; LKML <linux-ker...@vger.kernel.org>; Ingo Molnar ; Borislav Petkov
> ; H . Peter Anvin ; Thomas Gleixner
> ; Makarand Sonare ; Sean
> Christopherson 
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> On 10/03/21 02:04, Babu Moger wrote:
> > Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears
> > the problem is on Debian 4.9 kernel. I am not sure how to run git
> > bisect on Debian kernel. Tried anyway. It is pointing to
> >
> > 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
> > commit 47811c66356d875e76a6ca637a9d384779a659bb
> > Author: Ben Hutchings
> > Date:   Mon Mar 8 01:17:32 2021 +0100
> >
> >  Prepare to release linux (4.9.258-1).
> >
> > It does not appear to be the right commit. I am out of ideas now.
> > hanks
> 
> Have you tried bisecting the upstream stable kernels (from 4.9.0 to 4.9.258)?

I couldn't reproduce the issue on any of the upstream versions. I have
tried v4.9, v4.8, and even the latest v5.11. No issues there.

Jim mentioned that Debian 10, which is based on kernel version 4.19, is
also fine. The issue appears to affect only Debian 9 (kernel v4.9.0-14).


RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-03-09 Thread Babu Moger



> -Original Message-
> From: Babu Moger 
> Sent: Wednesday, February 24, 2021 4:17 PM
> To: Jim Mattson 
> Cc: Paolo Bonzini ; Vitaly Kuznetsov
> ; Wanpeng Li ; kvm list
> ; Joerg Roedel ; the arch/x86
> maintainers ; LKML ; Ingo
> Molnar ; Borislav Petkov ; H . Peter
> Anvin ; Thomas Gleixner ; Makarand
> Sonare ; Sean Christopherson
> 
> Subject: RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> 
> 
> > -Original Message-
> > From: Jim Mattson 
> > Sent: Tuesday, February 23, 2021 6:14 PM
> > To: Moger, Babu 
> > Cc: Paolo Bonzini ; Vitaly Kuznetsov
> > ; Wanpeng Li ; kvm list
> > ; Joerg Roedel ; the arch/x86
> > maintainers ; LKML ;
> > Ingo Molnar ; Borislav Petkov ; H .
> > Peter Anvin ; Thomas Gleixner ;
> > Makarand Sonare ; Sean Christopherson
> > 
> > Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> >
> > Any updates? What should we be telling customers with Debian 9 guests?
> > :-)
> 
> Found another problem with the PCID feature on SVM. It has to do with the
> CR4 flags reset during bootup. The problem was showing up with kexec
> loading on a VM. I am not sure if this is related to that. Will send the
> patch soon.

Tried to reproduce the problem on upstream kernel versions without any
success. Tried v4.9.0 and v4.8.0; both of these upstream versions are
working fine. So "git bisect" on upstream is ruled out.

Debian kernel 4.10 (tag 4.10~rc6-1~exp1) also works fine. It appears the
problem is in the Debian 4.9 kernel. I am not sure how to run git bisect on
the Debian kernel. I tried anyway. It is pointing to

47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
commit 47811c66356d875e76a6ca637a9d384779a659bb
Author: Ben Hutchings 
Date:   Mon Mar 8 01:17:32 2021 +0100

    Prepare to release linux (4.9.258-1).

It does not appear to be the right commit. I am out of ideas now.
Thanks
Babu

> 
> >
> > On Fri, Jan 22, 2021 at 5:52 PM Babu Moger 
> wrote:
> > >
> > >
> > >
> > > On 1/21/21 5:51 PM, Babu Moger wrote:
> > > >
> > > >
> > > > On 1/20/21 9:10 PM, Babu Moger wrote:
> > > >>
> > > >>
> > > >> On 1/20/21 3:45 PM, Babu Moger wrote:
> > > >>>
> > > >>>
> > > >>> On 1/20/21 3:14 PM, Jim Mattson wrote:
> > > >>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger
> 
> > wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
> > > >>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger
> >  wrote:
> > > >>>>>>
> > > >>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests.
> > > >>>>>>> Everything works as expected.
> > > >>>>>>
> > > >>>>>> Debian 9 does not like this patch set. As a kvm guest, it
> > > >>>>>> panics on a Milan CPU unless booted with 'nopcid'. Gmail
> > > >>>>>> mangles long lines, so please see the attached kernel log
> > > >>>>>> snippet. Debian 10 is fine, so I assume this is a guest bug.
> > > >>>>>>
> > > >>>>>
> > > >>>>> We had an issue with PCID feature earlier. This was showing
> > > >>>>> only with SEV guests. It is resolved recently. Do you think it
> > > >>>>> is not related
> > that?
> > > >>>>> Here are the patch set.
> > > >>>>>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2
> > > >>>>> F%25
> > > >>>>>
> > 2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996
> > > >>>>> .stgit%40bmoger-
> > ubuntu%2Fdata=04%7C01%7Cbabu.moger%40amd.co
> > > >>>>>
> >
> m%7C9558672ca21c4f6c2d5308d8d85919dc%7C3dd8961fe4884e608e11a82d9
> > > >>>>>
> >
> 94e183d%7C0%7C0%7C637497224490455772%7CUnknown%7CTWFpbGZsb3d
> 8
> > eyJ
> > > >>>>>
> >
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> > > >>>>>
> >
> 7C1000sdata=4QzTNHaYllwPd1U0kumq75dpwp7Rg0ZXsSQ631jMeqs%
> 3D
> > &
> > > >>>>> amp;reserved=0
> > > >>>>
> > >>>> The Debian 9 release we tested is not an SEV guest.

Re: [PATCH] KVM: SVM: Clear the CR4 register on reset

2021-03-02 Thread Babu Moger



On 3/2/21 1:20 PM, Sean Christopherson wrote:
> On Tue, Mar 02, 2021, Babu Moger wrote:
>> This problem was reported on an SVM guest while executing kexec.
>> Kexec fails to load the new kernel when the PCID feature is enabled.
>>
>> When kexec starts loading the new kernel, it starts the process by
>> resetting the vCPUs and then bringing each vCPU online one by one.
>> The vCPU reset is supposed to reset all the register states before the
>> vCPUs are brought online. However, the CR4 register is not reset during
>> this process. If this register is already setup during the last boot,
>> all the flags can remain intact. The X86_CR4_PCIDE bit can only be
>> enabled in long mode. So, it must be enabled much later in SMP
>> initialization.  Having the X86_CR4_PCIDE bit set during SMP boot can
>> cause boot failures.
>>
>> Fix the issue by resetting the CR4 register in init_vmcb().
>>
>> Signed-off-by: Babu Moger 
> 
> Cc: sta...@vger.kernel.org
> 
> The bug goes back too far to have a meaningful Fixes.
> 
> Reviewed-by: Sean Christopherson 

Sean, Thanks
> 
> 
> On a related topic, I think we can clean up the RESET/INIT flows by
> hoisting the common code into kvm_vcpu_reset().  That would also provide
> good motivation for removing the init_vmcb() call in svm_create_vcpu(),
> which is fully redundant with the call in svm_vcpu_reset().  I'll put
> that on the todo list.

Yes, please. I thought about cleaning up init_vmcb and svm_vcpu_reset a
little bit. That will require some more tests and review; we didn't want to
delay the fix for that now.
Thanks
Babu


[PATCH] KVM: SVM: Clear the CR4 register on reset

2021-03-02 Thread Babu Moger
This problem was reported on an SVM guest while executing kexec.
Kexec fails to load the new kernel when the PCID feature is enabled.

When kexec starts loading the new kernel, it starts the process by
resetting the vCPUs and then bringing each vCPU online one by one.
The vCPU reset is supposed to reset all the register states before the
vCPUs are brought online. However, the CR4 register is not reset during
this process. If this register was already set up during the last boot,
all the flags can remain intact. The X86_CR4_PCIDE bit can only be
enabled in long mode. So, it must be enabled much later in SMP
initialization.  Having the X86_CR4_PCIDE bit set during SMP boot can
cause boot failures.

Fix the issue by resetting the CR4 register in init_vmcb().

Signed-off-by: Babu Moger 
---
 arch/x86/kvm/svm/svm.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c636021b066b..baee91c1e936 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1200,6 +1200,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	init_sys_seg(&save->ldtr, SEG_TYPE_LDT);
 	init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
 
+	svm_set_cr4(&svm->vcpu, 0);
 	svm_set_efer(&svm->vcpu, 0);
 	save->dr6 = 0xffff0ff0;
 	kvm_set_rflags(&svm->vcpu, X86_EFLAGS_FIXED);
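
(To make the architectural rule behind the fix concrete, the constraint can
be written as a small predicate. This is a hypothetical helper with the
relevant bit positions spelled out; it is not part of the patch.)

#define X86_CR4_PCIDE	(1UL << 17)	/* CR4.PCIDE */
#define EFER_LMA	(1UL << 10)	/* EFER.LMA: long mode active */

/*
 * CR4.PCIDE may only be set once long mode is active, which is why a
 * PCIDE bit left over from the previous boot breaks the early
 * (pre-long-mode) SMP bring-up after kexec unless vCPU reset clears CR4.
 */
static int cr4_pcide_valid(unsigned long cr4, unsigned long efer)
{
	return !(cr4 & X86_CR4_PCIDE) || (efer & EFER_LMA);
}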



RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-02-24 Thread Babu Moger



> -Original Message-
> From: Jim Mattson 
> Sent: Tuesday, February 23, 2021 6:14 PM
> To: Moger, Babu 
> Cc: Paolo Bonzini ; Vitaly Kuznetsov
> ; Wanpeng Li ; kvm list
> ; Joerg Roedel ; the arch/x86
> maintainers ; LKML ; Ingo
> Molnar ; Borislav Petkov ; H . Peter Anvin
> ; Thomas Gleixner ; Makarand Sonare
> ; Sean Christopherson 
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> Any updates? What should we be telling customers with Debian 9 guests? :-)

Found another problem with the PCID feature on SVM. It has to do with the
CR4 flags reset during bootup. The problem was showing up with kexec
loading on a VM. I am not sure if this is related to that. Will send the
patch soon.

> 
> On Fri, Jan 22, 2021 at 5:52 PM Babu Moger  wrote:
> >
> >
> >
> > On 1/21/21 5:51 PM, Babu Moger wrote:
> > >
> > >
> > > On 1/20/21 9:10 PM, Babu Moger wrote:
> > >>
> > >>
> > >> On 1/20/21 3:45 PM, Babu Moger wrote:
> > >>>
> > >>>
> > >>> On 1/20/21 3:14 PM, Jim Mattson wrote:
> > >>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
> > >>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger wrote:
> > >>>>>>
> > >>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm unit tests.
> > >>>>>>> Everything works as expected.
> > >>>>>>
> > >>>>>> Debian 9 does not like this patch set. As a kvm guest, it
> > >>>>>> panics on a Milan CPU unless booted with 'nopcid'. Gmail
> > >>>>>> mangles long lines, so please see the attached kernel log
> > >>>>>> snippet. Debian 10 is fine, so I assume this is a guest bug.
> > >>>>>>
> > >>>>>
> > >>>>> We had an issue with the PCID feature earlier. This was showing
> > >>>>> only with SEV guests. It was resolved recently. Do you think it
> > >>>>> is not related to that? Here is the patch set.
> > >>>>> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
> > >>>>
> > >>>> The Debian 9 release we tested is not an SEV guest.
> > >>> ok. I have not tested Debian 9 before. I will try now. Will let
> > >>> you know how it goes. thanks
> > >>>
> > >>
> > >> I have reproduced the issue locally. Will investigate. thanks
> > >>
> > > A few updates.
> > > 1. Like Jim mentioned earlier, this appears to be a guest kernel issue.
> > > Debian 9 runs the base kernel 4.9.0-14. The problem can be seen
> > > consistently with this kernel.
> > >
> > > 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> > >
> > > 3. System comes up fine when the invpcid feature is disabled with the
> > > boot parameter "noinvpcid" and also with "nopcid"; "nopcid" disables
> > > both pcid and invpcid.
> > >
> > > 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> > >
> > > 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> > >
> > > I did not bisect further yet.
> > > Babu
> > > Thanks
> >
> >
> > Some more updates:
> >  System comes up fine with kernel v4.9 (checked out on upstream tag v4.9).
> > So, I am assuming this is something specific to the Debian 4.9.0-14 kernel.
> >
> > Note: I couldn't go back to prior versions (v4.8 or earlier) due to
> > compile issues.
> > Thanks
> > Babu
> >


Re: [PATCH v4 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-02-11 Thread Babu Moger



On 2/11/21 2:56 AM, Paolo Bonzini wrote:
> On 29/01/21 01:43, Babu Moger wrote:
>> This support also fixes an issue where a guest may sometimes see an
>> inconsistent value for the SPEC_CTRL MSR on processors that support this
>> feature. With the current SPEC_CTRL support, the first write to
>> SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
>> MSR is not updated.
> 
> This is a bit ugly, new features should always be enabled manually (AMD
> did it right for vVMLOAD/vVMSAVE for example, even though _in theory_
> assuming that all hypervisors were intercepting VMLOAD/VMSAVE would have
> been fine).
> 
> Also regarding nested virtualization:
> 
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index 7a605ad8254d..9e51f9e4f631 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -534,6 +534,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
>>  		hsave->save.cr3    = vmcb->save.cr3;
>>  	else
>>  		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
>> +	hsave->save.spec_ctrl = vmcb->save.spec_ctrl;
>>  
>>  	copy_vmcb_control_area(&hsave->control, &vmcb->control);
>>  
>> @@ -675,6 +676,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>>  	kvm_rip_write(&svm->vcpu, hsave->save.rip);
>>  svm->vmcb->save.dr7 = DR7_FIXED_1;
>>  svm->vmcb->save.cpl = 0;
>> +    svm->vmcb->save.spec_ctrl = hsave->save.spec_ctrl;
>>  svm->vmcb->control.exit_int_info = 0;
>>  
>>  vmcb_mark_all_dirty(svm->vmcb);
> 
> I think this is incorrect.  Since we don't support this feature in the
> nested hypervisor, any writes to the SPEC_CTRL MSR while L2 (nested guest)
> runs have to be reflected to L1 (nested hypervisor).  In other words, this
> new field is more like VMLOAD/VMSAVE state, in that it doesn't change
> across VMRUN and VMEXIT.  These two hunks can be removed.

Makes sense. I have tested removing these two hunks and it worked fine.

> 
> If we want to do it, exposing this feature to the nested hypervisor will
> be a bit complicated, because one has to write host SPEC_CTRL |
> vmcb01.GuestSpecCtrl in the host MSR, in order to free the vmcb02
> GuestSpecCtrl for the vmcb12 GuestSpecCtrl.
> 
> It would also be possible to emulate it on processors that don't have it. 
> However I'm not sure it's a good idea because of the problem that you
> mentioned with running old kernels on new processors.
> 
> I have queued the patches with the small fix above.  However I plan to
> only include them in 5.13 because I have a bunch of other SVM patches,

Yes. 5.13 is fine.
thanks
Babu

> those have been tested already but I need to send them out for review
> before "officially" getting them in kvm.git.
> 
> Paolo
> 


Re: [PATCH v4 0/2] x86: Add the feature Virtual SPEC_CTRL

2021-02-10 Thread Babu Moger
Paolo/Sean,
Any comments on these patches?
Thanks
Babu

On 1/28/21 6:43 PM, Babu Moger wrote:
> Newer AMD processors have a feature to virtualize the use of the
> SPEC_CTRL MSR on the guest. The series adds the feature support
> and enables the feature on SVM.
> ---
> v4:
>   1. Taken care of comments from Sean Christopherson.
>  a. Updated svm_set_msr/svm_get_msr to read/write the spec_ctrl value
> directly from the save area spec_ctrl.
>  b. Disabled the msr_interception in init_vmcb when V_SPEC_CTRL is
> present.
>  c. Added the save/restore for nested VMs. Also tested to make sure
> the nested SPEC_CTRL settings are properly saved and restored between
> L2 and L1 guests.
>   2. Added the kvm-unit-tests to verify that. Sent those patches separately.
> 
> v3:
>   1. Taken care of recent changes in vmcb_save_area. Needed to adjust the save
>  area spec_ctrl definition.
>   2. Taken care of few comments from Tom.
>  a. Initialised the save area spec_ctrl in case of SEV-ES.
>  b. Removed the changes in svm_get_msr/svm_set_msr.
>  c. Reverted the changes to disable the msr interception to avoid a
> compatibility issue.
>   3. Updated the patch #1 with Acked-by from Boris.
>   
> v2:
>   NOTE: This is not final yet. Sending out the patches to make
>   sure I captured all the comments correctly.
> 
>   1. Most of the changes are related to Jim and Sean's feedback.
>   2. Improved the description of patch #2.
>   3. Updated the vmcb save area's guest spec_ctrl value (offset 0x2E0)
>  properly. Initialized during init_vmcb and svm_set_msr and
>  returned the value from save area for svm_get_msr.
>   4. As Jim commented, transferred the value into the VMCB prior
>  to VMRUN and out of the VMCB after #VMEXIT.
>   5. Added kvm-unit-test to detect the SPEC CTRL feature.
>  
> https://lore.kernel.org/kvm/160865324865.19910.5159218511905134908.stgit@bmoger-ubuntu/
>   6. Sean mentioned renaming MSR_AMD64_VIRT_SPEC_CTRL. But, it might
>  create even more confusion, so dropped the idea for now.
> 
> v3: 
> https://lore.kernel.org/kvm/161073115461.13848.18035972823733547803.stgit@bmoger-ubuntu/
> v2: 
> https://lore.kernel.org/kvm/160867624053.3471.7106539070175910424.stgit@bmoger-ubuntu/
> v1: 
> https://lore.kernel.org/kvm/160738054169.28590.5171339079028237631.stgit@bmoger-ubuntu/
> 
> Babu Moger (2):
>   x86/cpufeatures: Add the Virtual SPEC_CTRL feature
>   KVM: SVM: Add support for Virtual SPEC_CTRL
> 
> 
>  arch/x86/include/asm/cpufeatures.h |1 +
>  arch/x86/include/asm/svm.h |4 +++-
>  arch/x86/kvm/svm/nested.c  |2 ++
>  arch/x86/kvm/svm/svm.c |   27 ++-
>  4 files changed, 28 insertions(+), 6 deletions(-)
> 
> --
> 


[PATCH v4 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-01-28 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000_000A_EDX[20]: GuestSpecCtrl. Hypervisors are not
required to enable this feature since it is automatically enabled on
processors that support it.

A hypervisor may wish to impose speculation controls on guest
execution or a guest may want to impose its own speculation controls.
Therefore, the processor implements both host and guest
versions of SPEC_CTRL.

When in host mode, the host SPEC_CTRL value is in effect and writes
update only the host version of SPEC_CTRL. On a VMRUN, the processor
loads the guest version of SPEC_CTRL from the VMCB. When the guest
writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
the guest version is saved into the VMCB and the processor returns
to only using the host SPEC_CTRL for speculation control. The guest
SPEC_CTRL is located at offset 0x2E0 in the VMCB.

The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
ensure a minimum SPEC_CTRL if desired.
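
(Illustrative aside, not part of the patch: the combination the hardware
applies while the guest runs boils down to the following.)

	/* Toy model of the behavior described above: the host can enforce
	 * a floor, and the guest can only add restrictions on top of it.
	 */
	static inline u64 effective_spec_ctrl(u64 host_spec_ctrl,
					      u64 guest_spec_ctrl)
	{
		return host_spec_ctrl | guest_spec_ctrl;
	}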

This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the save area spec_ctrl is properly saved and restored.
So, the guest will always see the proper value when it is read back.

Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/svm.h |4 +++-
 arch/x86/kvm/svm/nested.c  |2 ++
 arch/x86/kvm/svm/svm.c |   27 ++-
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1c561945b426..772e60efe243 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -269,7 +269,9 @@ struct vmcb_save_area {
 * SEV-ES guests when referenced through the GHCB or for
 * saving to the host save area.
 */
-   u8 reserved_7[80];
+   u8 reserved_7[72];
+   u32 spec_ctrl;  /* Guest version of SPEC_CTRL at 0x2E0 */
+   u8 reserved_7b[4];
u32 pkru;
u8 reserved_7a[20];
u64 reserved_8; /* rax already available at 0x01f8 */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7a605ad8254d..9e51f9e4f631 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -534,6 +534,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 		hsave->save.cr3    = vmcb->save.cr3;
 	else
 		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
+	hsave->save.spec_ctrl = vmcb->save.spec_ctrl;
 
 	copy_vmcb_control_area(&hsave->control, &vmcb->control);
 
@@ -675,6 +676,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	kvm_rip_write(&svm->vcpu, hsave->save.rip);
 	svm->vmcb->save.dr7 = DR7_FIXED_1;
 	svm->vmcb->save.cpl = 0;
+	svm->vmcb->save.spec_ctrl = hsave->save.spec_ctrl;
svm->vmcb->control.exit_int_info = 0;
 
vmcb_mark_all_dirty(svm->vmcb);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f923e14e87df..756129caa611 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1244,6 +1244,14 @@ static void init_vmcb(struct vcpu_svm *svm)
 
svm_check_invpcid(svm);
 
+   /*
+* If the host supports V_SPEC_CTRL then disable the interception
+* of MSR_IA32_SPEC_CTRL.
+*/
+	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+		set_msr_interception(&svm->vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL,
+				     1, 1);
+
 	if (kvm_vcpu_apicv_active(&svm->vcpu))
 		avic_init_vmcb(svm);
 
@@ -2678,7 +2686,10 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_has_spec_ctrl_msr(vcpu))
return 1;
 
-   msr_info->data = svm->spec_ctrl;
+   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   msr_info->data = svm->vmcb->save.spec_ctrl;
+   else
+   msr_info->data = svm->spec_ctrl;
break;

[PATCH v4 1/2] x86/cpufeatures: Add the Virtual SPEC_CTRL feature

2021-01-28 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000_000A_EDX[20]: GuestSpecCtrl. When present, the
SPEC_CTRL MSR is automatically virtualized.

Signed-off-by: Babu Moger 
Acked-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..3fcd0624b1bc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -337,6 +337,7 @@
 #define X86_FEATURE_AVIC		(15*32+13) /* Virtual Interrupt Controller */
 #define X86_FEATURE_V_VMSAVE_VMLOAD	(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF		(15*32+16) /* Virtual GIF */
+#define X86_FEATURE_V_SPEC_CTRL	(15*32+20) /* Virtual SPEC_CTRL */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI	(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
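
(As an aside, the CPUID bit added above can be checked from userspace
with a few lines of C; an illustrative sketch, not part of the patch.)

	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CPUID Fn8000_000A reports SVM feature flags in EDX;
		 * bit 20 is GuestSpecCtrl (Virtual SPEC_CTRL).
		 */
		if (!__get_cpuid(0x8000000a, &eax, &ebx, &ecx, &edx))
			return 1;

		printf("V_SPEC_CTRL: %s\n", (edx & (1u << 20)) ? "yes" : "no");
		return 0;
	}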



[PATCH v4 0/2] x86: Add the feature Virtual SPEC_CTRL

2021-01-28 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR on the guest. The series adds the feature support
and enables the feature on SVM.
---
v4:
  1. Taken care of comments from Sean Christopherson.
 a. Updated svm_set_msr/svm_get_msr to read/write the spec_ctrl value
directly from the save area spec_ctrl.
 b. Disabled the msr_interception in init_vmcb when V_SPEC_CTRL is
present.
 c. Added the save/restore for nested VMs. Also tested to make sure
the nested SPEC_CTRL settings are properly saved and restored between
L2 and L1 guests.
  2. Added the kvm-unit-tests to verify that. Sent those patches separately.

v3:
  1. Taken care of recent changes in vmcb_save_area. Needed to adjust the save
 area spec_ctrl definition.
  2. Taken care of few comments from Tom.
 a. Initialised the save area spec_ctrl in case of SEV-ES.
 b. Removed the changes in svm_get_msr/svm_set_msr.
 c. Reverted the changes to disable the msr interception to avoid a
compatibility issue.
  3. Updated the patch #1 with Acked-by from Boris.
  
v2:
  NOTE: This is not final yet. Sending out the patches to make
  sure I captured all the comments correctly.

  1. Most of the changes are related to Jim and Sean's feedback.
  2. Improved the description of patch #2.
  3. Updated the vmcb save area's guest spec_ctrl value (offset 0x2E0)
 properly. Initialized during init_vmcb and svm_set_msr and
 returned the value from save area for svm_get_msr.
  4. As Jim commented, transferred the value into the VMCB prior
 to VMRUN and out of the VMCB after #VMEXIT.
  5. Added kvm-unit-test to detect the SPEC CTRL feature.
 
https://lore.kernel.org/kvm/160865324865.19910.5159218511905134908.stgit@bmoger-ubuntu/
  6. Sean mentioned renaming MSR_AMD64_VIRT_SPEC_CTRL. But, it might
 create even more confusion, so dropped the idea for now.

v3: 
https://lore.kernel.org/kvm/161073115461.13848.18035972823733547803.stgit@bmoger-ubuntu/
v2: 
https://lore.kernel.org/kvm/160867624053.3471.7106539070175910424.stgit@bmoger-ubuntu/
v1: 
https://lore.kernel.org/kvm/160738054169.28590.5171339079028237631.stgit@bmoger-ubuntu/

Babu Moger (2):
  x86/cpufeatures: Add the Virtual SPEC_CTRL feature
  KVM: SVM: Add support for Virtual SPEC_CTRL


 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/include/asm/svm.h |4 +++-
 arch/x86/kvm/svm/nested.c  |2 ++
 arch/x86/kvm/svm/svm.c |   27 ++-
 4 files changed, 28 insertions(+), 6 deletions(-)

--


Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-01-22 Thread Babu Moger



On 1/21/21 5:51 PM, Babu Moger wrote:
> 
> 
> On 1/20/21 9:10 PM, Babu Moger wrote:
>>
>>
>> On 1/20/21 3:45 PM, Babu Moger wrote:
>>>
>>>
>>> On 1/20/21 3:14 PM, Jim Mattson wrote:
>>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger  wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger  wrote:
>>>>>>
>>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm unit tests. Everything
>>>>>>> works as expected.
>>>>>>
>>>>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>>>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>>>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>>>>> assume this is a guest bug.
>>>>>>
>>>>>
>>>>> We had an issue with the PCID feature earlier. This was showing only
>>>>> with SEV guests. It was resolved recently. Do you think it is not
>>>>> related to that? Here is the patch set.
>>>>> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
>>>>
>>>> The Debian 9 release we tested is not an SEV guest.
>>> ok. I have not tested Debian 9 before. I will try now. Will let you know
>>> how it goes. thanks
>>>
>>
>> I have reproduced the issue locally. Will investigate. thanks
>>
> A few updates.
> 1. Like Jim mentioned earlier, this appears to be a guest kernel issue.
> Debian 9 runs the base kernel 4.9.0-14. The problem can be seen consistently
> with this kernel.
> 
> 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> 
> 3. System comes up fine when the invpcid feature is disabled with the boot
> parameter "noinvpcid" and also with "nopcid"; "nopcid" disables both pcid
> and invpcid.
> 
> 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> 
> 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> 
> I did not bisect further yet.
> Babu
> Thanks


Some more updates:
 System comes up fine with kernel v4.9 (checked out on upstream tag v4.9).
So, I am assuming this is something specific to the Debian 4.9.0-14 kernel.

Note: I couldn't go back to prior versions (v4.8 or earlier) due to compile
issues.
Thanks
Babu



Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-01-21 Thread Babu Moger



On 1/20/21 9:10 PM, Babu Moger wrote:
> 
> 
> On 1/20/21 3:45 PM, Babu Moger wrote:
>>
>>
>> On 1/20/21 3:14 PM, Jim Mattson wrote:
>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger  wrote:
>>>>
>>>>
>>>>
>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger  wrote:
>>>>>
>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm unit tests. Everything works
>>>>>> as expected.
>>>>>
>>>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>>>> assume this is a guest bug.
>>>>>
>>>>
>>>> We had an issue with the PCID feature earlier. This was showing only
>>>> with SEV guests. It was resolved recently. Do you think it is not
>>>> related to that? Here is the patch set.
>>>> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
>>>
>>> The Debian 9 release we tested is not an SEV guest.
>> ok. I have not tested Debian 9 before. I will try now. Will let you know
>> how it goes. thanks
>>
> 
> I have reproduced the issue locally. Will investigate. thanks
> 
A few updates.
1. Like Jim mentioned earlier, this appears to be a guest kernel issue.
Debian 9 runs the base kernel 4.9.0-14. The problem can be seen consistently
with this kernel.

2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.

3. System comes up fine when the invpcid feature is disabled with the boot
parameter "noinvpcid" and also with "nopcid"; "nopcid" disables both pcid
and invpcid.

4. Upgraded the guest kernel to v5.0 and system comes up fine.

5. Also system comes up fine with latest guest kernel 5.11.0-rc4.

I did not bisect further yet.
Babu
Thanks


Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-01-20 Thread Babu Moger



On 1/20/21 3:45 PM, Babu Moger wrote:
> 
> 
> On 1/20/21 3:14 PM, Jim Mattson wrote:
>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger  wrote:
>>>
>>>
>>>
>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger  wrote:
>>>>
>>>>> Thanks Paolo. Tested Guest/nested guest/kvm unit tests. Everything works
>>>>> as expected.
>>>>
>>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>>> assume this is a guest bug.
>>>>
>>>
>>> We had an issue with the PCID feature earlier. This was showing only
>>> with SEV guests. It was resolved recently. Do you think it is not
>>> related to that? Here is the patch set.
>>> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
>>
>> The Debian 9 release we tested is not an SEV guest.
> ok. I have not tested Debian 9 before. I will try now. Will let you know
> how it goes. thanks
> 

I have reproduced the issue locally. Will investigate. thanks


Re: [PATCH v3 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-01-20 Thread Babu Moger



On 1/19/21 5:45 PM, Sean Christopherson wrote:
> On Tue, Jan 19, 2021, Babu Moger wrote:
>>
>> On 1/19/21 12:31 PM, Sean Christopherson wrote:
>>> On Fri, Jan 15, 2021, Babu Moger wrote:
>>>> @@ -3789,7 +3792,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>>> * is no need to worry about the conditional branch over the wrmsr
>>>> * being speculatively taken.
>>>> */
>>>> -  x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
>>>> +  if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>>>> +  svm->vmcb->save.spec_ctrl = svm->spec_ctrl;
>>>> +  else
>>>> +  x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
>>>
>>> Can't we avoid functional code in svm_vcpu_run() entirely when
>>> V_SPEC_CTRL is supported?  Make this code a nop, disable interception
>>> from time zero, and
>>
>> Sean, I thought you mentioned earlier about not changing the interception
>> mechanism.
> 
> I assume you're referring to this comment?
> 
>   On Mon, Dec 7, 2020 at 3:13 PM Sean Christopherson wrote:
>   >
>   > On Mon, Dec 07, 2020, Babu Moger wrote:
>   > > When this feature is enabled, the hypervisor no longer has to
>   > > intercept the usage of the SPEC_CTRL MSR and no longer is required to
>   > > save and restore the guest SPEC_CTRL setting when switching
>   > > hypervisor/guest modes.
>   >
>   > Well, it's still required if the hypervisor wanted to allow the guest
>   > to turn off mitigations that are enabled in the host.  I'd omit this
>   > entirely and focus on what hardware does and how Linux/KVM utilize the
>   > new feature.
> 
> I wasn't suggesting that KVM should intercept SPEC_CTRL, I was pointing
> out that there exists a scenario where a hypervisor would need/want to
> intercept SPEC_CTRL, and that stating that a hypervisor is/isn't required
> to do something isn't helpful in a KVM/Linux changelog because it doesn't
> describe the actual change, nor does it help understand _why_ the change
> is correct.

Ok. Got it.

> 
>> Do you think we should disable the interception right away if V_SPEC_CTRL is
>> supported?
> 
> Yes, unless I'm missing an interaction somewhere, that will simplify the
> get/set flows as they won't need to handle the case where the MSR is
> intercepted when V_SPEC_CTRL is supported.  If the MSR is conditionally
> passed through, the get flow would need to check if the MSR is
> intercepted to determine whether svm->spec_ctrl or
> svm->vmcb->save.spec_ctrl holds the guest's value.

Ok. Sure.
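
(For contrast, a hypothetical sketch of that conditional variant; the
diff below keys off the CPU feature instead.)

	/* If the MSR were only conditionally passed through, the get flow
	 * would have to pick the backing store based on the current
	 * interception state.
	 */
	if (msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
		msr_info->data = svm->spec_ctrl;
	else
		msr_info->data = svm->vmcb->save.spec_ctrl;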

> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cce0143a6f80..40f1bd449cfa 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2678,7 +2678,10 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> !guest_has_spec_ctrl_msr(vcpu))
> return 1;
>  
> -   msr_info->data = svm->spec_ctrl;
> +   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> +   msr_info->data = svm->vmcb->save.spec_ctrl;
> +   else
> +   msr_info->data = svm->spec_ctrl;
> break;
> case MSR_AMD64_VIRT_SPEC_CTRL:
> if (!msr_info->host_initiated &&
> @@ -2779,6 +2782,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> if (kvm_spec_ctrl_test_value(data))
> return 1;
>  
> +   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)) {
> +   svm->vmcb->save.spec_ctrl = data;
> +   break;
> +   }
> +
> svm->spec_ctrl = data;
> if (!data)
> break;
> 
>>> read/write the VMCB field in svm_{get,set}_msr().  I.e. don't touch
>>> svm->spec_ctrl if V_SPEC_CTRL is supported.  

Sure. Will make these changes.

>  
> Potentially harebrained alternative...
> 
> From an architectural SVM perspective, what are the rules for VMCB fields that
> don't exist (on the current hardware)?  E.g. are they reserved MBZ?  If not,
> does the SVM architecture guarantee that reserved fields will not be modified?
> I couldn't (quickly) find anything in the APM that explicitly states what
> happens with defined-but-not-existent fields.

I checked with our hardware design team about this. They don't want
software to make any assumptions about th

Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-01-20 Thread Babu Moger



On 1/20/21 3:14 PM, Jim Mattson wrote:
> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger  wrote:
>>
>>
>>
>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger  wrote:
>>>
>>>> Thanks Paolo. Tested Guest/nested guest/kvm unit tests. Everything works
>>>> as expected.
>>>
>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>> assume this is a guest bug.
>>>
>>
>> We had an issue with the PCID feature earlier. This was showing only
>> with SEV guests. It was resolved recently. Do you think it is not
>> related to that? Here is the patch set.
>> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
> 
> The Debian 9 release we tested is not an SEV guest.
ok. I have not tested Debian 9 before. I will try now. Will let you know
how it goes. thanks


Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2021-01-19 Thread Babu Moger



On 1/19/21 5:01 PM, Jim Mattson wrote:
> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger  wrote:
> 
>> Thanks Paolo. Tested Guest/nested guest/kvm unit tests. Everything works
>> as expected.
> 
> Debian 9 does not like this patch set. As a kvm guest, it panics on a
> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
> please see the attached kernel log snippet. Debian 10 is fine, so I
> assume this is a guest bug.
> 

We had an issue with the PCID feature earlier. This was showing only with SEV
guests. It was resolved recently. Do you think it is not related to that?
Here is the patch set.
https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/



Re: [PATCH v3 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-01-19 Thread Babu Moger



On 1/19/21 12:31 PM, Sean Christopherson wrote:
> On Fri, Jan 15, 2021, Babu Moger wrote:
>> ---
>>  arch/x86/include/asm/svm.h |4 +++-
>>  arch/x86/kvm/svm/sev.c |4 
>>  arch/x86/kvm/svm/svm.c |   19 +++
>>  3 files changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> index 1c561945b426..772e60efe243 100644
>> --- a/arch/x86/include/asm/svm.h
>> +++ b/arch/x86/include/asm/svm.h
>> @@ -269,7 +269,9 @@ struct vmcb_save_area {
>>   * SEV-ES guests when referenced through the GHCB or for
>>   * saving to the host save area.
>>   */
>> -u8 reserved_7[80];
>> +u8 reserved_7[72];
>> +u32 spec_ctrl;  /* Guest version of SPEC_CTRL at 0x2E0 */
>> +u8 reserved_7b[4];
> 
> Don't nested_prepare_vmcb_save() and nested_vmcb_checks() need to be
> updated to handle the new field, too?

Ok. Sure. I will check and test a few combinations to make sure of these
changes.

> 
>>  u32 pkru;
>>  u8 reserved_7a[20];
>>  u64 reserved_8; /* rax already available at 0x01f8 */
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index c8ffdbc81709..959d6e47bd84 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -546,6 +546,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>>  save->pkru = svm->vcpu.arch.pkru;
>>  save->xss  = svm->vcpu.arch.ia32_xss;
>>  
>> +/* Update the guest SPEC_CTRL value in the save area */
>> +if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>> +save->spec_ctrl = svm->spec_ctrl;
> 
> I think this can be dropped if svm->spec_ctrl is unused when V_SPEC_CTRL is
> supported (see below).  IIUC, the memcpy() that's just out of sight would do
> the propagation to the VMSA.

Yes, That is right. I will remove this.

> 
>> +
>>  /*
>>   * SEV-ES will use a VMSA that is pointed to by the VMCB, not
>>   * the traditional VMSA that is part of the VMCB. Copy the
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 7ef171790d02..a0cb01a5c8c5 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -1244,6 +1244,9 @@ static void init_vmcb(struct vcpu_svm *svm)
>>  
>>  svm_check_invpcid(svm);
>>  
>> +if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>> +save->spec_ctrl = svm->spec_ctrl;
>> +
>> 	if (kvm_vcpu_apicv_active(&svm->vcpu))
>>  avic_init_vmcb(svm);
>>  
>> @@ -3789,7 +3792,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>   * is no need to worry about the conditional branch over the wrmsr
>>   * being speculatively taken.
>>   */
>> -x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
>> +if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>> +svm->vmcb->save.spec_ctrl = svm->spec_ctrl;
>> +else
>> +x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
> 
> Can't we avoid functional code in svm_vcpu_run() entirely when V_SPEC_CTRL is
> supported?  Make this code a nop, disable interception from time zero, and

Sean, I thought you mentioned earlier about not changing the interception
mechanism. Do you think we should disable the interception right away if
V_SPEC_CTRL is supported?

> read/write the VMCB field in svm_{get,set}_msr().  I.e. don't touch
> svm->spec_ctrl if V_SPEC_CTRL is supported.  
> 
>   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
> 
>   svm_vcpu_enter_exit(vcpu, svm);
> 
>   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL) &&
>   unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
>   svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);

Ok. It appears the above code might work fine with changes in
svm_{get,set}_msr() to update the save area spec_ctrl. I will retest a few
combinations to make sure it works.
Thanks
Babu

> 
>>  svm_vcpu_enter_exit(vcpu, svm);
>>  
>> @@ -3808,13 +3814,18 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>   * If the L02 MSR bitmap does not intercept the MSR, then we need to
>>   * save it.
>>   */
>> -if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
>> -svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
>> +if (unlikely(!msr_write

[PATCH v3 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-01-15 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. A hypervisor may wish to impose speculation controls on
guest execution or a guest may want to impose its own speculation
controls. Therefore, the processor implements both host and guest
versions of SPEC_CTRL. Presence of this feature is indicated via CPUID
function 0x8000_000A_EDX[20]: GuestSpecCtrl.  Hypervisors are not
required to enable this feature since it is automatically enabled on
processors that support it.

When in host mode, the host SPEC_CTRL value is in effect and writes
update only the host version of SPEC_CTRL. On a VMRUN, the processor
loads the guest version of SPEC_CTRL from the VMCB. When the guest
writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
the guest version is saved into the VMCB and the processor returns
to only using the host SPEC_CTRL for speculation control. The guest
SPEC_CTRL is located at offset 0x2E0 in the VMCB.

The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
ensure a minimum SPEC_CTRL if desired.

This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the save area spec_ctrl is properly saved and restored.
So, the guest will always see the proper value when it is read back.

Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/svm.h |4 +++-
 arch/x86/kvm/svm/sev.c |4 
 arch/x86/kvm/svm/svm.c |   19 +++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1c561945b426..772e60efe243 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -269,7 +269,9 @@ struct vmcb_save_area {
 * SEV-ES guests when referenced through the GHCB or for
 * saving to the host save area.
 */
-   u8 reserved_7[80];
+   u8 reserved_7[72];
+   u32 spec_ctrl;  /* Guest version of SPEC_CTRL at 0x2E0 */
+   u8 reserved_7b[4];
u32 pkru;
u8 reserved_7a[20];
u64 reserved_8; /* rax already available at 0x01f8 */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c8ffdbc81709..959d6e47bd84 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -546,6 +546,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->pkru = svm->vcpu.arch.pkru;
save->xss  = svm->vcpu.arch.ia32_xss;
 
+   /* Update the guest SPEC_CTRL value in the save area */
+   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   save->spec_ctrl = svm->spec_ctrl;
+
/*
 * SEV-ES will use a VMSA that is pointed to by the VMCB, not
 * the traditional VMSA that is part of the VMCB. Copy the
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7ef171790d02..a0cb01a5c8c5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1244,6 +1244,9 @@ static void init_vmcb(struct vcpu_svm *svm)
 
svm_check_invpcid(svm);
 
+   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   save->spec_ctrl = svm->spec_ctrl;
+
	if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_init_vmcb(svm);
 
@@ -3789,7 +3792,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 * is no need to worry about the conditional branch over the wrmsr
 * being speculatively taken.
 */
-   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
+   if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   svm->vmcb->save.spec_ctrl = svm->spec_ctrl;
+   else
+   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
svm_vcpu_enter_exit(vcpu, svm);
 
@@ -3808,13 +3814,18 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 * If the L02 MSR bitmap does not intercept the MSR, then we need to
 * save it.
 */
-   if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
-   svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);

[PATCH v3 1/2] x86/cpufeatures: Add the Virtual SPEC_CTRL feature

2021-01-15 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000_000A_EDX[20]: GuestSpecCtrl. When present, the
SPEC_CTRL MSR is automatically virtualized.

Signed-off-by: Babu Moger 
Acked-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..3fcd0624b1bc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -337,6 +337,7 @@
 #define X86_FEATURE_AVIC		(15*32+13) /* Virtual Interrupt Controller */
 #define X86_FEATURE_V_VMSAVE_VMLOAD	(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF		(15*32+16) /* Virtual GIF */
+#define X86_FEATURE_V_SPEC_CTRL	(15*32+20) /* Virtual SPEC_CTRL */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI	(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/



[PATCH v3 0/2] x86: Add the feature Virtual SPEC_CTRL

2021-01-15 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR on the guest. The series adds the feature support
and enables the feature on SVM.
---
v3:
  1. Taken care of recent changes in vmcb_save_area. Needed to adjust the save
 area spec_ctrl definition.
  2. Taken care of few comments from Tom.
 a. Initialised the save area spec_ctrl in case of SEV-ES.
 b. Removed the changes in svm_get_msr/svm_set_msr.
 c. Reverted the changes to disable the msr interception to avoid a
compatibility issue.
  3. Updated the patch #1 with Acked-by from Boris.
  
v2:
  
https://lore.kernel.org/kvm/160867624053.3471.7106539070175910424.stgit@bmoger-ubuntu/
  NOTE: This is not final yet. Sending out the patches to make
  sure I captured all the comments correctly.

  1. Most of the changes are related to Jim and Sean's feedback.
  2. Improved the description of patch #2.
  3. Updated the vmcb save area's guest spec_ctrl value (offset 0x2E0)
 properly. Initialized during init_vmcb and svm_set_msr and
 returned the value from save area for svm_get_msr.
  4. As Jim commented, transferred the value into the VMCB prior
 to VMRUN and out of the VMCB after #VMEXIT.
  5. Added kvm-unit-test to detect the SPEC CTRL feature.
 
https://lore.kernel.org/kvm/160865324865.19910.5159218511905134908.stgit@bmoger-ubuntu/
  6. Sean mentioned renaming MSR_AMD64_VIRT_SPEC_CTRL. But, it might
 create even more confusion, so dropped the idea for now.

v1:
https://lore.kernel.org/kvm/160738054169.28590.5171339079028237631.stgit@bmoger-ubuntu/

Babu Moger (2):
  x86/cpufeatures: Add the Virtual SPEC_CTRL feature
  KVM: SVM: Add support for Virtual SPEC_CTRL


 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/include/asm/svm.h |4 +++-
 arch/x86/kvm/svm/sev.c |4 
 arch/x86/kvm/svm/svm.c |   19 +++
 4 files changed, 23 insertions(+), 5 deletions(-)

--


RE: [PATCH v2 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-01-04 Thread Babu Moger



> -Original Message-
> From: Sean Christopherson 
> Sent: Wednesday, December 30, 2020 10:08 AM
> To: Borislav Petkov 
> Cc: Moger, Babu ; pbonz...@redhat.com;
> t...@linutronix.de; mi...@redhat.com; fenghua...@intel.com;
> tony.l...@intel.com; wanpen...@tencent.com; k...@vger.kernel.org;
> Lendacky, Thomas ; pet...@infradead.org;
> j...@8bytes.org; x...@kernel.org; kyung.min.p...@intel.com; linux-
> ker...@vger.kernel.org; krish.sadhuk...@oracle.com; h...@zytor.com;
> mgr...@linux.intel.com; vkuzn...@redhat.com; Phillips, Kim
> ; Huang2, Wei ;
> jmatt...@google.com
> Subject: Re: [PATCH v2 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL
> 
> On Wed, Dec 30, 2020, Borislav Petkov wrote:
> > On Tue, Dec 22, 2020 at 04:31:55PM -0600, Babu Moger wrote:
> > > @@ -2549,7 +2559,10 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >   !guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD))
> > >   return 1;
> > >
> > > - msr_info->data = svm->spec_ctrl;
> > > + if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> > > + msr_info->data = svm->vmcb->save.spec_ctrl;
> > > + else
> > > + msr_info->data = svm->spec_ctrl;
> > >   break;
> > >   case MSR_AMD64_VIRT_SPEC_CTRL:
> > >   if (!msr_info->host_initiated &&
> > > @@ -2640,6 +2653,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > >   return 1;
> > >
> > >   svm->spec_ctrl = data;
> > > + if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> > > + svm->vmcb->save.spec_ctrl = data;
> > >   if (!data)
> > >   break;
> > >
> >
> > Are the get/set_msr() accessors such a fast path that they need
> > static_cpu_has() ?
> 
> Nope, they can definitely use boot_cpu_has().
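
(For context: static_cpu_has() is patched in via asm alternatives for hot
paths, while boot_cpu_has() is a plain feature-bitmap test; the MSR
accessors are not hot, so the plain test suffices, e.g.:)

	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
		msr_info->data = svm->vmcb->save.spec_ctrl;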

With Tom's latest comment, this change may not be required.
I will remove these changes.
Thanks
Babu


RE: [PATCH v2 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-01-04 Thread Babu Moger



> -Original Message-
> From: Lendacky, Thomas 
> Sent: Monday, January 4, 2021 9:47 AM
> To: Moger, Babu ; pbonz...@redhat.com;
> t...@linutronix.de; mi...@redhat.com; b...@alien8.de
> Cc: fenghua...@intel.com; tony.l...@intel.com; wanpen...@tencent.com;
> k...@vger.kernel.org; pet...@infradead.org; sea...@google.com;
> j...@8bytes.org; x...@kernel.org; kyung.min.p...@intel.com; linux-
> ker...@vger.kernel.org; krish.sadhuk...@oracle.com; h...@zytor.com;
> mgr...@linux.intel.com; vkuzn...@redhat.com; Phillips, Kim
> ; Huang2, Wei ;
> jmatt...@google.com
> Subject: Re: [PATCH v2 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL
> 
> On 12/22/20 4:31 PM, Babu Moger wrote:
> > Newer AMD processors have a feature to virtualize the use of the
> > SPEC_CTRL MSR. A hypervisor may wish to impose speculation controls on
> > guest execution or a guest may want to impose its own speculation
> > controls. Therefore, the processor implements both host and guest
> > versions of SPEC_CTRL. Presence of this feature is indicated via CPUID
> > function 0x8000_000A_EDX[20]: GuestSpecCtrl.  Hypervisors are not
> > required to enable this feature since it is automatically enabled on
> > processors that support it.
> >
> > When in host mode, the host SPEC_CTRL value is in effect and writes
> > update only the host version of SPEC_CTRL. On a VMRUN, the processor
> > loads the guest version of SPEC_CTRL from the VMCB. When the guest
> > writes SPEC_CTRL, only the guest version is updated. On a VMEXIT, the
> > guest version is saved into the VMCB and the processor returns to only
> > using the host SPEC_CTRL for speculation control. The guest SPEC_CTRL
> > is located at offset 0x2E0 in the VMCB.
> 
> With the SEV-ES hypervisor support now in the tree, this will need to
> add support in sev_es_sync_vmsa() to put the initial svm->spec_ctrl
> value in the SEV-ES VMSA.
> 
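
(This is what the next revision does in sev_es_sync_vmsa(), seeding the
SEV-ES save area from the tracked value:)

	/* Update the guest SPEC_CTRL value in the save area */
	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
		save->spec_ctrl = svm->spec_ctrl;
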
> >
> > The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
> > with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
> > ensure a minimum SPEC_CTRL if desired.
> >
> > This support also fixes an issue where a guest may sometimes see an
> > inconsistent value for the SPEC_CTRL MSR on processors that support
> > this feature. With the current SPEC_CTRL support, the first write to
> > SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
> > MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
> > will be 0x0, instead of the actual expected value. There isn’t a
> > security concern here, because the host SPEC_CTRL value is or’ed with
> > the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
> > KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
> > MSR just before the VMRUN, so it will always have the actual value
> > even though it doesn’t appear that way in the guest. The guest will
> > only see the proper value for the SPEC_CTRL register if the guest was
> > to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
> > support, the MSR interception of SPEC_CTRL is disabled during
> > vmcb_init, so this will no longer be an issue.
> >
> > Signed-off-by: Babu Moger 
> > ---
> >   arch/x86/include/asm/svm.h |4 +++-
> >   arch/x86/kvm/svm/svm.c |   29 +
> >   2 files changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > index 71d630bb5e08..753b25db427c 100644
> > --- a/arch/x86/include/asm/svm.h
> > +++ b/arch/x86/include/asm/svm.h
> > @@ -248,12 +248,14 @@ struct vmcb_save_area {
> > u64 br_to;
> > u64 last_excp_from;
> > u64 last_excp_to;
> > +   u8 reserved_12[72];
> > +   u32 spec_ctrl;  /* Guest version of SPEC_CTRL at 0x2E0 */
> >
> > /*
> >  * The following part of the save area is valid only for
> >  * SEV-ES guests when referenced through the GHCB.
> >  */
> > -   u8 reserved_7[104];
> > +   u8 reserved_7[28];
> > u64 reserved_8; /* rax already available at 0x01f8 */
> > u64 rcx;
> > u64 rdx;
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index
> > 79b3a564f1c9..6d3db3e8cdfe 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1230,6 +1230,16 @@ static void init_vmcb(struct vcpu_svm *svm)
> >
> > svm_check_invpcid(svm);
> >
> > +   /*
> > +* If the host supports V_SPEC_CTRL then disable the interception
> > +* of MSR_IA32_SPEC_CTRL.
> > +*/

[PATCH v2 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2020-12-22 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. A hypervisor may wish to impose speculation controls on
guest execution or a guest may want to impose its own speculation
controls. Therefore, the processor implements both host and guest
versions of SPEC_CTRL. Presence of this feature is indicated via CPUID
function 0x8000_000A_EDX[20]: GuestSpecCtrl.  Hypervisors are not
required to enable this feature since it is automatically enabled on
processors that support it.

When in host mode, the host SPEC_CTRL value is in effect and writes
update only the host version of SPEC_CTRL. On a VMRUN, the processor
loads the guest version of SPEC_CTRL from the VMCB. When the guest
writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
the guest version is saved into the VMCB and the processor returns
to only using the host SPEC_CTRL for speculation control. The guest
SPEC_CTRL is located at offset 0x2E0 in the VMCB.

The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
ensure a minimum SPEC_CTRL if desired.

This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the MSR interception of SPEC_CTRL is disabled during
vmcb_init, so this will no longer be an issue.

Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/svm.h |4 +++-
 arch/x86/kvm/svm/svm.c |   29 +
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 71d630bb5e08..753b25db427c 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -248,12 +248,14 @@ struct vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
+   u8 reserved_12[72];
+   u32 spec_ctrl;  /* Guest version of SPEC_CTRL at 0x2E0 */
 
/*
 * The following part of the save area is valid only for
 * SEV-ES guests when referenced through the GHCB.
 */
-   u8 reserved_7[104];
+   u8 reserved_7[28];
u64 reserved_8; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 79b3a564f1c9..6d3db3e8cdfe 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1230,6 +1230,16 @@ static void init_vmcb(struct vcpu_svm *svm)
 
svm_check_invpcid(svm);
 
+   /*
+* If the host supports V_SPEC_CTRL then disable the interception
+* of MSR_IA32_SPEC_CTRL.
+*/
+   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)) {
+   save->spec_ctrl = svm->spec_ctrl;
+		set_msr_interception(&svm->vcpu, svm->msrpm,
+				     MSR_IA32_SPEC_CTRL, 1, 1);
+   }
+
	if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_init_vmcb(svm);
 
@@ -2549,7 +2559,10 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD))
return 1;
 
-   msr_info->data = svm->spec_ctrl;
+   if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   msr_info->data = svm->vmcb->save.spec_ctrl;
+   else
+   msr_info->data = svm->spec_ctrl;
break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr_info->host_initiated &&
@@ -2640,6 +2653,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
return 1;
 
svm->spec_ctrl = data;
+   if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   svm->vmcb->save.spec_ctrl = data;
if (!data)
break;
 
@@ -3590,7 +3605,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 * is no need to worry about the conditional branch over the wrmsr
 * being speculatively taken.
 */
-   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);

[PATCH v2 1/2] x86/cpufeatures: Add the Virtual SPEC_CTRL feature

2020-12-22 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000_000A_EDX[20]: GuestSpecCtrl. When present, the SPEC_CTRL
MSR is automatically virtualized.

Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..aee4a924ecd7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -335,6 +335,7 @@
 #define X86_FEATURE_AVIC		(15*32+13) /* Virtual Interrupt Controller */
 #define X86_FEATURE_V_VMSAVE_VMLOAD	(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF		(15*32+16) /* Virtual GIF */
+#define X86_FEATURE_V_SPEC_CTRL	(15*32+20) /* Virtual SPEC_CTRL */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI	(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/



[PATCH v2 0/2] x86: Add the feature Virtual SPEC_CTRL

2020-12-22 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR on the guest. The series adds the feature support
and enables the feature on SVM.
---
v2:
  NOTE: This is not final yet. Sending out the patches to make
  sure I captured all the comments correctly.

  1. Most of the changes are related to Jim and Sean's feedback.
  2. Improved the description of patch #2.
  3. Updated the vmcb save area's guest spec_ctrl value (offset 0x2E0)
 properly. Initialized during init_vmcb and svm_set_msr and
 returned the value from save area for svm_get_msr.
  4. As Jim commented, transferred the value into the VMCB prior
 to VMRUN and out of the VMCB after #VMEXIT.
  5. Added kvm-unit-test to detect the SPEC CTRL feature.
 
https://lore.kernel.org/kvm/160865324865.19910.5159218511905134908.stgit@bmoger-ubuntu/
  6. Sean mentioned renaming MSR_AMD64_VIRT_SPEC_CTRL. But, it might
 create even more confusion, so dropped the idea for now.

v1:
https://lore.kernel.org/kvm/160738054169.28590.5171339079028237631.stgit@bmoger-ubuntu/

---

Babu Moger (2):
  x86/cpufeatures: Add the Virtual SPEC_CTRL feature
  KVM: SVM: Add support for Virtual SPEC_CTRL


 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/include/asm/svm.h |4 +++-
 arch/x86/kvm/svm/svm.c |   29 +
 3 files changed, 29 insertions(+), 5 deletions(-)

--


Re: [PATCH 1/2] x86/cpufeatures: Add the Virtual SPEC_CTRL feature

2020-12-22 Thread Babu Moger



On 12/22/20 11:41 AM, Sean Christopherson wrote:
> On Tue, Dec 22, 2020, Babu Moger wrote:
>>
>> On 12/9/20 5:11 PM, Jim Mattson wrote:
>>> On Wed, Dec 9, 2020 at 2:39 PM Babu Moger  wrote:
>>>>
>>>> On 12/7/20 5:22 PM, Jim Mattson wrote:
>>>>> On Mon, Dec 7, 2020 at 2:38 PM Babu Moger  wrote:
>>>>>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>>>>>> index dad350d42ecf..d649ac5ed7c7 100644
>>>>>> --- a/arch/x86/include/asm/cpufeatures.h
>>>>>> +++ b/arch/x86/include/asm/cpufeatures.h
>>>>>> @@ -335,6 +335,7 @@
>>>>>>  #define X86_FEATURE_AVIC		(15*32+13) /* Virtual Interrupt Controller */
>>>>>>  #define X86_FEATURE_V_VMSAVE_VMLOAD	(15*32+15) /* Virtual VMSAVE VMLOAD */
>>>>>>  #define X86_FEATURE_VGIF		(15*32+16) /* Virtual GIF */
>>>>>> +#define X86_FEATURE_V_SPEC_CTRL	(15*32+20) /* Virtual SPEC_CTRL */
>>>>>
>>>>> Shouldn't this bit be reported by KVM_GET_SUPPORTED_CPUID when it's
>>>>> enumerated on the host?
>>>>
>>>> Jim, I am not sure if this needs to be reported by
>>>> KVM_GET_SUPPORTED_CPUID. I don't see V_VMSAVE_VMLOAD or VGIF being reported
>>>> via KVM_GET_SUPPORTED_CPUID. Do you see the need for that?
>>>
>>> Every little bit helps. No, it isn't *needed*. But then again, this
>>> entire patchset isn't *needed*, is it?
>>>
>>
>> Working on v2 of these patches. Saw this code comment (in
>> arch/x86/kvm/cpuid.c) about exposing SVM features to the guest.
>>
>>
>> /*
>>  * Hide all SVM features by default, SVM will set the cap bits for
>>  * features it emulates and/or exposes for L1.
>>  */
>> kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);
>>
>>
>> Should we go ahead with the changes here?
> 
> Probably not, as the current SVM implementation aligns with the intended
> use of KVM_GET_SUPPORTED_CPUID.  The current approach is to enumerate
> what SVM features KVM can virtualize or emulate for a nested VM, i.e.
> what SVM features an L1 VMM can use and thus can be set in a vCPU's CPUID
> model.  For V_SPEC_CTRL, I'm pretty sure Jim was providing feedback for
> the non-nested case of reporting host/KVM support of the feature itself.
> 
> There is the question of whether or not KVM should have an ioctl() to report
> what virtualization features are supported/enabled.  AFAIK, it's not truly
> required as userspace can glean the information via /proc/cpuinfo (especially
> now that vmx_features exists), raw CPUID, and KVM module params.  Providing an
> ioctl() would likely be a bit cleaner for userspace, but I'm guessing that 
> ship
> has already sailed for most VMMs.
> 

Sean, Thanks for the clarifications.
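
For context, reporting such a bit through KVM_GET_SUPPORTED_CPUID would
only take a couple of lines in kvm_set_cpu_caps() -- a sketch, not a
proposal, since whether to do this at all is exactly what is debated
above:

	/* SVM features are hidden by default ... */
	kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);

	/* ... so an individual feature would have to be opted in, e.g.: */
	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
		kvm_cpu_cap_set(X86_FEATURE_V_SPEC_CTRL);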


Re: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2020-12-22 Thread Babu Moger



On 12/7/20 5:06 PM, Jim Mattson wrote:
> On Mon, Dec 7, 2020 at 2:38 PM Babu Moger  wrote:
>>
>> Newer AMD processors have a feature to virtualize the use of the
>> SPEC_CTRL MSR. When supported, the SPEC_CTRL MSR is automatically
>> virtualized and no longer requires hypervisor intervention.
>>
>> This feature is detected via CPUID function 0x8000000A_EDX[20]:
>> GuestSpecCtrl.
>>
>> Hypervisors are not required to enable this feature since it is
>> automatically enabled on processors that support it.
>>
>> When this feature is enabled, the hypervisor no longer has to
>> intercept the usage of the SPEC_CTRL MSR and no longer is required to
>> save and restore the guest SPEC_CTRL setting when switching
>> hypervisor/guest modes.  The effective SPEC_CTRL setting is the guest
>> SPEC_CTRL setting or'ed with the hypervisor SPEC_CTRL setting. This
>> allows the hypervisor to ensure a minimum SPEC_CTRL if desired.
>>
>> This support also fixes an issue where a guest may sometimes see an
>> inconsistent value for the SPEC_CTRL MSR on processors that support
>> this feature. With the current SPEC_CTRL support, the first write to
>> SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
>> MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
>> will be 0x0, instead of the actual expected value. There isn’t a
>> security concern here, because the host SPEC_CTRL value is or’ed with
>> the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
>> KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
>> MSR just before the VMRUN, so it will always have the actual value
>> even though it doesn’t appear that way in the guest. The guest will
>> only see the proper value for the SPEC_CTRL register if the guest was
>> to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
>> support, the MSR interception of SPEC_CTRL is disabled during
>> vmcb_init, so this will no longer be an issue.
>>
>> Signed-off-by: Babu Moger 
>> ---
> 
> Shouldn't there be some code to initialize a new "guest SPEC_CTRL"
> value in the VMCB, both at vCPU creation, and at virtual processor
> reset?
> 
>>  arch/x86/kvm/svm/svm.c |   17 ++---
>>  1 file changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 79b3a564f1c9..3d73ec0cdb87 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -1230,6 +1230,14 @@ static void init_vmcb(struct vcpu_svm *svm)
>>
>> svm_check_invpcid(svm);
>>
>> +   /*
>> +* If the host supports V_SPEC_CTRL then disable the interception
>> +* of MSR_IA32_SPEC_CTRL.
>> +*/
>> +   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>> +   set_msr_interception(&svm->vcpu, svm->msrpm, 
>> MSR_IA32_SPEC_CTRL,
>> +1, 1);
>> +
>> if (kvm_vcpu_apicv_active(&svm->vcpu))
>> avic_init_vmcb(svm);
>>
>> @@ -3590,7 +3598,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct 
>> kvm_vcpu *vcpu)
>>  * is no need to worry about the conditional branch over the wrmsr
>>  * being speculatively taken.
>>  */
>> -   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
>> +   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>> +   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
> 
> Is this correct for the nested case? Presumably, there is now a "guest
> SPEC_CTRL" value somewhere in the VMCB. If L1 does not intercept this
> MSR, then we need to transfer the "guest SPEC_CTRL" value from the
> vmcb01 to the vmcb02, don't we?
> 
>> svm_vcpu_enter_exit(vcpu, svm);
>>
>> @@ -3609,12 +3618,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct 
>> kvm_vcpu *vcpu)
>>  * If the L02 MSR bitmap does not intercept the MSR, then we need to
>>  * save it.
>>  */
>> -   if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
>> +   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL) &&
>> +   unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
>> svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
> 
> Is this correct for the nested case? If L1 does not intercept this
> MSR, then it might have changed while L2 is running. Presumably, the
> hardware has stored the new value somewhere in the vmcb02 at #VM

Re: [PATCH 1/2] x86/cpufeatures: Add the Virtual SPEC_CTRL feature

2020-12-22 Thread Babu Moger



On 12/9/20 5:11 PM, Jim Mattson wrote:
> On Wed, Dec 9, 2020 at 2:39 PM Babu Moger  wrote:
>>
>>
>>
>> On 12/7/20 5:22 PM, Jim Mattson wrote:
>>> On Mon, Dec 7, 2020 at 2:38 PM Babu Moger  wrote:
>>>>
>>>> Newer AMD processors have a feature to virtualize the use of the SPEC_CTRL
>>>> MSR. This feature is identified via CPUID 0x8000000A_EDX[20]. When present,
>>>> the SPEC_CTRL MSR is automatically virtualized and no longer requires
>>>> hypervisor intervention.
>>>>
>>>> Signed-off-by: Babu Moger 
>>>> ---
>>>>  arch/x86/include/asm/cpufeatures.h |1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/cpufeatures.h 
>>>> b/arch/x86/include/asm/cpufeatures.h
>>>> index dad350d42ecf..d649ac5ed7c7 100644
>>>> --- a/arch/x86/include/asm/cpufeatures.h
>>>> +++ b/arch/x86/include/asm/cpufeatures.h
>>>> @@ -335,6 +335,7 @@
>>>>  #define X86_FEATURE_AVIC   (15*32+13) /* Virtual Interrupt 
>>>> Controller */
>>>>  #define X86_FEATURE_V_VMSAVE_VMLOAD(15*32+15) /* Virtual VMSAVE 
>>>> VMLOAD */
>>>>  #define X86_FEATURE_VGIF   (15*32+16) /* Virtual GIF */
>>>> +#define X86_FEATURE_V_SPEC_CTRL(15*32+20) /* Virtual 
>>>> SPEC_CTRL */
>>>
>>> Shouldn't this bit be reported by KVM_GET_SUPPORTED_CPUID when it's
>>> enumerated on the host?
>>
>> Jim, I am not sure if this needs to be reported by
>> KVM_GET_SUPPORTED_CPUID. I don't see V_VMSAVE_VMLOAD or VGIF being reported
>> via KVM_GET_SUPPORTED_CPUID. Do you see the need for that?
> 
> Every little bit helps. No, it isn't *needed*. But then again, this
> entire patchset isn't *needed*, is it?
> 

Working on v2 of these patches. Saw this code comment (in
arch/x86/kvm/cpuid.c) about exposing SVM features to the guest.


/*
 * Hide all SVM features by default, SVM will set the cap bits for
 * features it emulates and/or exposes for L1.
 */
kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);


Should we go ahead with the changes here?

Thanks
Babu


Re: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2020-12-10 Thread Babu Moger



On 12/10/20 3:36 PM, Jim Mattson wrote:
> On Thu, Dec 10, 2020 at 1:26 PM Babu Moger  wrote:
>>
>> Hi Jim,
>>
>>> -Original Message-
>>> From: Jim Mattson 
>>> Sent: Monday, December 7, 2020 5:06 PM
>>> To: Moger, Babu 
>>> Cc: Paolo Bonzini ; Thomas Gleixner
>>> ; Ingo Molnar ; Borislav Petkov
>>> ; Yu, Fenghua ; Tony Luck
>>> ; Wanpeng Li ; kvm list
>>> ; Lendacky, Thomas ;
>>> Peter Zijlstra ; Sean Christopherson
>>> ; Joerg Roedel ; the arch/x86
>>> maintainers ; kyung.min.p...@intel.com; LKML >> ker...@vger.kernel.org>; Krish Sadhukhan ; H .
>>> Peter Anvin ; mgr...@linux.intel.com; Vitaly Kuznetsov
>>> ; Phillips, Kim ; Huang2, Wei
>>> 
>>> Subject: Re: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL
>>>
>>> On Mon, Dec 7, 2020 at 2:38 PM Babu Moger  wrote:
>>>>
>>>> Newer AMD processors have a feature to virtualize the use of the
>>>> SPEC_CTRL MSR. When supported, the SPEC_CTRL MSR is automatically
>>>> virtualized and no longer requires hypervisor intervention.
>>>>
>>>> This feature is detected via CPUID function 0x8000000A_EDX[20]:
>>>> GuestSpecCtrl.
>>>>
>>>> Hypervisors are not required to enable this feature since it is
>>>> automatically enabled on processors that support it.
>>>>
>>>> When this feature is enabled, the hypervisor no longer has to
>>>> intercept the usage of the SPEC_CTRL MSR and no longer is required to
>>>> save and restore the guest SPEC_CTRL setting when switching
>>>> hypervisor/guest modes.  The effective SPEC_CTRL setting is the guest
>>>> SPEC_CTRL setting or'ed with the hypervisor SPEC_CTRL setting. This
>>>> allows the hypervisor to ensure a minimum SPEC_CTRL if desired.
>>>>
>>>> This support also fixes an issue where a guest may sometimes see an
>>>> inconsistent value for the SPEC_CTRL MSR on processors that support
>>>> this feature. With the current SPEC_CTRL support, the first write to
>>>> SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
>>>> MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
>>>> will be 0x0, instead of the actual expected value. There isn’t a
>>>> security concern here, because the host SPEC_CTRL value is or’ed with
>>>> the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
>>>> KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
>>>> MSR just before the VMRUN, so it will always have the actual value
>>>> even though it doesn’t appear that way in the guest. The guest will
>>>> only see the proper value for the SPEC_CTRL register if the guest was
>>>> to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
>>>> support, the MSR interception of SPEC_CTRL is disabled during
>>>> vmcb_init, so this will no longer be an issue.
>>>>
>>>> Signed-off-by: Babu Moger 
>>>> ---
>>>
>>> Shouldn't there be some code to initialize a new "guest SPEC_CTRL"
>>> value in the VMCB, both at vCPU creation, and at virtual processor reset?
>>
>> Yes, I think so. I will check on this.
>>
>>>
>>>>  arch/x86/kvm/svm/svm.c |   17 ++---
>>>>  1 file changed, 14 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index
>>>> 79b3a564f1c9..3d73ec0cdb87 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -1230,6 +1230,14 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>>
>>>> svm_check_invpcid(svm);
>>>>
>>>> +   /*
>>>> +* If the host supports V_SPEC_CTRL then disable the interception
>>>> +* of MSR_IA32_SPEC_CTRL.
>>>> +*/
>>>> +   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>>>> +   set_msr_interception(&svm->vcpu, svm->msrpm,
>>> MSR_IA32_SPEC_CTRL,
>>>> +1, 1);
>>>> +
>>>> if (kvm_vcpu_apicv_active(&svm->vcpu))
>>>> avic_init_vmcb(svm);
>>>>
>>>> @@ -3590,7 +3598,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct
>>> kvm_vcpu *vcpu)
>>>>  * is no need to worry about the conditional branch over the wrmsr

RE: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2020-12-10 Thread Babu Moger
Hi Jim,

> -Original Message-
> From: Jim Mattson 
> Sent: Monday, December 7, 2020 5:06 PM
> To: Moger, Babu 
> Cc: Paolo Bonzini ; Thomas Gleixner
> ; Ingo Molnar ; Borislav Petkov
> ; Yu, Fenghua ; Tony Luck
> ; Wanpeng Li ; kvm list
> ; Lendacky, Thomas ;
> Peter Zijlstra ; Sean Christopherson
> ; Joerg Roedel ; the arch/x86
> maintainers ; kyung.min.p...@intel.com; LKML  ker...@vger.kernel.org>; Krish Sadhukhan ; H .
> Peter Anvin ; mgr...@linux.intel.com; Vitaly Kuznetsov
> ; Phillips, Kim ; Huang2, Wei
> 
> Subject: Re: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL
> 
> On Mon, Dec 7, 2020 at 2:38 PM Babu Moger  wrote:
> >
> > Newer AMD processors have a feature to virtualize the use of the
> > SPEC_CTRL MSR. When supported, the SPEC_CTRL MSR is automatically
> > virtualized and no longer requires hypervisor intervention.
> >
> > This feature is detected via CPUID function 0x8000000A_EDX[20]:
> > GuestSpecCtrl.
> >
> > Hypervisors are not required to enable this feature since it is
> > automatically enabled on processors that support it.
> >
> > When this feature is enabled, the hypervisor no longer has to
> > intercept the usage of the SPEC_CTRL MSR and no longer is required to
> > save and restore the guest SPEC_CTRL setting when switching
> > hypervisor/guest modes.  The effective SPEC_CTRL setting is the guest
> > SPEC_CTRL setting or'ed with the hypervisor SPEC_CTRL setting. This
> > allows the hypervisor to ensure a minimum SPEC_CTRL if desired.
> >
> > This support also fixes an issue where a guest may sometimes see an
> > inconsistent value for the SPEC_CTRL MSR on processors that support
> > this feature. With the current SPEC_CTRL support, the first write to
> > SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
> > MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
> > will be 0x0, instead of the actual expected value. There isn’t a
> > security concern here, because the host SPEC_CTRL value is or’ed with
> > the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
> > KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
> > MSR just before the VMRUN, so it will always have the actual value
> > even though it doesn’t appear that way in the guest. The guest will
> > only see the proper value for the SPEC_CTRL register if the guest was
> > to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
> > support, the MSR interception of SPEC_CTRL is disabled during
> > vmcb_init, so this will no longer be an issue.
> >
> > Signed-off-by: Babu Moger 
> > ---
> 
> Shouldn't there be some code to initialize a new "guest SPEC_CTRL"
> value in the VMCB, both at vCPU creation, and at virtual processor reset?

Yes, I think so. I will check on this.

> 
> >  arch/x86/kvm/svm/svm.c |   17 ++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index
> > 79b3a564f1c9..3d73ec0cdb87 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1230,6 +1230,14 @@ static void init_vmcb(struct vcpu_svm *svm)
> >
> > svm_check_invpcid(svm);
> >
> > +   /*
> > +* If the host supports V_SPEC_CTRL then disable the interception
> > +* of MSR_IA32_SPEC_CTRL.
> > +*/
> > +   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> > +   set_msr_interception(&svm->vcpu, svm->msrpm,
> MSR_IA32_SPEC_CTRL,
> > +1, 1);
> > +
> > if (kvm_vcpu_apicv_active(&svm->vcpu))
> > avic_init_vmcb(svm);
> >
> > @@ -3590,7 +3598,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct
> kvm_vcpu *vcpu)
> >  * is no need to worry about the conditional branch over the wrmsr
> >  * being speculatively taken.
> >  */
> > -   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
> > +   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> > +   x86_spec_ctrl_set_guest(svm->spec_ctrl,
> > + svm->virt_spec_ctrl);
> 
> Is this correct for the nested case? Presumably, there is now a "guest
> SPEC_CTRL" value somewhere in the VMCB. If L1 does not intercept this MSR,
> then we need to transfer the "guest SPEC_CTRL" value from the
> vmcb01 to the vmcb02, don't we?

Here is the text from to be published documentation.
"When in host mode, the host SPEC_CTRL value is in effect and writes
update

Re: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2020-12-10 Thread Babu Moger
Sean, your response did not land in my mailbox for some reason,
so I am replying using the In-Reply-To option.

>Hrm, is MSR_AMD64_VIRT_SPEC_CTRL only for SSBD?  Should that MSR be renamed to
>avoid confusion with the new form of VIRT_SPEC_CTRL?

We can rename it to MSR_AMD64_VIRT_SSBD_SPEC_CTRL if that is any better.

>Well, it's still required if the hypervisor wanted to allow the guest to turn
>off mitigations that are enabled in the host.  I'd omit this entirely and focus
>on what hardware does and how Linux/KVM utilize the new feature.

Ok. Sure.

>This line needs to be higher in the changelog, it's easily the most relevant
>info for understanding the mechanics.  Please also explicitly state the context
>switching mechanics, e.g. is it tracked in the VMCB, loaded on VMRUN, saved on
>VM-Exit, etc...

Will add more details.

>This will break migration, or maybe just cause weirdness, as userspace will
>always see '0' when reading SPEC_CTRL and its writes will be ignored.  Is there
>a VMCB field that holds the guest's value?  If so, this read can be skipped, 
>and
>instead the MSR set/get flows probably need to poke into the VMCB.

Yes. The guest SPEC_CTRL value is saved in the VMCB save area (i.e. 0x400 + 0x2E0).
Yes, will look into setting the VMCB with the desired values in the MSR
set/get paths if that helps.
Thanks
Babu
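
As a rough illustration of the direction suggested above (a sketch
only, assuming the save-area field is exposed as
svm->vmcb->save.spec_ctrl, matching the 0x400 + 0x2E0 offset mentioned;
this is not the final patch):

	/* Fragment for svm_get_msr(): read the guest value through the
	 * VMCB save area so userspace (e.g. live migration) sees the
	 * live value even though the MSR is no longer intercepted;
	 * svm_set_msr() would write the same field.
	 */
	case MSR_IA32_SPEC_CTRL:
		if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
			msr_info->data = svm->vmcb->save.spec_ctrl;
		else
			msr_info->data = svm->spec_ctrl;
		break;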


Re: [PATCH 1/2] x86/cpufeatures: Add the Virtual SPEC_CTRL feature

2020-12-09 Thread Babu Moger



On 12/7/20 5:22 PM, Jim Mattson wrote:
> On Mon, Dec 7, 2020 at 2:38 PM Babu Moger  wrote:
>>
>> Newer AMD processors have a feature to virtualize the use of the SPEC_CTRL
>> MSR. This feature is identified via CPUID 0x8000000A_EDX[20]. When present,
>> the SPEC_CTRL MSR is automatically virtualized and no longer requires
>> hypervisor intervention.
>>
>> Signed-off-by: Babu Moger 
>> ---
>>  arch/x86/include/asm/cpufeatures.h |1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h 
>> b/arch/x86/include/asm/cpufeatures.h
>> index dad350d42ecf..d649ac5ed7c7 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -335,6 +335,7 @@
>>  #define X86_FEATURE_AVIC   (15*32+13) /* Virtual Interrupt 
>> Controller */
>>  #define X86_FEATURE_V_VMSAVE_VMLOAD(15*32+15) /* Virtual VMSAVE VMLOAD 
>> */
>>  #define X86_FEATURE_VGIF   (15*32+16) /* Virtual GIF */
>> +#define X86_FEATURE_V_SPEC_CTRL   (15*32+20) /* Virtual 
>> SPEC_CTRL */
> 
> Shouldn't this bit be reported by KVM_GET_SUPPORTED_CPUID when it's
> enumerated on the host?

Jim, I am not sure if this needs to be reported by
KVM_GET_SUPPORTED_CPUID. I don't see V_VMSAVE_VMLOAD or VGIF being reported
via KVM_GET_SUPPORTED_CPUID. Do you see the need for that?


>>  /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
>>  #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit 
>> Manipulation instructions*/
>>


[PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2020-12-07 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. When supported, the SPEC_CTRL MSR is automatically
virtualized and no longer requires hypervisor intervention.

This feature is detected via CPUID function 0x8000000A_EDX[20]:
GuestSpecCtrl.

Hypervisors are not required to enable this feature since it is
automatically enabled on processors that support it.

When this feature is enabled, the hypervisor no longer has to
intercept the usage of the SPEC_CTRL MSR and no longer is required to
save and restore the guest SPEC_CTRL setting when switching
hypervisor/guest modes.  The effective SPEC_CTRL setting is the guest
SPEC_CTRL setting or'ed with the hypervisor SPEC_CTRL setting. This
allows the hypervisor to ensure a minimum SPEC_CTRL if desired.

This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the MSR interception of SPEC_CTRL is disabled during
vmcb_init, so this will no longer be an issue.

Signed-off-by: Babu Moger 
---
 arch/x86/kvm/svm/svm.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 79b3a564f1c9..3d73ec0cdb87 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1230,6 +1230,14 @@ static void init_vmcb(struct vcpu_svm *svm)
 
svm_check_invpcid(svm);
 
+   /*
+* If the host supports V_SPEC_CTRL then disable the interception
+* of MSR_IA32_SPEC_CTRL.
+*/
+   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   set_msr_interception(&svm->vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL,
+1, 1);
+
if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_init_vmcb(svm);
 
@@ -3590,7 +3598,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu 
*vcpu)
 * is no need to worry about the conditional branch over the wrmsr
 * being speculatively taken.
 */
-   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
+   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
svm_vcpu_enter_exit(vcpu, svm);
 
@@ -3609,12 +3618,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct 
kvm_vcpu *vcpu)
 * If the L02 MSR bitmap does not intercept the MSR, then we need to
 * save it.
 */
-   if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
+   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL) &&
+   unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
reload_tss(vcpu);
 
-   x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
+   if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
 
vcpu->arch.cr2 = svm->vmcb->save.cr2;
vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;



[PATCH 1/2] x86/cpufeatures: Add the Virtual SPEC_CTRL feature

2020-12-07 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of the SPEC_CTRL
MSR. This feature is identified via CPUID 0x8000000A_EDX[20]. When present,
the SPEC_CTRL MSR is automatically virtualized and no longer requires
hypervisor intervention.

Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..d649ac5ed7c7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -335,6 +335,7 @@
 #define X86_FEATURE_AVIC   (15*32+13) /* Virtual Interrupt 
Controller */
 #define X86_FEATURE_V_VMSAVE_VMLOAD(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF   (15*32+16) /* Virtual GIF */
+#define X86_FEATURE_V_SPEC_CTRL   (15*32+20) /* Virtual SPEC_CTRL */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit 
Manipulation instructions*/



[PATCH 0/2] x86: Add the feature Virtual SPEC_CTRL

2020-12-07 Thread Babu Moger
Newer AMD processors have a feature to virtualize the use of
the SPEC_CTRL MSR. The series adds the feature support and
enables the feature on SVM.
---

Babu Moger (2):
  x86/cpufeatures: Add the Virtual SPEC_CTRL feature
  KVM: SVM: Add support for Virtual SPEC_CTRL


 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/kvm/svm/svm.c |   17 ++---
 2 files changed, 15 insertions(+), 3 deletions(-)

--


[tip: x86/urgent] x86/resctrl: Fix AMD L3 QOS CDP enable/disable

2020-12-01 Thread tip-bot2 for Babu Moger
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: fae3a13d2a3d49a89391889808428cf1e72afbd7
Gitweb:
https://git.kernel.org/tip/fae3a13d2a3d49a89391889808428cf1e72afbd7
Author:Babu Moger 
AuthorDate:Mon, 30 Nov 2020 09:57:20 -06:00
Committer: Borislav Petkov 
CommitterDate: Tue, 01 Dec 2020 17:53:31 +01:00

x86/resctrl: Fix AMD L3 QOS CDP enable/disable

When the AMD QoS feature CDP (code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on one of the CPUs
in an L3 domain (core complex). That is not correct - the CDP bit needs
to be updated on all the logical CPUs in the domain.

This was not spelled out clearly in the spec earlier. The specification
has been updated and the updated document, "AMD64 Technology Platform
Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
Date: October 2020" is available now. Refer the section: Code and Data
Prioritization.

Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache
data structure.

The documentation can be obtained at:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

 [ bp: Massage commit message. ]

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: 
https://lkml.kernel.org/r/160675180380.15628.3309402017215002347.stgit@bmoger-ubuntu
---
 arch/x86/kernel/cpu/resctrl/core.c |  4 
 arch/x86/kernel/cpu/resctrl/internal.h |  3 +++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  9 +++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index e5f4ee8..e8b5f1c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -570,6 +570,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
+   if (r->cache.arch_has_per_cpu_cfg)
+   rdt_domain_reconfigure_cdp(r);
return;
}
 
@@ -923,6 +925,7 @@ static __init void rdt_init_res_defs_intel(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = false;
r->cache.arch_has_empty_bitmaps = false;
+   r->cache.arch_has_per_cpu_cfg = false;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_THRTL_BASE;
r->msr_update = mba_wrmsr_intel;
@@ -943,6 +946,7 @@ static __init void rdt_init_res_defs_amd(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = true;
r->cache.arch_has_empty_bitmaps = true;
+   r->cache.arch_has_per_cpu_cfg = true;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_BW_BASE;
r->msr_update = mba_wrmsr_amd;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 80fa997..f65d3c0 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -360,6 +360,8 @@ struct msr_param {
  * executing entities
  * @arch_has_sparse_bitmaps:   True if a bitmap like f00f is valid.
  * @arch_has_empty_bitmaps:True if the '0' bitmap is valid.
+ * @arch_has_per_cpu_cfg:  True if QOS_CFG register for this cache
+ * level has CPU scope.
  */
 struct rdt_cache {
unsigned intcbm_len;
@@ -369,6 +371,7 @@ struct rdt_cache {
unsigned intshareable_bits;
boolarch_has_sparse_bitmaps;
boolarch_has_empty_bitmaps;
+   boolarch_has_per_cpu_cfg;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6f4ca4b..f341842 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1909,8 +1909,13 @@ static int set_cache_qos_cfg(int level, bool enable)
 
	r_l = &rdt_resources_all[level];
	list_for_each_entry(d, &r_l->domains, list) {
-		/* Pick one CPU from each domain instance to update MSR */
-		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+		if (r_l->cache.arch_has_per_cpu_cfg)
+			/* Pick all the CPUs in the domain instance */
+			for_each_cpu(cpu, &d->cpu_mask)
+				cpumask_set_cpu(cpu, cpu_mask);
+		else
+			/* Pick one CPU from each domain instance to update MSR */
+			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

[PATCH v3] x86/resctrl: Fix AMD L3 QOS CDP enable/disable

2020-11-30 Thread Babu Moger
When the AMD QoS feature CDP (code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on one of the
CPUs in an L3 domain (core complex). That is not correct. The CDP bit needs
to be updated on all the logical CPUs in the domain.

This was not spelled out clearly in the spec earlier. The specification
has been updated. The updated specification, "AMD64 Technology Platform
Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
Date: October 2020" is available now. Refer the section: Code and Data
Prioritization.

Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache
data structure.

The documentation can be obtained at the links below:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger 
Reviewed-by: Reinette Chatre 
---
v3: Fixed checkpatch suggestions. Addred Reviewed-by from Reinette. 

v2: Taken care of Reinette's comments. Changed the field name to
arch_has_per_cpu_cfg to be bit more meaningful about the CPU scope.
Also fixed some wordings.

https://lore.kernel.org/lkml/160589301962.26308.4728709200492788764.stgit@bmoger-ubuntu/

v1: 
https://lore.kernel.org/lkml/160469365104.21002.2901190946502347327.stgit@bmoger-ubuntu/

 arch/x86/kernel/cpu/resctrl/core.c |4 
 arch/x86/kernel/cpu/resctrl/internal.h |3 +++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |9 +++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index e5f4ee8f4c3b..e8b5f1cf1ae8 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -570,6 +570,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
+   if (r->cache.arch_has_per_cpu_cfg)
+   rdt_domain_reconfigure_cdp(r);
return;
}
 
@@ -923,6 +925,7 @@ static __init void rdt_init_res_defs_intel(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = false;
r->cache.arch_has_empty_bitmaps = false;
+   r->cache.arch_has_per_cpu_cfg = false;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_THRTL_BASE;
r->msr_update = mba_wrmsr_intel;
@@ -943,6 +946,7 @@ static __init void rdt_init_res_defs_amd(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = true;
r->cache.arch_has_empty_bitmaps = true;
+   r->cache.arch_has_per_cpu_cfg = true;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_BW_BASE;
r->msr_update = mba_wrmsr_amd;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 80fa997fae60..f65d3c0dbc41 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -360,6 +360,8 @@ struct msr_param {
  * executing entities
  * @arch_has_sparse_bitmaps:   True if a bitmap like f00f is valid.
  * @arch_has_empty_bitmaps:True if the '0' bitmap is valid.
+ * @arch_has_per_cpu_cfg:  True if QOS_CFG register for this cache
+ * level has CPU scope.
  */
 struct rdt_cache {
unsigned intcbm_len;
@@ -369,6 +371,7 @@ struct rdt_cache {
unsigned intshareable_bits;
boolarch_has_sparse_bitmaps;
boolarch_has_empty_bitmaps;
+   boolarch_has_per_cpu_cfg;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index af323e2e3100..6abd8ef9a674 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1905,8 +1905,13 @@ static int set_cache_qos_cfg(int level, bool enable)
 
	r_l = &rdt_resources_all[level];
	list_for_each_entry(d, &r_l->domains, list) {
-		/* Pick one CPU from each domain instance to update MSR */
-		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+		if (r_l->cache.arch_has_per_cpu_cfg)
+			/* Pick all the CPUs in the domain instance */
+			for_each_cpu(cpu, &d->cpu_mask)
+				cpumask_set_cpu(cpu, cpu_mask);
+		else
+			/* Pick one CPU from each domain instance to update MSR */
+			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
}
cpu = get_cpu();
/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */



RE: [PATCH v2] x86/resctrl: Fix AMD L3 QOS CDP enable/disable

2020-11-30 Thread Babu Moger
Hi Reinette,

> -Original Message-
> From: Reinette Chatre 
> Sent: Tuesday, November 24, 2020 11:23 AM
> To: Moger, Babu ; b...@alien8.de
> Cc: fenghua...@intel.com; x...@kernel.org; linux-kernel@vger.kernel.org;
> mi...@redhat.com; h...@zytor.com; t...@linutronix.de
> Subject: Re: [PATCH v2] x86/resctrl: Fix AMD L3 QOS CDP enable/disable
> 
> Hi Babu,
> 
> On 11/20/2020 9:25 AM, Babu Moger wrote:
> > When the AMD QoS feature CDP (code and data prioritization) is enabled
> > or disabled, the CDP bit in MSR 0000_0C81 is written on one of the
> > CPUs in L3 domain (core complex). That is not correct. The CDP bit
> > needs to be updated on all the logical CPUs in the domain.
> >
> > This was not spelled out clearly in the spec earlier. The
> > specification has been updated. The updated specification, "AMD64
> > Technology Platform Quality of Service Extensions Publication # 56375
> > Revision: 1.02 Issue
> > Date: October 2020" is available now. Refer the section: Code and Data
> > Prioritization.
> >
> > Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache
> > data structure.
> >
> > The documentation can be obtained at the links below:
> > https://developer.amd.com/wp-content/resources/56375.pdf
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> >
> > Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
> > Signed-off-by: Babu Moger 
> > ---
> > v2: Taken care of Reinette's comments. Changed the field name to
> >  arch_has_per_cpu_cfg to be bit more meaningful about the CPU scope.
> >  Also fixed some wordings.
> >
> > v1:
> > https://lore.kernel.org/lkml/160469365104.21002.2901190946502347327.stgit@bmoger-ubuntu/
> >
> >   arch/x86/kernel/cpu/resctrl/core.c |4 
> >   arch/x86/kernel/cpu/resctrl/internal.h |3 +++
> >   arch/x86/kernel/cpu/resctrl/rdtgroup.c |9 +++--
> >   3 files changed, 14 insertions(+), 2 deletions(-)
> >
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> > b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 80fa997fae60..bcd9b517c765 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -360,6 +360,8 @@ struct msr_param {
> >*executing entities
> >* @arch_has_sparse_bitmaps:  True if a bitmap like f00f is valid.
> >* @arch_has_empty_bitmaps:   True if the '0' bitmap is valid.
> > + * @arch_has_per_cpu_cfg:  True if QOS_CFG register for this cache
> > + * level has CPU scope.
> 
> Please fixup the spacing to not have spaces before tabs. This will make
> checkpatch happy and fit with in with the rest of the comments for this 
> struct.

Sure. Will fix it.
> 
> >*/
> >   struct rdt_cache {
> > unsigned intcbm_len;
> > @@ -369,6 +371,7 @@ struct rdt_cache {
> > unsigned intshareable_bits;
> > boolarch_has_sparse_bitmaps;
> > boolarch_has_empty_bitmaps;
> > +   boolarch_has_per_cpu_cfg;
> >   };
> >
> >   /**
> 
> ...
> 
> This patch looks good to me.
> 
> With the one style comment addressed you can add:
> Reviewed-by: Reinette Chatre 
> 
Thanks
-Babu


[PATCH v2] x86/resctrl: Fix AMD L3 QOS CDP enable/disable

2020-11-20 Thread Babu Moger
When the AMD QoS feature CDP (code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on one of the
CPUs in an L3 domain (core complex). That is not correct. The CDP bit needs
to be updated on all the logical CPUs in the domain.

This was not spelled out clearly in the spec earlier. The specification
has been updated. The updated specification, "AMD64 Technology Platform
Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
Date: October 2020" is available now. Refer the section: Code and Data
Prioritization.

Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache
data structure.

The documentation can be obtained at the links below:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger 
---
v2: Taken care of Reinette's comments. Changed the field name to
arch_has_per_cpu_cfg to be bit more meaningful about the CPU scope.
Also fixed some wordings.

v1: 
https://lore.kernel.org/lkml/160469365104.21002.2901190946502347327.stgit@bmoger-ubuntu/

 arch/x86/kernel/cpu/resctrl/core.c |4 
 arch/x86/kernel/cpu/resctrl/internal.h |3 +++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |9 +++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index e5f4ee8f4c3b..e8b5f1cf1ae8 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -570,6 +570,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
+   if (r->cache.arch_has_per_cpu_cfg)
+   rdt_domain_reconfigure_cdp(r);
return;
}
 
@@ -923,6 +925,7 @@ static __init void rdt_init_res_defs_intel(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = false;
r->cache.arch_has_empty_bitmaps = false;
+   r->cache.arch_has_per_cpu_cfg = false;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_THRTL_BASE;
r->msr_update = mba_wrmsr_intel;
@@ -943,6 +946,7 @@ static __init void rdt_init_res_defs_amd(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = true;
r->cache.arch_has_empty_bitmaps = true;
+   r->cache.arch_has_per_cpu_cfg = true;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_BW_BASE;
r->msr_update = mba_wrmsr_amd;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 80fa997fae60..bcd9b517c765 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -360,6 +360,8 @@ struct msr_param {
  * executing entities
  * @arch_has_sparse_bitmaps:   True if a bitmap like f00f is valid.
  * @arch_has_empty_bitmaps:True if the '0' bitmap is valid.
+ * @arch_has_per_cpu_cfg:  True if QOS_CFG register for this cache
+ * level has CPU scope.
  */
 struct rdt_cache {
unsigned intcbm_len;
@@ -369,6 +371,7 @@ struct rdt_cache {
unsigned intshareable_bits;
boolarch_has_sparse_bitmaps;
boolarch_has_empty_bitmaps;
+   boolarch_has_per_cpu_cfg;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index af323e2e3100..6abd8ef9a674 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1905,8 +1905,13 @@ static int set_cache_qos_cfg(int level, bool enable)
 
	r_l = &rdt_resources_all[level];
	list_for_each_entry(d, &r_l->domains, list) {
-		/* Pick one CPU from each domain instance to update MSR */
-		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+		if (r_l->cache.arch_has_per_cpu_cfg)
+			/* Pick all the CPUs in the domain instance */
+			for_each_cpu(cpu, &d->cpu_mask)
+				cpumask_set_cpu(cpu, cpu_mask);
+		else
+			/* Pick one CPU from each domain instance to update MSR */
+			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
}
cpu = get_cpu();
/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */



Re: [PATCH] x86/resctrl: Fix AMD L3 QOS CDP enable/disable

2020-11-19 Thread Babu Moger
Hi Reinette,


On 11/18/20 4:18 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 11/6/2020 12:14 PM, Babu Moger wrote:
>> When the AMD QoS feature CDP (code and data prioritization) is enabled
>> or disabled, the CDP bit in MSR 0000_0C81 is written on one of the
>> cpus in L3 domain (core complex). That is not correct. The CDP bit needs
>> to be updated on all the logical cpus in the domain.
> 
> Could you please use CPU instead of cpu throughout, in commit message as
> well as the new code comments?

Sure. Will do.

> 
>>
>> This was not spelled out clearly in the spec earlier. The specification
>> has been updated. The updated specification, "AMD64 Technology Platform
>> Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
>> Date: October 2020" is available now. Refer the section: Code and Data
>> Prioritization.
>>
>> Fix the issue by adding a new flag arch_needs_update_all in rdt_cache
>> data structure.
> 
> I understand that naming is hard and could be a sticky point. Even so, I
> am concerned that this name is too generic. For example, there are other
> cache settings that are successfully set on a single CPU in the L3 domain
> (the bitmasks for example). This new name and its description in the code
> comments below does not make it clear which cache settings it applies to.
> 
> I interpret this change to mean that the L[23]_QOS_CFG MSR has CPU scope
> while the other L3 QoS configuration registers have the same scope as the
> L3 cache. Could this new variable thus perhaps be named
> "arch_has_per_cpu_cfg"? I considered "arch_has_per_cpu_cdp" but when a new
> field is added to that register it may cause confusion.

Sounds good. Will change it to arch_has_per_cpu_cfg.
> 
>> The documentation can be obtained at the links below:
>> https://developer.amd.com/wp-content/resources/56375.pdf
>>
>> Link:
>> https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>>
>> Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
>>
>> Signed-off-by: Babu Moger 
>> ---
>>   arch/x86/kernel/cpu/resctrl/core.c |    3 +++
>>   arch/x86/kernel/cpu/resctrl/internal.h |    3 +++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c |    9 +++--
>>   3 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
>> b/arch/x86/kernel/cpu/resctrl/core.c
>> index e5f4ee8f4c3b..142c92a12254 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -570,6 +570,8 @@ static void domain_add_cpu(int cpu, struct
>> rdt_resource *r)
>>     if (d) {
>>   cpumask_set_cpu(cpu, &d->cpu_mask);
>> +    if (r->cache.arch_needs_update_all)
>> +    rdt_domain_reconfigure_cdp(r);
>>   return;
>>   }
>>   @@ -943,6 +945,7 @@ static __init void rdt_init_res_defs_amd(void)
>>   r->rid == RDT_RESOURCE_L2CODE) {
>>   r->cache.arch_has_sparse_bitmaps = true;
>>   r->cache.arch_has_empty_bitmaps = true;
>> +    r->cache.arch_needs_update_all = true;
>>   } else if (r->rid == RDT_RESOURCE_MBA) {
>>   r->msr_base = MSR_IA32_MBA_BW_BASE;
>>   r->msr_update = mba_wrmsr_amd;
> 
> The current pattern is to set these flags on all the architectures. Could
> you thus please set the flag within rdt_init_defs_intel()? I confirmed
> that the scope is the same as the cache domain in Intel RDT so the flag
> should be false.

Yes, Will add that.

> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 80fa997fae60..d23262d59a51 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -360,6 +360,8 @@ struct msr_param {
>>    *    executing entities

[PATCH v2 0/2] Fix AMD SEV guest boot issue with PCID feature

2020-11-12 Thread Babu Moger
SEV guests fail to boot on systems that support the PCID feature.

The problem is observed when an SMM-enabled OVMF BIOS is used. The guest
crashes with the following messages on the console while loading.

--
[0.264224] tsc: Marking TSC unstable due to TSCs unsynchronized
[0.264946] Calibrating delay loop (skipped) preset value.. 3194.00
 BogoMIPS (lpj=1597000)
[0.265946] pid_max: default: 65536 minimum: 512
KVM internal error. Suberror: 1
emulation failure
EAX= EBX= ECX= EDX=
ESI= EDI=7ffac000 EBP= ESP=7ffa1ff8
EIP=7ffb4280 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0020   00c09300 DPL=0 DS   [-WA]
CS =  0fff 9b00 DPL=0 CS16 [-RA]
SS =0020   00c09300 DPL=0 DS   [-WA]
DS =0020   00c09300 DPL=0 DS   [-WA]
FS =0020   00c09300 DPL=0 DS   [-WA]
GS =0020   00c09300 DPL=0 DS   [-WA]
LDT=   
TR =0040 3000 4087 8b00 DPL=0 TSS64-busy
GDT= fe001000 007f
IDT= fe00 0fff
CR0=80050033 CR2=88817000 CR3=0008000107e12000 CR4=000606b0
DR0= DR1= DR2=
DR3= DR6=0ff0 DR7=0400
EFER=0d01
--

The issue is root-caused to the way KVM validates the CR3 address in
kvm_set_cr3(). The CR3 address in SEV guests has the encryption bit
set. KVM fails because the reserved-bit check fails on this address.

This series fixes the problem by introducing a new field in the
kvm_vcpu_arch structure. The new field cr3_lm_rsvd_bits is initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63) in kvm_vcpu_after_set_cpuid,
and any vendor-specific reserved bits are cleared in
kvm_x86_ops.vcpu_after_set_cpuid.
---
v2: Changed the code as suggested by Paolo. Added a new field in kvm_vcpu_arch
to hold the reserved bits in cr3_lm_rsvd_bits.

v1:
https://lore.kernel.org/lkml/160514082171.31583.9995411273370528911.stgit@bmoger-ubuntu/
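
To make the failure mode concrete, here is the arithmetic with the CR3
from the log above (the guest MAXPHYADDR of 48 is an assumption for
illustration only, not taken from the crash):

	/*
	 *   cr3        = 0x0008000107e12000;   SEV C-bit (bit 51) set
	 *   rsvd       = rsvd_bits(48, 63);    bits 63:48 are reserved
	 *   cr3 & rsvd = 0x0008000000000000;   non-zero, so
	 *                                      kvm_set_cr3() returns 1
	 *
	 * Masking the encryption bit out of the cached reserved-bit set
	 * (patch 2 of this series) lets otherwise-valid SEV CR3 values
	 * pass the check.
	 */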

Babu Moger (2):
  KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
  KVM:SVM: Update cr3_lm_rsvd_bits for AMD SEV guests


 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/cpuid.c|2 ++
 arch/x86/kvm/svm/svm.c  |   11 +++
 arch/x86/kvm/x86.c  |2 +-
 4 files changed, 15 insertions(+), 1 deletion(-)

--


[PATCH v2 2/2] KVM:SVM: Update cr3_lm_rsvd_bits for AMD SEV guests

2020-11-12 Thread Babu Moger
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
the memory encryption bit in reserved bits.

Signed-off-by: Babu Moger 
---
 arch/x86/kvm/svm/svm.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2f32fd09e259..b418eeab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3741,6 +3741,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
gfn, bool is_mmio)
 static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   struct kvm_cpuid_entry2 *best;
 
vcpu->arch.xsaves_enabled = guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
boot_cpu_has(X86_FEATURE_XSAVE) &&
@@ -3753,6 +3754,16 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu 
*vcpu)
/* Check again if INVPCID interception if required */
svm_check_invpcid(svm);
 
+   /*
+* For sev guests, update the cr3_lm_rsvd_bits to mask the memory
+* encryption bit from reserved bits
+*/
+   if (sev_guest(vcpu->kvm)) {
best = kvm_find_cpuid_entry(vcpu, 0x8000001F, 0);
+   if (best)
+   vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 
0x3f));
+   }
+
if (!kvm_vcpu_apicv_active(vcpu))
return;
 



[PATCH v2 1/2] KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch

2020-11-12 Thread Babu Moger
SEV guests fail to boot on a system that supports the PCID feature.

While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set. The
reserved bit range is 63:cpuid_maxphyaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.

Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
cache the reserved bits in the CR3 value. This will be initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).

Any architecture-specific special bits (like the AMD SEV encryption
bit) that need to be masked from the reserved bits should be cleared
in the vendor-specific kvm_x86_ops.vcpu_after_set_cpuid handler.

Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/cpuid.c|2 ++
 arch/x86/kvm/x86.c  |2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d44858b69353..324ddd7fd0aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -639,6 +639,7 @@ struct kvm_vcpu_arch {
int cpuid_nent;
struct kvm_cpuid_entry2 *cpuid_entries;
 
+   unsigned long cr3_lm_rsvd_bits;
int maxphyaddr;
int max_tdp_level;
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 06a278b3701d..cb52485cc507 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -169,6 +169,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vcpu->arch.cr4_guest_rsvd_bits =
__cr4_reserved_bits(guest_cpuid_has, vcpu);
 
+   vcpu->arch.cr3_lm_rsvd_bits = rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
+
/* Invoke the vendor callback only after the above state is updated. */
kvm_x86_ops.vcpu_after_set_cpuid(vcpu);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5ede41bf9e6..ff55e33b268b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1042,7 +1042,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
}
 
if (is_long_mode(vcpu) &&
-   (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
+   (cr3 & vcpu->arch.cr3_lm_rsvd_bits))
return 1;
else if (is_pae_paging(vcpu) &&
 !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))



Re: [PATCH 2/2] KVM:SVM: Mask SEV encryption bit from CR3 reserved bits

2020-11-12 Thread Babu Moger



On 11/12/20 2:32 AM, Paolo Bonzini wrote:
> On 12/11/20 01:28, Babu Moger wrote:
>> Add support to the mask_cr3_rsvd_bits() callback to mask the
>> encryption bit from the CR3 value when SEV is enabled.
>>
>> Additionally, cache the encryption mask for quick access during
>> the check.
>>
>> Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to
>> CR3")
>> Signed-off-by: Babu Moger 
>> ---
>>   arch/x86/kvm/svm/svm.c |   11 ++-
>>   arch/x86/kvm/svm/svm.h |    3 +++
>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index a491a47d7f5c..c2b1e52810c6 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3741,6 +3741,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu,
>> gfn_t gfn, bool is_mmio)
>>   static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>   {
>>   struct vcpu_svm *svm = to_svm(vcpu);
>> +    struct kvm_cpuid_entry2 *best;
>>     vcpu->arch.xsaves_enabled = guest_cpuid_has(vcpu,
>> X86_FEATURE_XSAVE) &&
>>   boot_cpu_has(X86_FEATURE_XSAVE) &&
>> @@ -3771,6 +3772,12 @@ static void svm_vcpu_after_set_cpuid(struct
>> kvm_vcpu *vcpu)
>>   if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
>>   kvm_request_apicv_update(vcpu->kvm, false,
>>    APICV_INHIBIT_REASON_NESTED);
>> +
>> +    best = kvm_find_cpuid_entry(vcpu, 0x8000001F, 0);
>> +    if (best)
>> +    svm->sev_enc_mask = ~(1UL << (best->ebx & 0x3f));
>> +    else
>> +    svm->sev_enc_mask = ~0UL;
>>   }
>>     static bool svm_has_wbinvd_exit(void)
>> @@ -4072,7 +4079,9 @@ static void enable_smi_window(struct kvm_vcpu *vcpu)
>>     static unsigned long svm_mask_cr3_rsvd_bits(struct kvm_vcpu *vcpu,
>> unsigned long cr3)
>>   {
>> -    return cr3;
>> +    struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +    return sev_guest(vcpu->kvm) ? (cr3 & svm->sev_enc_mask) : cr3;
>>   }
>>     static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void
>> *insn, int insn_len)
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 1d853fe4c778..57a36645a0e4 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -152,6 +152,9 @@ struct vcpu_svm {
>>   u64 *avic_physical_id_cache;
>>   bool avic_is_running;
>>   +    /* SEV Memory encryption mask */
>> +    unsigned long sev_enc_mask;
>> +
>>   /*
>>    * Per-vcpu list of struct amd_svm_iommu_ir:
>>    * This is used mainly to store interrupt remapping information used
>>
> 
> Instead of adding a new callback, you can add a field to struct
> kvm_vcpu_arch:
> 
>  if (is_long_mode(vcpu) &&
> -    (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
> +    (cr3 & vcpu->arch.cr3_lm_rsvd_bits))
> 
> Set it in kvm_vcpu_after_set_cpuid, and clear the memory encryption bit in
> kvm_x86_ops.vcpu_after_set_cpuid.

Yes. That should work. Will resubmit the patches. Thanks


[PATCH 2/2] KVM:SVM: Mask SEV encryption bit from CR3 reserved bits

2020-11-11 Thread Babu Moger
Add support to the mask_cr3_rsvd_bits() callback to mask the
encryption bit from the CR3 value when SEV is enabled.

Additionally, cache the encryption mask for quick access during
the check.

Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger 
---
 arch/x86/kvm/svm/svm.c |   11 ++-
 arch/x86/kvm/svm/svm.h |3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a491a47d7f5c..c2b1e52810c6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3741,6 +3741,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
gfn, bool is_mmio)
 static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   struct kvm_cpuid_entry2 *best;
 
vcpu->arch.xsaves_enabled = guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
boot_cpu_has(X86_FEATURE_XSAVE) &&
@@ -3771,6 +3772,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu 
*vcpu)
if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
kvm_request_apicv_update(vcpu->kvm, false,
 APICV_INHIBIT_REASON_NESTED);
+
+   best = kvm_find_cpuid_entry(vcpu, 0x8000001F, 0);
+   if (best)
+   svm->sev_enc_mask = ~(1UL << (best->ebx & 0x3f));
+   else
+   svm->sev_enc_mask = ~0UL;
 }
 
 static bool svm_has_wbinvd_exit(void)
@@ -4072,7 +4079,9 @@ static void enable_smi_window(struct kvm_vcpu *vcpu)
 
 static unsigned long svm_mask_cr3_rsvd_bits(struct kvm_vcpu *vcpu, unsigned 
long cr3)
 {
-   return cr3;
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   return sev_guest(vcpu->kvm) ? (cr3 & svm->sev_enc_mask) : cr3;
 }
 
 static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int 
insn_len)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1d853fe4c778..57a36645a0e4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -152,6 +152,9 @@ struct vcpu_svm {
u64 *avic_physical_id_cache;
bool avic_is_running;
 
+   /* SEV Memory encryption mask */
+   unsigned long sev_enc_mask;
+
/*
 * Per-vcpu list of struct amd_svm_iommu_ir:
 * This is used mainly to store interrupt remapping information used



[PATCH 1/2] KVM: x86: Introduce mask_cr3_rsvd_bits to mask memory encryption bit

2020-11-11 Thread Babu Moger
SEV guests fail to boot on a system that supports the PCID feature.

While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set.
The reserved bit range is 63:cpuid_maxphyaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.
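
A worked example of the failure, with assumed values (guest MAXPHYADDR
of 48 and a C-bit at position 51; the CR3 value is the one from the
crash dump in the cover letter):

	rsvd = rsvd_bits(48, 63);      /* 0xffff000000000000 */
	cr3  = 0x0008000107e12000UL;   /* encryption bit 51 set */
	/* cr3 & rsvd == 0x0008000000000000, non-zero -> kvm_set_cr3() fails */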

Introduce a generic callback function that can be used to mask bits
within the CR3 value before being checked by kvm_set_cr3().

Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/svm/svm.c  |6 ++
 arch/x86/kvm/vmx/vmx.c  |6 ++
 arch/x86/kvm/x86.c  |3 ++-
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d44858b69353..e791f841e0c2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1265,6 +1265,8 @@ struct kvm_x86_ops {
int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
int (*pre_leave_smm)(struct kvm_vcpu *vcpu, const char *smstate);
void (*enable_smi_window)(struct kvm_vcpu *vcpu);
+   unsigned long (*mask_cr3_rsvd_bits)(struct kvm_vcpu *vcpu,
+   unsigned long cr3);
 
int (*mem_enc_op)(struct kvm *kvm, void __user *argp);
int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2f32fd09e259..a491a47d7f5c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4070,6 +4070,11 @@ static void enable_smi_window(struct kvm_vcpu *vcpu)
}
 }
 
+static unsigned long svm_mask_cr3_rsvd_bits(struct kvm_vcpu *vcpu, unsigned 
long cr3)
+{
+   return cr3;
+}
+
 static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int 
insn_len)
 {
bool smep, smap, is_user;
@@ -4285,6 +4290,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.pre_enter_smm = svm_pre_enter_smm,
.pre_leave_smm = svm_pre_leave_smm,
.enable_smi_window = enable_smi_window,
+   .mask_cr3_rsvd_bits = svm_mask_cr3_rsvd_bits,
 
.mem_enc_op = svm_mem_enc_op,
.mem_enc_reg_region = svm_register_enc_region,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 47b8357b9751..68920338b36a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7556,6 +7556,11 @@ static void enable_smi_window(struct kvm_vcpu *vcpu)
/* RSM will cause a vmexit anyway.  */
 }
 
+static unsigned long vmx_mask_cr3_rsvd_bits(struct kvm_vcpu *vcpu, unsigned 
long cr3)
+{
+   return cr3;
+}
+
 static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 {
return to_vmx(vcpu)->nested.vmxon;
@@ -7709,6 +7714,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.pre_enter_smm = vmx_pre_enter_smm,
.pre_leave_smm = vmx_pre_leave_smm,
.enable_smi_window = enable_smi_window,
+   .mask_cr3_rsvd_bits = vmx_mask_cr3_rsvd_bits,
 
.can_emulate_instruction = vmx_can_emulate_instruction,
.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5ede41bf9e6..43a8d40bcfbf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1042,7 +1042,8 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
}
 
if (is_long_mode(vcpu) &&
-   (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
+   (kvm_x86_ops.mask_cr3_rsvd_bits(vcpu, cr3) &
+rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
return 1;
else if (is_pae_paging(vcpu) &&
 !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))



[PATCH 0/2] Fix AMD SEV guest boot issue with PCID feature

2020-11-11 Thread Babu Moger
SEV guests fail to boot on systems that support the PCID feature.

The problem is observed with an SMM-enabled OVMF build. The guest
crashes with the following messages on the console while loading.

--
[0.264224] tsc: Marking TSC unstable due to TSCs unsynchronized
[0.264946] Calibrating delay loop (skipped) preset value.. 3194.00
 BogoMIPS (lpj=1597000)
[0.265946] pid_max: default: 65536 minimum: 512
KVM internal error. Suberror: 1
emulation failure
EAX= EBX= ECX= EDX=
ESI= EDI=7ffac000 EBP= ESP=7ffa1ff8
EIP=7ffb4280 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0020   00c09300 DPL=0 DS   [-WA]
CS =  0fff 9b00 DPL=0 CS16 [-RA]
SS =0020   00c09300 DPL=0 DS   [-WA]
DS =0020   00c09300 DPL=0 DS   [-WA]
FS =0020   00c09300 DPL=0 DS   [-WA]
GS =0020   00c09300 DPL=0 DS   [-WA]
LDT=   
TR =0040 3000 4087 8b00 DPL=0 TSS64-busy
GDT= fe001000 007f
IDT= fe00 0fff
CR0=80050033 CR2=88817000 CR3=0008000107e12000 CR4=000606b0
DR0= DR1= DR2=
DR3= DR6=0ff0 DR7=0400
EFER=0d01
--

The issue is root-caused to the way KVM validates the CR3 address in
kvm_set_cr3(). The CR3 address in SEV guests has the encryption bit set,
so the reserved bit check fails on this address.

This series fixes the problem by introducing a new kvm_x86_ops callback
that detects the encryption bit and masks it during the check.
---

Babu Moger (2):
  KVM: x86: Introduce mask_cr3_rsvd_bits to mask memory encryption bit
  KVM:SVM: Mask SEV encryption bit from CR3 reserved bits


 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/svm/svm.c  |   15 +++
 arch/x86/kvm/svm/svm.h  |3 +++
 arch/x86/kvm/vmx/vmx.c  |6 ++
 arch/x86/kvm/x86.c  |3 ++-
 5 files changed, 28 insertions(+), 1 deletion(-)

--


RE: [PATCH v3 0/2] KVM: SVM: Create separate vmcbs for L1 and L2

2020-11-11 Thread Babu Moger
Hi Cathy,
I was going to test these patches, but they did not apply on my tree.
I tried both kvm (https://git.kernel.org/pub/scm/virt/kvm/kvm.git) and
mainline
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git).
What is your base tree?
thanks
Babu

> -Original Message-
> From: Cathy Avery 
> Sent: Monday, October 26, 2020 12:42 PM
> To: linux-kernel@vger.kernel.org; k...@vger.kernel.org; pbonz...@redhat.com
> Cc: vkuzn...@redhat.com; Huang2, Wei ;
> mlevi...@redhat.com; sean.j.christopher...@intel.com
> Subject: [PATCH v3 0/2] KVM: SVM: Create separate vmcbs for L1 and L2
> 
> svm->vmcb will now point to either a separate vmcb L1 ( not nested ) or L2 
> vmcb
> ( nested ).
> 
> Changes:
> v2 -> v3
>  - Added vmcb switching helper.
>  - svm_set_nested_state always forces to L1 before determining state
>to set. This is more like vmx and covers any potential L2 to L2 nested 
> state
> switch.
>  - Moved svm->asid tracking to pre_svm_run and added ASID set dirty bit
>checking.
> 
> v1 -> v2
>  - Removed unnecessary update check of L1 save.cr3 during nested_svm_vmexit.
>  - Moved vmcb01_pa to svm.
>  - Removed get_host_vmcb() function.
>  - Updated vmsave/vmload corresponding vmcb state during L2
>enter and exit which fixed the L2 load issue.
>  - Moved asid workaround to a new patch which adds asid to svm.
>  - Init previously uninitialized L2 vmcb save.gpat and save.cr4
> 
> Tested:
> kvm-unit-tests
> kvm self tests
> Loaded fedora nested guest on fedora
> 
> Cathy Avery (2):
>   KVM: SVM: Track asid from vcpu_svm
>   KVM: SVM: Use a separate vmcb for the nested L2 guest
> 
>  arch/x86/kvm/svm/nested.c | 125 ++
>  arch/x86/kvm/svm/svm.c|  58 +++---
>  arch/x86/kvm/svm/svm.h|  51 +---
>  3 files changed, 110 insertions(+), 124 deletions(-)
> 
> --
> 2.20.1



[PATCH] x86/resctrl: Fix AMD L3 QOS CDP enable/disable

2020-11-06 Thread Babu Moger
When the AMD QoS feature CDP (code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on only one of the
CPUs in an L3 domain (core complex). That is not correct: the CDP bit
needs to be updated on all the logical CPUs in the domain.

This was not spelled out clearly in the spec earlier. The specification
has since been updated. The updated specification, "AMD64 Technology
Platform Quality of Service Extensions Publication # 56375 Revision:
1.02 Issue Date: October 2020", is available now. Refer to the section
"Code and Data Prioritization".

Fix the issue by adding a new flag arch_needs_update_all in rdt_cache
data structure.
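
The underlying requirement, as a minimal sketch (the helper name is
illustrative; the actual fix below reuses the existing cpu_mask handling
in set_cache_qos_cfg()):

	/* Broadcast the QOS_CFG MSR write to every logical CPU in the
	 * domain instead of picking a single CPU (L3 CDP assumed).
	 */
	static void cdp_qos_cfg_update(void *enable)
	{
		wrmsrl(MSR_IA32_L3_QOS_CFG, *(bool *)enable ? L3_QOS_CDP_ENABLE : 0ULL);
	}

	/* for each rdt_domain d on the resource's domain list: */
	on_each_cpu_mask(&d->cpu_mask, cdp_qos_cfg_update, &enable, 1);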

The documentation can be obtained at the links below:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")

Signed-off-by: Babu Moger 
---
 arch/x86/kernel/cpu/resctrl/core.c |3 +++
 arch/x86/kernel/cpu/resctrl/internal.h |3 +++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |9 +++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index e5f4ee8f4c3b..142c92a12254 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -570,6 +570,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
if (d) {
		cpumask_set_cpu(cpu, &d->cpu_mask);
+   if (r->cache.arch_needs_update_all)
+   rdt_domain_reconfigure_cdp(r);
return;
}
 
@@ -943,6 +945,7 @@ static __init void rdt_init_res_defs_amd(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = true;
r->cache.arch_has_empty_bitmaps = true;
+   r->cache.arch_needs_update_all = true;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_BW_BASE;
r->msr_update = mba_wrmsr_amd;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 80fa997fae60..d23262d59a51 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -360,6 +360,8 @@ struct msr_param {
  * executing entities
  * @arch_has_sparse_bitmaps:   True if a bitmap like f00f is valid.
  * @arch_has_empty_bitmaps:True if the '0' bitmap is valid.
+ * @arch_needs_update_all: True if arch needs to update the cache
+ * settings on all the cpus in the domain.
  */
 struct rdt_cache {
unsigned intcbm_len;
@@ -369,6 +371,7 @@ struct rdt_cache {
unsigned intshareable_bits;
boolarch_has_sparse_bitmaps;
boolarch_has_empty_bitmaps;
+   boolarch_needs_update_all;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index af323e2e3100..a005e90b373a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1905,8 +1905,13 @@ static int set_cache_qos_cfg(int level, bool enable)
 
	r_l = &rdt_resources_all[level];
	list_for_each_entry(d, &r_l->domains, list) {
-		/* Pick one CPU from each domain instance to update MSR */
-		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+		if (r_l->cache.arch_needs_update_all)
+			/* Pick all the cpus in the domain instance */
+			for_each_cpu(cpu, &d->cpu_mask)
+				cpumask_set_cpu(cpu, cpu_mask);
+		else
+			/* Pick one CPU from each domain instance to update MSR */
+			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
}
cpu = get_cpu();
/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */



RE: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts

2020-09-23 Thread Babu Moger



> -Original Message-
> From: Paolo Bonzini 
> Sent: Tuesday, September 22, 2020 9:44 PM
> To: Moger, Babu ; Sean Christopherson
> 
> Cc: vkuzn...@redhat.com; jmatt...@google.com; wanpen...@tencent.com;
> k...@vger.kernel.org; j...@8bytes.org; x...@kernel.org; linux-
> ker...@vger.kernel.org; mi...@redhat.com; b...@alien8.de; h...@zytor.com;
> t...@linutronix.de
> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to
> generic intercepts
> 
> On 22/09/20 21:11, Babu Moger wrote:
> >
> >
> >> -Original Message-
> >> From: Paolo Bonzini 
> >> Sent: Tuesday, September 22, 2020 8:39 AM
> >> To: Sean Christopherson 
> >> Cc: Moger, Babu ; vkuzn...@redhat.com;
> >> jmatt...@google.com; wanpen...@tencent.com; k...@vger.kernel.org;
> >> j...@8bytes.org; x...@kernel.org; linux-kernel@vger.kernel.org;
> >> mi...@redhat.com; b...@alien8.de; h...@zytor.com; t...@linutronix.de
> >> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions
> >> to generic intercepts
> >>
> >> On 14/09/20 17:06, Sean Christopherson wrote:
> >>>> I think these should take a vector instead, and add 64 in the functions.
> >>>
> >>> And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?
> >>
> >> Not sure if we can assume it to be constant, but WARN_ON_ONCE is good
> >> enough as far as performance is concerned.  The same int->u32 +
> >> WARN_ON_ONCE should be done in patch 1.
> >
> > Paolo, Ok sure. Will change "int bit" to "u32 vector". I will send a
> > new patch to address this. This needs to be addressed in all these
> > functions, vmcb_set_intercept, vmcb_clr_intercept, vmcb_is_intercept,
> > set_exception_intercept, clr_exception_intercept, svm_set_intercept,
> > svm_clr_intercept, svm_is_intercept.
> >
> > Also will add WARN_ON_ONCE(vector > 32); on set_exception_intercept,
> > clr_exception_intercept.  Does that sound good?
> 
> I can do the fixes myself, no worries.  It should get to kvm/next this week.
Ok. Thanks
Babu


RE: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts

2020-09-22 Thread Babu Moger



> -Original Message-
> From: Paolo Bonzini 
> Sent: Tuesday, September 22, 2020 8:39 AM
> To: Sean Christopherson 
> Cc: Moger, Babu ; vkuzn...@redhat.com;
> jmatt...@google.com; wanpen...@tencent.com; k...@vger.kernel.org;
> j...@8bytes.org; x...@kernel.org; linux-kernel@vger.kernel.org;
> mi...@redhat.com; b...@alien8.de; h...@zytor.com; t...@linutronix.de
> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to
> generic intercepts
> 
> On 14/09/20 17:06, Sean Christopherson wrote:
> >> I think these should take a vector instead, and add 64 in the functions.
> >
> > And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?
> 
> Not sure if we can assume it to be constant, but WARN_ON_ONCE is good
> enough as far as performance is concerned.  The same int->u32 +
> WARN_ON_ONCE should be done in patch 1.

Paolo, Ok sure. Will change "int bit" to "u32 vector". I will send a new
patch to address this. This needs to be addressed in all these functions,
vmcb_set_intercept, vmcb_clr_intercept, vmcb_is_intercept,
set_exception_intercept, clr_exception_intercept, svm_set_intercept,
svm_clr_intercept, svm_is_intercept.

Also will add WARN_ON_ONCE(vector > 32); on set_exception_intercept,
clr_exception_intercept.  Does that sound good?
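
For illustration, the hardened helper would look roughly like this
(a sketch; in the combined array the exception intercepts sit in
vector 2, i.e. at bit 64 onwards, hence the "add 64" Paolo mentioned):

	static inline void set_exception_intercept(struct vcpu_svm *svm, u32 vector)
	{
		struct vmcb *vmcb = get_host_vmcb(svm);

		WARN_ON_ONCE(vector >= 32);	/* exception vectors are 0-31 */
		vmcb_set_intercept(&vmcb->control, 64 + vector);

		recalc_intercepts(svm);
	}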

> 
> Thanks for the review!
> 
> Paolo



RE: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest

2020-09-18 Thread Babu Moger



> -Original Message-
> From: Cathy Avery 
> Sent: Friday, September 18, 2020 10:27 AM
> To: Moger, Babu ; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; pbonz...@redhat.com
> Cc: vkuzn...@redhat.com; Huang2, Wei 
> Subject: Re: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest
> 
> On 9/18/20 11:16 AM, Babu Moger wrote:
> > Cathy,
> > Thanks for the patches. It cleans up the code nicely.
> > But there are some issues with the patch. I was able to bring up the
> > L1 guest with your patch, but when I tried to load the L2 guest it
> > crashed. I think it is mostly due to the save/restore part of the
> > vmcb. A few comments below.
> >
> >> -Original Message-
> >> From: Cathy Avery 
> >> Sent: Thursday, September 17, 2020 2:23 PM
> >> To: linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> >> pbonz...@redhat.com
> >> Cc: vkuzn...@redhat.com; Huang2, Wei 
> >> Subject: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2
> >> guest
> >>
> >> svm->vmcb will now point to either a separate vmcb L1 ( not nested )
> >> svm->or L2 vmcb
> >> ( nested ).
> >>
> >> Issues:
> >>
> >> 1) There is some wholesale copying of vmcb.save and vmcb.contol
> >> areas which will need to be refined.
> >>
> >> 2) There is a workaround in nested_svm_vmexit() where
> >>
> >> if (svm->vmcb01->control.asid == 0)
> >> svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
> >>
> >> This was done as a result of the kvm selftest 'state_test'. In that
> >> test svm_set_nested_state() is called before svm_vcpu_run().
> >> The asid is assigned by svm_vcpu_run -> pre_svm_run for the current
> >> vmcb which is now vmcb02 as we are in nested mode subsequently
> >> vmcb01.control.asid is never set as it should be.
> >>
> >> Tested:
> >> kvm-unit-tests
> >> kvm self tests
> >>
> >> Signed-off-by: Cathy Avery 
> >> ---
> >>   arch/x86/kvm/svm/nested.c | 116 ++
> >>   arch/x86/kvm/svm/svm.c|  41 +++---
> >>   arch/x86/kvm/svm/svm.h|  10 ++--
> >>   3 files changed, 81 insertions(+), 86 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> >> index
> >> e90bc436f584..0a06e62010d8 100644
> >> --- a/arch/x86/kvm/svm/nested.c
> >> +++ b/arch/x86/kvm/svm/nested.c
> >> @@ -75,12 +75,12 @@ static unsigned long
> >> nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)  static void
> >> nested_svm_init_mmu_context(struct kvm_vcpu
> >> *vcpu)  {
> >>struct vcpu_svm *svm = to_svm(vcpu);
> >> -  struct vmcb *hsave = svm->nested.hsave;
> >>
> >>WARN_ON(mmu_is_nested(vcpu));
> >>
> >>vcpu->arch.mmu = &vcpu->arch.guest_mmu;
> >> -  kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, hsave->save.cr4,
> >> hsave->save.efer,
> >> +  kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01->save.cr4,
> >> +  svm->vmcb01->save.efer,
> >>svm->nested.ctl.nested_cr3);
> >>vcpu->arch.mmu->get_guest_pgd = nested_svm_get_tdp_cr3;
> >>vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr;
> >> @@ -105,7 +105,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >>return;
> >>
> >>c = &svm->vmcb->control;
> >> -  h = &svm->nested.hsave->control;
> >> +  h = &svm->vmcb01->control;
> >>g = &svm->nested.ctl;
> >>
> >>svm->nested.host_intercept_exceptions = h->intercept_exceptions;
> >> @@ -403,7 +403,7 @@ static void nested_prepare_vmcb_control(struct
> >> vcpu_svm *svm)
> >>
> >>svm->vmcb->control.int_ctl =
> >>(svm->nested.ctl.int_ctl & ~mask) |
> >> -  (svm->nested.hsave->control.int_ctl & mask);
> >> +  (svm->vmcb01->control.int_ctl & mask);
> >>
> >>svm->vmcb->control.virt_ext= svm->nested.ctl.virt_ext;
> >>svm->vmcb->control.int_vector  = svm->nested.ctl.int_vector;
> >> @@ -432,6 +432,12 @@ int enter_svm_guest_mode(struct vcpu_svm *svm,
> >> u64 vmcb_gpa,
> >>int ret;
> >>
> >>svm->neste

RE: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest

2020-09-18 Thread Babu Moger
Cathy,
Thanks for the patches. They clean up the code nicely.
But there are some issues with the patch. I was able to bring up the L1
guest with your patch, but when I tried to load the L2 guest it crashed.
I think it is mostly due to the save/restore part of the vmcb. A few
comments below.

> -Original Message-
> From: Cathy Avery 
> Sent: Thursday, September 17, 2020 2:23 PM
> To: linux-kernel@vger.kernel.org; k...@vger.kernel.org; pbonz...@redhat.com
> Cc: vkuzn...@redhat.com; Huang2, Wei 
> Subject: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest
> 
> svm->vmcb will now point to either a separate vmcb L1 ( not nested ) or L2 
> vmcb
> ( nested ).
> 
> Issues:
> 
> 1) There is some wholesale copying of vmcb.save and vmcb.contol
>areas which will need to be refined.
> 
> 2) There is a workaround in nested_svm_vmexit() where
> 
>if (svm->vmcb01->control.asid == 0)
>svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
> 
>This was done as a result of the kvm selftest 'state_test'. In that
>test svm_set_nested_state() is called before svm_vcpu_run().
>The asid is assigned by svm_vcpu_run -> pre_svm_run for the current
>vmcb which is now vmcb02 as we are in nested mode subsequently
>vmcb01.control.asid is never set as it should be.
> 
> Tested:
> kvm-unit-tests
> kvm self tests
> 
> Signed-off-by: Cathy Avery 
> ---
>  arch/x86/kvm/svm/nested.c | 116 ++
>  arch/x86/kvm/svm/svm.c|  41 +++---
>  arch/x86/kvm/svm/svm.h|  10 ++--
>  3 files changed, 81 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index
> e90bc436f584..0a06e62010d8 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -75,12 +75,12 @@ static unsigned long nested_svm_get_tdp_cr3(struct
> kvm_vcpu *vcpu)  static void nested_svm_init_mmu_context(struct kvm_vcpu
> *vcpu)  {
>   struct vcpu_svm *svm = to_svm(vcpu);
> - struct vmcb *hsave = svm->nested.hsave;
> 
>   WARN_ON(mmu_is_nested(vcpu));
> 
>   vcpu->arch.mmu = &vcpu->arch.guest_mmu;
> - kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, hsave->save.cr4,
> hsave->save.efer,
> + kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01->save.cr4,
> + svm->vmcb01->save.efer,
>   svm->nested.ctl.nested_cr3);
>   vcpu->arch.mmu->get_guest_pgd = nested_svm_get_tdp_cr3;
>   vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr;
> @@ -105,7 +105,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
>   return;
> 
>   c = &svm->vmcb->control;
> - h = &svm->nested.hsave->control;
> + h = &svm->vmcb01->control;
>   g = &svm->nested.ctl;
> 
>   svm->nested.host_intercept_exceptions = h->intercept_exceptions;
> @@ -403,7 +403,7 @@ static void nested_prepare_vmcb_control(struct
> vcpu_svm *svm)
> 
>   svm->vmcb->control.int_ctl =
>   (svm->nested.ctl.int_ctl & ~mask) |
> - (svm->nested.hsave->control.int_ctl & mask);
> + (svm->vmcb01->control.int_ctl & mask);
> 
>   svm->vmcb->control.virt_ext= svm->nested.ctl.virt_ext;
>   svm->vmcb->control.int_vector  = svm->nested.ctl.int_vector;
> @@ -432,6 +432,12 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64
> vmcb_gpa,
>   int ret;
> 
>   svm->nested.vmcb = vmcb_gpa;
> +
> + WARN_ON(svm->vmcb == svm->nested.vmcb02);
> +
> + svm->nested.vmcb02->control = svm->vmcb01->control;
> + svm->vmcb = svm->nested.vmcb02;
> + svm->vmcb_pa = svm->nested.vmcb02_pa;
>   load_nested_vmcb_control(svm, _vmcb->control);
>   nested_prepare_vmcb_save(svm, nested_vmcb);
>   nested_prepare_vmcb_control(svm);
> @@ -450,8 +456,6 @@ int nested_svm_vmrun(struct vcpu_svm *svm)  {
>   int ret;
>   struct vmcb *nested_vmcb;
> - struct vmcb *hsave = svm->nested.hsave;
> - struct vmcb *vmcb = svm->vmcb;
>   struct kvm_host_map map;
>   u64 vmcb_gpa;
> 
> @@ -496,29 +500,17 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
>   kvm_clear_exception_queue(&svm->vcpu);
>   kvm_clear_interrupt_queue(&svm->vcpu);
> 
> - /*
> -  * Save the old vmcb, so we don't need to pick what we save, but can
> -  * restore everything when a VMEXIT occurs
> -  */
> - hsave->save.es = vmcb->save.es;
> - hsave->save.cs = vmcb->save.cs;
> - hsave->save.ss = vmcb->save.ss;
> - hsave->save.ds = vmcb->save.ds;
> - hsave->save.gdtr   = vmcb->save.gdtr;
> - hsave->save.idtr   = vmcb->save.idtr;
> - hsave->save.efer   = svm->vcpu.arch.efer;
> - hsave->save.cr0= kvm_read_cr0(&svm->vcpu);
> - hsave->save.cr4= svm->vcpu.arch.cr4;
> - hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
> - hsave->save.rip= kvm_rip_read(&svm->vcpu);
> - hsave->save.rsp= vmcb->save.rsp;
> - hsave->save.rax= vmcb->save.rax;
> - if (npt_enabled)
> 

Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support

2020-09-14 Thread Babu Moger



On 9/12/20 12:08 PM, Paolo Bonzini wrote:
> On 11/09/20 21:27, Babu Moger wrote:
>> The following series adds the support for PCID/INVPCID on AMD guests.
>> While doing it re-structured the vmcb_control_area data structure to
>> combine all the intercept vectors into one 32 bit array. Makes it easy
>> for future additions. Re-arranged few pcid related code to make it common
>> between SVM and VMX.
>>
>> INVPCID interceptions are added only when the guest is running with shadow
>> page table enabled. In this case the hypervisor needs to handle the tlbflush
>> based on the type of invpcid instruction.
>>
>> For the guests with nested page table (NPT) support, the INVPCID feature
>> works as running it natively. KVM does not need to do any special handling.
>>
>> AMD documentation for INVPCID feature is available at "AMD64 Architecture
>> Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or 
>> later)"
>>
>> The documentation can be obtained at the links below:
>> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> ---
>>
>> v6:
>>  One minor change in patch #04. Otherwise same as v5.
>>  Updated all the patches by Reviewed-by.
>>
>> v5:
>>  https://lore.kernel.org/lkml/159846887637.18873.14677728679411578606.stgit@bmoger-ubuntu/
>>  All the changes are related to rebase.
>>  Applies cleanly on mainline and kvm(master) tree. 
>>  Resending it to get some attention.
>>
>> v4:
>>  https://lore.kernel.org/lkml/159676101387.12805.18038347880482984693.stgit@bmoger-ubuntu/
>>  1. Changed the functions __set_intercept/__clr_intercept/__is_intercept
>> to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
>> vmcb_control_area structure(Suggested by Paolo).
>>  2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle 
>> kvm_read/write_guest_virt*())
>> to make it common across both SVM/VMX(Suggested by Jim Mattson).
>>  3. Took care of few other comments from Jim Mattson. Dropped "Reviewed-by"
>> on few patches which I have changed since v3.
>>
>> v3:
>>  https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
>>  1. Addressing the comments from Jim Mattson. Follow the v2 link below
>> for the context.
>>  2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
>> using native __set_bit, clear_bit and test_bit.
>>  3. Combined all the intercepts vectors into single 32 bit array.
>>  4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
>> clr_exception_intercept etc. Used the generic set_intercept and
>> clr_intercept where applicable.
>>  5. Tested both L1 guest and l2 nested guests. 
>>
>> v2:
>>   https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
>>   - Taken care of few comments from Jim Mattson.
>>   - KVM interceptions added only when tdp is off. No in

[PATCH v6 02/12] KVM: SVM: Change intercept_cr to generic intercepts

2020-09-11 Thread Babu Moger
Change intercept_cr to generic intercepts in vmcb_control_area.
Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
where applicable.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |   42 --
 arch/x86/kvm/svm/nested.c  |   26 +-
 arch/x86/kvm/svm/svm.c |4 ++--
 arch/x86/kvm/svm/svm.h |   12 ++--
 4 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 8a1f5382a4ea..d4739f4eae63 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -4,6 +4,37 @@
 
 #include 
 
+/*
+ * VMCB Control Area intercept bits starting
+ * at Byte offset 000h (Vector 0).
+ */
+
+enum vector_offset {
+   CR_VECTOR = 0,
+   MAX_VECTORS,
+};
+
+enum {
+   /* Byte offset 000h (Vector 0) */
+   INTERCEPT_CR0_READ = 0,
+   INTERCEPT_CR1_READ,
+   INTERCEPT_CR2_READ,
+   INTERCEPT_CR3_READ,
+   INTERCEPT_CR4_READ,
+   INTERCEPT_CR5_READ,
+   INTERCEPT_CR6_READ,
+   INTERCEPT_CR7_READ,
+   INTERCEPT_CR8_READ,
+   INTERCEPT_CR0_WRITE = 16,
+   INTERCEPT_CR1_WRITE,
+   INTERCEPT_CR2_WRITE,
+   INTERCEPT_CR3_WRITE,
+   INTERCEPT_CR4_WRITE,
+   INTERCEPT_CR5_WRITE,
+   INTERCEPT_CR6_WRITE,
+   INTERCEPT_CR7_WRITE,
+   INTERCEPT_CR8_WRITE,
+};
 
 enum {
INTERCEPT_INTR,
@@ -57,7 +88,7 @@ enum {
 
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
-   u32 intercept_cr;
+   u32 intercepts[MAX_VECTORS];
u32 intercept_dr;
u32 intercept_exceptions;
u64 intercept;
@@ -240,15 +271,6 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
 #define SVM_SELECTOR_CODE_MASK (1 << 3)
 
-#define INTERCEPT_CR0_READ 0
-#define INTERCEPT_CR3_READ 3
-#define INTERCEPT_CR4_READ 4
-#define INTERCEPT_CR8_READ 8
-#define INTERCEPT_CR0_WRITE(16 + 0)
-#define INTERCEPT_CR3_WRITE(16 + 3)
-#define INTERCEPT_CR4_WRITE(16 + 4)
-#define INTERCEPT_CR8_WRITE(16 + 8)
-
 #define INTERCEPT_DR0_READ 0
 #define INTERCEPT_DR1_READ 1
 #define INTERCEPT_DR2_READ 2
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fb68467e6049..5f65b759abcb 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -98,6 +98,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu 
*vcpu)
 void recalc_intercepts(struct vcpu_svm *svm)
 {
struct vmcb_control_area *c, *h, *g;
+   unsigned int i;
 
vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 
@@ -110,15 +111,17 @@ void recalc_intercepts(struct vcpu_svm *svm)
 
svm->nested.host_intercept_exceptions = h->intercept_exceptions;
 
-   c->intercept_cr = h->intercept_cr;
+   for (i = 0; i < MAX_VECTORS; i++)
+   c->intercepts[i] = h->intercepts[i];
+
c->intercept_dr = h->intercept_dr;
c->intercept_exceptions = h->intercept_exceptions;
c->intercept = h->intercept;
 
if (g->int_ctl & V_INTR_MASKING_MASK) {
/* We only want the cr8 intercept bits of L1 */
-   c->intercept_cr &= ~(1U << INTERCEPT_CR8_READ);
-   c->intercept_cr &= ~(1U << INTERCEPT_CR8_WRITE);
+   vmcb_clr_intercept(c, INTERCEPT_CR8_READ);
+   vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
 
/*
 * Once running L2 with HF_VINTR_MASK, EFLAGS.IF does not
@@ -131,7 +134,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
/* We don't want to see VMMCALLs from a nested guest */
c->intercept &= ~(1ULL << INTERCEPT_VMMCALL);
 
-   c->intercept_cr |= g->intercept_cr;
+   for (i = 0; i < MAX_VECTORS; i++)
+   c->intercepts[i] |= g->intercepts[i];
+
c->intercept_dr |= g->intercept_dr;
c->intercept_exceptions |= g->intercept_exceptions;
c->intercept |= g->intercept;
@@ -140,7 +145,11 @@ void recalc_intercepts(struct vcpu_svm *svm)
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
   struct vmcb_control_area *from)
 {
-   dst->intercept_cr = from->intercept_cr;
+   unsigned int i;
+
+   for (i = 0; i < MAX_VECTORS; i++)
+   dst->intercepts[i] = from->intercepts[i];
+
dst->intercept_dr = from->intercept_dr;
dst->intercept_exceptions = from->intercept_exceptions;
dst->intercept= from->intercept;
@@ -487,8 +496,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
   nested_vmcb->control.event_inj,
   nested_vmcb->control.nest

[PATCH v6 01/12] KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)

2020-09-11 Thread Babu Moger
This is in preparation for future intercept vector additions.

Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
using the kernel APIs __set_bit, __clear_bit and test_bit respectively.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/svm/svm.h |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a798e1731709..1cff7644e70b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -214,6 +214,21 @@ static inline struct vmcb *get_host_vmcb(struct vcpu_svm 
*svm)
return svm->vmcb;
 }
 
+static inline void vmcb_set_intercept(struct vmcb_control_area *control, int 
bit)
+{
+   __set_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
+static inline void vmcb_clr_intercept(struct vmcb_control_area *control, int 
bit)
+{
+   __clear_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
+static inline bool vmcb_is_intercept(struct vmcb_control_area *control, int 
bit)
+{
+   return test_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
 static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
 {
struct vmcb *vmcb = get_host_vmcb(svm);
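
For illustration, caller-side usage of the new helpers (a sketch;
svm is a struct vcpu_svm pointer, and at this point in the series the
bit numbers are still the existing INTERCEPT_CR* values):

	struct vmcb_control_area *c = &svm->vmcb->control;

	vmcb_set_intercept(c, INTERCEPT_CR0_WRITE);
	if (vmcb_is_intercept(c, INTERCEPT_CR0_WRITE))
		vmcb_clr_intercept(c, INTERCEPT_CR0_WRITE);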



[PATCH v6 00/12] SVM cleanup and INVPCID feature support

2020-09-11 Thread Babu Moger
The following series adds support for PCID/INVPCID on AMD guests.
While doing so, it restructures the vmcb_control_area data structure to
combine all the intercept vectors into one 32-bit array, which makes
future additions easy. It also rearranges some PCID-related code to make
it common between SVM and VMX.

INVPCID interceptions are added only when the guest is running with shadow
page table enabled. In this case the hypervisor needs to handle the tlbflush
based on the type of invpcid instruction.

For the guests with nested page table (NPT) support, the INVPCID feature
works as running it natively. KVM does not need to do any special handling.

AMD documentation for INVPCID feature is available at "AMD64 Architecture
Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or 
later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---

v6:
 One minor change in patch #04. Otherwise same as v5.
 Updated all the patches with Reviewed-by tags.

v5:
 
https://lore.kernel.org/lkml/159846887637.18873.14677728679411578606.stgit@bmoger-ubuntu/
 All the changes are related to rebase.
 Applies cleanly on mainline and kvm(master) tree. 
 Resending it to get some attention.

v4:
 
https://lore.kernel.org/lkml/159676101387.12805.18038347880482984693.stgit@bmoger-ubuntu/
 1. Changed the functions __set_intercept/__clr_intercept/__is_intercept
to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
vmcb_control_area structure(Suggested by Paolo).
 2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle 
kvm_read/write_guest_virt*())
to make it common across both SVM/VMX(Suggested by Jim Mattson).
 3. Took care of few other comments from Jim Mattson. Dropped "Reviewed-by"
on few patches which I have changed since v3.

v3:
 
https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
 1. Addressing the comments from Jim Mattson. Follow the v2 link below
for the context.
 2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
using native __set_bit, clear_bit and test_bit.
 3. Combined all the intercepts vectors into single 32 bit array.
 4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
clr_exception_intercept etc. Used the generic set_intercept and
clr_intercept where applicable.
 5. Tested both L1 guest and L2 nested guests. 

v2:
  
https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
  - Taken care of few comments from Jim Mattson.
  - KVM interceptions added only when tdp is off. No interceptions
when tdp is on.
  - Reverted the fault priority to original order in VMX. 
  
v1:
  
https://lore.kernel.org/lkml/159191202523.31436.11959784252237488867.stgit@bmoger-ubuntu/

Babu Moger (12):
  KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
  KVM: SVM: Change intercept_cr to generic intercepts
  KVM: SVM: Change intercept_dr to generic intercepts
  KVM: SVM: Modify intercept_exceptions to generic intercepts
  KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
  KVM: SVM: Add new intercept vector in vmcb_control_area
  KVM: nSVM: Cleanup nested_state data structure
  KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
  KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
  KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
  KVM: X86: Move handling of INVPCID types to x86
  KVM:SVM: Enable INVPCID feature on AMD


 arch/x86/include/asm/svm.h  |  117 +--
 arch/x86/include/uapi/asm/svm.h |2 +
 arch/x86/kvm/svm/nested.c   |   66 +---
 arch/x86/kvm/svm/svm.c  |  131 ++-
 arch/x86/kvm/svm/svm.h  |   87 +-
 arch/x86/kvm/trace.h|   21 --
 arch/x86/kvm/vmx/nested.c   |   12 ++--
 arch/x86/kvm/vmx/vmx.c  |   95 
 arch/x86/kvm/vmx/vmx.h  |2 -
 arch/x86/kvm/x86.c  |  106 
 arch/x86/kvm/x86.h  |3 +
 11 files changed, 364 insertions(+), 278 deletions(-)

--
Signature


[PATCH v6 12/12] KVM:SVM: Enable INVPCID feature on AMD

2020-09-11 Thread Babu Moger
The following intercept bit has been added to support VMEXIT
for INVPCID instruction:
Code    Name              Cause
A2h     VMEXIT_INVPCID    INVPCID instruction

The following bit has been added to the VMCB layout control area
to control intercept of INVPCID:
Byte Offset    Bit(s)    Function
14h            2         intercept INVPCID

Enable the interceptions when the guest is running with shadow
page table enabled and handle the tlbflush based on the invpcid
instruction type.

For the guests with nested page table (NPT) support, the INVPCID
feature works as running it natively. KVM does not need to do any
special handling in this case.

AMD documentation for INVPCID feature is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/uapi/asm/svm.h |2 ++
 arch/x86/kvm/svm/svm.c  |   51 +++
 2 files changed, 53 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 2e8a30f06c74..522d42dfc28c 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -76,6 +76,7 @@
 #define SVM_EXIT_MWAIT_COND0x08c
 #define SVM_EXIT_XSETBV0x08d
 #define SVM_EXIT_RDPRU 0x08e
+#define SVM_EXIT_INVPCID   0x0a2
 #define SVM_EXIT_NPF   0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI   0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402
@@ -171,6 +172,7 @@
{ SVM_EXIT_MONITOR, "monitor" }, \
{ SVM_EXIT_MWAIT,   "mwait" }, \
{ SVM_EXIT_XSETBV,  "xsetbv" }, \
+   { SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 96617b61e531..5c6b8d0f7628 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -813,6 +813,9 @@ static __init void svm_set_cpu_caps(void)
if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
boot_cpu_has(X86_FEATURE_AMD_SSBD))
kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+
+   /* Enable INVPCID feature */
+   kvm_cpu_cap_check_and_set(X86_FEATURE_INVPCID);
 }
 
 static __init int svm_hardware_setup(void)
@@ -985,6 +988,21 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, 
u64 offset)
return svm->vmcb->control.tsc_offset;
 }
 
+static void svm_check_invpcid(struct vcpu_svm *svm)
+{
+   /*
+* Intercept INVPCID instruction only if shadow page table is
+* enabled. Interception is not required with nested page table
+* enabled.
+*/
+   if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
+   if (!npt_enabled)
+   svm_set_intercept(svm, INTERCEPT_INVPCID);
+   else
+   svm_clr_intercept(svm, INTERCEPT_INVPCID);
+   }
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1114,6 +1132,8 @@ static void init_vmcb(struct vcpu_svm *svm)
svm_clr_intercept(svm, INTERCEPT_PAUSE);
}
 
+   svm_check_invpcid(svm);
+
	if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_init_vmcb(svm);
 
@@ -2730,6 +2750,33 @@ static int mwait_interception(struct vcpu_svm *svm)
return nop_interception(svm);
 }
 
+static int invpcid_interception(struct vcpu_svm *svm)
+{
+   struct kvm_vcpu *vcpu = &svm->vcpu;
+   unsigned long type;
+   gva_t gva;
+
+   if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   /*
+* For an INVPCID intercept:
+* EXITINFO1 provides the linear address of the memory operand.
+* EXITINFO2 provides the contents of the register operand.
+*/
+   type = svm->vmcb->control.exit_info_2;
+   gva = svm->vmcb->control.exit_info_1;
+
+   if (type > 3) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
+   return kvm_handle_invpcid(vcpu, type, gva);
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = cr_interception,
[SVM_EXIT_READ_CR3] = cr_interception,
@@ -2792,6 +2839,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm 
*svm) = {
[SVM_EXIT_MWAIT]= mwait_interception,
[SVM_EXIT_XSETBV] 

[PATCH v6 03/12] KVM: SVM: Change intercept_dr to generic intercepts

2020-09-11 Thread Babu Moger
Modify intercept_dr to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
to set/clear/test the intercept_dr bits.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |   36 ++--
 arch/x86/kvm/svm/nested.c  |6 +-
 arch/x86/kvm/svm/svm.c |4 ++--
 arch/x86/kvm/svm/svm.h |   34 +-
 4 files changed, 38 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index d4739f4eae63..ffc89d8e4fcb 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -11,6 +11,7 @@
 
 enum vector_offset {
CR_VECTOR = 0,
+   DR_VECTOR,
MAX_VECTORS,
 };
 
@@ -34,6 +35,23 @@ enum {
INTERCEPT_CR6_WRITE,
INTERCEPT_CR7_WRITE,
INTERCEPT_CR8_WRITE,
+   /* Byte offset 004h (Vector 1) */
+   INTERCEPT_DR0_READ = 32,
+   INTERCEPT_DR1_READ,
+   INTERCEPT_DR2_READ,
+   INTERCEPT_DR3_READ,
+   INTERCEPT_DR4_READ,
+   INTERCEPT_DR5_READ,
+   INTERCEPT_DR6_READ,
+   INTERCEPT_DR7_READ,
+   INTERCEPT_DR0_WRITE = 48,
+   INTERCEPT_DR1_WRITE,
+   INTERCEPT_DR2_WRITE,
+   INTERCEPT_DR3_WRITE,
+   INTERCEPT_DR4_WRITE,
+   INTERCEPT_DR5_WRITE,
+   INTERCEPT_DR6_WRITE,
+   INTERCEPT_DR7_WRITE,
 };
 
 enum {
@@ -89,7 +107,6 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
u32 intercepts[MAX_VECTORS];
-   u32 intercept_dr;
u32 intercept_exceptions;
u64 intercept;
u8 reserved_1[40];
@@ -271,23 +288,6 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
 #define SVM_SELECTOR_CODE_MASK (1 << 3)
 
-#define INTERCEPT_DR0_READ 0
-#define INTERCEPT_DR1_READ 1
-#define INTERCEPT_DR2_READ 2
-#define INTERCEPT_DR3_READ 3
-#define INTERCEPT_DR4_READ 4
-#define INTERCEPT_DR5_READ 5
-#define INTERCEPT_DR6_READ 6
-#define INTERCEPT_DR7_READ 7
-#define INTERCEPT_DR0_WRITE(16 + 0)
-#define INTERCEPT_DR1_WRITE(16 + 1)
-#define INTERCEPT_DR2_WRITE(16 + 2)
-#define INTERCEPT_DR3_WRITE(16 + 3)
-#define INTERCEPT_DR4_WRITE(16 + 4)
-#define INTERCEPT_DR5_WRITE(16 + 5)
-#define INTERCEPT_DR6_WRITE(16 + 6)
-#define INTERCEPT_DR7_WRITE(16 + 7)
-
 #define SVM_EVTINJ_VEC_MASK 0xff
 
 #define SVM_EVTINJ_TYPE_SHIFT 8
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5f65b759abcb..ba11fc3bf843 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -114,7 +114,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
-   c->intercept_dr = h->intercept_dr;
c->intercept_exceptions = h->intercept_exceptions;
c->intercept = h->intercept;
 
@@ -137,7 +136,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] |= g->intercepts[i];
 
-   c->intercept_dr |= g->intercept_dr;
c->intercept_exceptions |= g->intercept_exceptions;
c->intercept |= g->intercept;
 }
@@ -150,7 +148,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area 
*dst,
for (i = 0; i < MAX_VECTORS; i++)
dst->intercepts[i] = from->intercepts[i];
 
-   dst->intercept_dr = from->intercept_dr;
dst->intercept_exceptions = from->intercept_exceptions;
dst->intercept= from->intercept;
dst->iopm_base_pa = from->iopm_base_pa;
@@ -779,8 +776,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
break;
}
case SVM_EXIT_READ_DR0 ... SVM_EXIT_WRITE_DR7: {
-   u32 bit = 1U << (exit_code - SVM_EXIT_READ_DR0);
-   if (svm->nested.ctl.intercept_dr & bit)
+   if (vmcb_is_intercept(&svm->nested.ctl, exit_code))
vmexit = NESTED_EXIT_DONE;
break;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 523936b80dda..1a5f3908b388 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2815,8 +2815,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("VMCB Control Area:\n");
pr_err("%-20s%04x\n", "cr_read:", control->intercepts[CR_VECTOR] & 
0x);
pr_err("%-20s%04x\n", "cr_write:", control->intercepts[CR_VECTOR] >> 
16);
-   pr_err("%-20s%04x\n", "dr_read:", control->intercept_dr & 0x);
-   pr_err("%-20s%04x\n", "dr_write:", control->intercept_dr >> 16);
+   pr_err("%-20s%04x\n", &quo

[PATCH v6 07/12] KVM: nSVM: Cleanup nested_state data structure

2020-09-11 Thread Babu Moger
host_intercept_exceptions is not used anywhere. Clean it up.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/svm/nested.c |2 --
 arch/x86/kvm/svm/svm.h|1 -
 2 files changed, 3 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c833f6265b59..7121756685ee 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -109,8 +109,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
	h = &svm->nested.hsave->control;
	g = &svm->nested.ctl;
 
-   svm->nested.host_intercept_exceptions = h->intercepts[EXCEPTION_VECTOR];
-
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2cde5091775a..ffb35a83048f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -86,7 +86,6 @@ struct svm_nested_state {
u64 hsave_msr;
u64 vm_cr_msr;
u64 vmcb;
-   u32 host_intercept_exceptions;
 
/* These are the merged vectors */
u32 *msrpm;



[PATCH v6 08/12] KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept

2020-09-11 Thread Babu Moger
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead,
call the generic svm_set_intercept, svm_clr_intercept and
svm_is_intercept for all CR intercepts.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/svm/svm.c |   34 +-
 arch/x86/kvm/svm/svm.h |   25 -
 2 files changed, 17 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 17bfa34033ac..0d7397f4a4f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -992,14 +992,14 @@ static void init_vmcb(struct vcpu_svm *svm)
 
svm->vcpu.arch.hflags = 0;
 
-   set_cr_intercept(svm, INTERCEPT_CR0_READ);
-   set_cr_intercept(svm, INTERCEPT_CR3_READ);
-   set_cr_intercept(svm, INTERCEPT_CR4_READ);
-   set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
-   set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
-   set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR0_READ);
+   svm_set_intercept(svm, INTERCEPT_CR3_READ);
+   svm_set_intercept(svm, INTERCEPT_CR4_READ);
+   svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
	if (!kvm_vcpu_apicv_active(&svm->vcpu))
-   set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 
set_dr_intercepts(svm);
 
@@ -1094,8 +1094,8 @@ static void init_vmcb(struct vcpu_svm *svm)
control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
svm_clr_intercept(svm, INTERCEPT_INVLPG);
clr_exception_intercept(svm, INTERCEPT_PF_VECTOR);
-   clr_cr_intercept(svm, INTERCEPT_CR3_READ);
-   clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
+   svm_clr_intercept(svm, INTERCEPT_CR3_READ);
+   svm_clr_intercept(svm, INTERCEPT_CR3_WRITE);
save->g_pat = svm->vcpu.arch.pat;
save->cr3 = 0;
save->cr4 = 0;
@@ -1549,11 +1549,11 @@ static void update_cr0_intercept(struct vcpu_svm *svm)
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 
if (gcr0 == *hcr0) {
-   clr_cr_intercept(svm, INTERCEPT_CR0_READ);
-   clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
+   svm_clr_intercept(svm, INTERCEPT_CR0_READ);
+   svm_clr_intercept(svm, INTERCEPT_CR0_WRITE);
} else {
-   set_cr_intercept(svm, INTERCEPT_CR0_READ);
-   set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR0_READ);
+   svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
}
 }
 
@@ -2931,7 +2931,7 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t 
exit_fastpath)
 
trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
-   if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
+   if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
vcpu->arch.cr0 = svm->vmcb->save.cr0;
if (npt_enabled)
vcpu->arch.cr3 = svm->vmcb->save.cr3;
@@ -3056,13 +3056,13 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, 
int tpr, int irr)
if (nested_svm_virtualize_tpr(vcpu))
return;
 
-   clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
 
if (irr == -1)
return;
 
if (tpr >= irr)
-   set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
 bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
@@ -3250,7 +3250,7 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu 
*vcpu)
if (nested_svm_virtualize_tpr(vcpu))
return;
 
-   if (!is_cr_intercept(svm, INTERCEPT_CR8_WRITE)) {
+   if (!svm_is_intercept(svm, INTERCEPT_CR8_WRITE)) {
int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
kvm_set_cr8(vcpu, cr8);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ffb35a83048f..8128bac75fa2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -228,31 +228,6 @@ static inline bool vmcb_is_intercept(struct 
vmcb_control_area *control, int bit)
	return test_bit(bit, (unsigned long *)&control->intercepts);
 }
 
-static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_set_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
-static inline void clr_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_clr_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
-static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   r

[PATCH v6 06/12] KVM: SVM: Add new intercept vector in vmcb_control_area

2020-09-11 Thread Babu Moger
New intercept bits have been added in the vmcb control area to support
a few more interceptions. Here are some of them:
 - INTERCEPT_INVLPGB,
 - INTERCEPT_INVLPGB_ILLEGAL,
 - INTERCEPT_INVPCID,
 - INTERCEPT_MCOMMIT,
 - INTERCEPT_TLBSYNC,

Add new intercept vector in vmcb_control_area to support these instructions.
Also update kvm_nested_vmrun trace function to support the new addition.

AMD documentation for these instructions is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593
Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |7 +++
 arch/x86/kvm/svm/nested.c  |3 ++-
 arch/x86/kvm/trace.h   |   13 -
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 9f0fa02fc838..623c392a55ac 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -16,6 +16,7 @@ enum vector_offset {
EXCEPTION_VECTOR,
INTERCEPT_VECTOR_3,
INTERCEPT_VECTOR_4,
+   INTERCEPT_VECTOR_5,
MAX_VECTORS,
 };
 
@@ -124,6 +125,12 @@ enum {
INTERCEPT_MWAIT_COND,
INTERCEPT_XSETBV,
INTERCEPT_RDPRU,
+   /* Byte offset 014h (Vector 5) */
+   INTERCEPT_INVLPGB = 160,
+   INTERCEPT_INVLPGB_ILLEGAL,
+   INTERCEPT_INVPCID,
+   INTERCEPT_MCOMMIT,
+   INTERCEPT_TLBSYNC,
 };
 
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 353d550a5bb7..c833f6265b59 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -489,7 +489,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
   nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
   nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
   nested_vmcb->control.intercepts[INTERCEPT_VECTOR_3],
-  nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4]);
+  nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4],
+  nested_vmcb->control.intercepts[INTERCEPT_VECTOR_5]);
 
/* Clear internal status */
kvm_clear_exception_queue(>vcpu);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 6e7262229e6a..11046171b5d9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -544,9 +544,10 @@ TRACE_EVENT(kvm_nested_vmrun,
 );
 
 TRACE_EVENT(kvm_nested_intercepts,
-   TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u32 intercept1,
-__u32 intercept2),
-   TP_ARGS(cr_read, cr_write, exceptions, intercept1, intercept2),
+   TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
+__u32 intercept1, __u32 intercept2, __u32 intercept3),
+   TP_ARGS(cr_read, cr_write, exceptions, intercept1,
+   intercept2, intercept3),
 
TP_STRUCT__entry(
__field(__u16,  cr_read )
@@ -554,6 +555,7 @@ TRACE_EVENT(kvm_nested_intercepts,
__field(__u32,  exceptions  )
__field(__u32,  intercept1  )
__field(__u32,  intercept2  )
+   __field(__u32,  intercept3  )
),
 
TP_fast_assign(
@@ -562,12 +564,13 @@ TRACE_EVENT(kvm_nested_intercepts,
__entry->exceptions = exceptions;
__entry->intercept1 = intercept1;
__entry->intercept2 = intercept2;
+   __entry->intercept3 = intercept3;
),
 
TP_printk("cr_read: %04x cr_write: %04x excp: %08x "
- "intercept1: %08x intercept2: %08x",
+ "intercept1: %08x intercept2: %08x  intercept3: %08x",
  __entry->cr_read, __entry->cr_write, __entry->exceptions,
- __entry->intercept1, __entry->intercept2)
+ __entry->intercept1, __entry->intercept2, __entry->intercept3)
 );
 /*
  * Tracepoint for #VMEXIT while nested



[PATCH v6 11/12] KVM: X86: Move handling of INVPCID types to x86

2020-09-11 Thread Babu Moger
INVPCID instruction handling is mostly the same across both VMX and
SVM. So, move the code to the common x86.c.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/vmx/vmx.c |   68 +-
 arch/x86/kvm/x86.c |   78 
 arch/x86/kvm/x86.h |1 +
 3 files changed, 80 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b15b4c6e3b46..ff42d27f641f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5497,16 +5497,11 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 {
u32 vmx_instruction_info;
unsigned long type;
-   bool pcid_enabled;
gva_t gva;
-   struct x86_exception e;
-   unsigned i;
-   unsigned long roots_to_free = 0;
struct {
u64 pcid;
u64 gla;
} operand;
-   int r;
 
if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
kvm_queue_exception(vcpu, UD_VECTOR);
@@ -5529,68 +5524,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
sizeof(operand), &gva))
return 1;
 
-   r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
-   if (r != X86EMUL_CONTINUE)
-   return kvm_handle_memory_failure(vcpu, r, );
-
-   if (operand.pcid >> 12 != 0) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
-
-   pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
-
-   switch (type) {
-   case INVPCID_TYPE_INDIV_ADDR:
-   if ((!pcid_enabled && (operand.pcid != 0)) ||
-   is_noncanonical_address(operand.gla, vcpu)) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
-   kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
-   return kvm_skip_emulated_instruction(vcpu);
-
-   case INVPCID_TYPE_SINGLE_CTXT:
-   if (!pcid_enabled && (operand.pcid != 0)) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
-
-   if (kvm_get_active_pcid(vcpu) == operand.pcid) {
-   kvm_mmu_sync_roots(vcpu);
-   kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-   }
-
-   for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-   if (kvm_get_pcid(vcpu, 
vcpu->arch.mmu->prev_roots[i].pgd)
-   == operand.pcid)
-   roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
-
-   kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
-   /*
-* If neither the current cr3 nor any of the prev_roots use the
-* given PCID, then nothing needs to be done here because a
-* resync will happen anyway before switching to any other CR3.
-*/
-
-   return kvm_skip_emulated_instruction(vcpu);
-
-   case INVPCID_TYPE_ALL_NON_GLOBAL:
-   /*
-* Currently, KVM doesn't mark global entries in the shadow
-* page tables, so a non-global flush just degenerates to a
-* global flush. If needed, we could optimize this later by
-* keeping track of global entries in shadow page tables.
-*/
-
-   /* fall-through */
-   case INVPCID_TYPE_ALL_INCL_GLOBAL:
-   kvm_mmu_unload(vcpu);
-   return kvm_skip_emulated_instruction(vcpu);
-
-   default:
-   BUG(); /* We have already checked above that type <= 3 */
-   }
+   return kvm_handle_invpcid(vcpu, type, gva);
 }
 
 static int handle_pml_full(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d7930ecdddc..39ca22e0f8b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -10791,6 +10792,83 @@ int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 }
 EXPORT_SYMBOL_GPL(kvm_handle_memory_failure);
 
+int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
+{
+   bool pcid_enabled;
+   struct x86_exception e;
+   unsigned i;
+   unsigned long roots_to_free = 0;
+   struct {
+   u64 pcid;
+   u64 gla;
+   } operand;
+   int r;
+
+   r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
+   if (r != X86EMUL_CONTINUE)
+   return kvm_handle_memory_failure(vcpu, r, &e);
+
+   if (operand.pcid >> 12 != 0) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
+   pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+
+   switch (type) {
+   case INVPCID_TYPE_INDIV_ADDR:
+

[PATCH v6 09/12] KVM: SVM: Remove set_exception_intercept and clr_exception_intercept

2020-09-11 Thread Babu Moger
Remove set_exception_intercept and clr_exception_intercept.
Replace them with the generic svm_set_intercept and svm_clr_intercept
for these calls.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/svm/svm.c |   20 ++--
 arch/x86/kvm/svm/svm.h |   18 --
 2 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0d7397f4a4f7..96617b61e531 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1003,11 +1003,11 @@ static void init_vmcb(struct vcpu_svm *svm)
 
set_dr_intercepts(svm);
 
-   set_exception_intercept(svm, INTERCEPT_PF_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_UD_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_MC_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_AC_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_DB_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_PF_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_UD_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_MC_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_AC_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_DB_VECTOR);
/*
 * Guest access to VMware backdoor ports could legitimately
 * trigger #GP because of TSS I/O permission bitmap.
@@ -1015,7 +1015,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 * as VMware does.
 */
if (enable_vmware_backdoor)
-   set_exception_intercept(svm, INTERCEPT_GP_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_GP_VECTOR);
 
svm_set_intercept(svm, INTERCEPT_INTR);
svm_set_intercept(svm, INTERCEPT_NMI);
@@ -1093,7 +1093,7 @@ static void init_vmcb(struct vcpu_svm *svm)
/* Setup VMCB for Nested Paging */
control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
svm_clr_intercept(svm, INTERCEPT_INVLPG);
-   clr_exception_intercept(svm, INTERCEPT_PF_VECTOR);
+   svm_clr_intercept(svm, INTERCEPT_PF_VECTOR);
svm_clr_intercept(svm, INTERCEPT_CR3_READ);
svm_clr_intercept(svm, INTERCEPT_CR3_WRITE);
save->g_pat = svm->vcpu.arch.pat;
@@ -1135,7 +1135,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
if (sev_guest(svm->vcpu.kvm)) {
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
-   clr_exception_intercept(svm, INTERCEPT_UD_VECTOR);
+   svm_clr_intercept(svm, INTERCEPT_UD_VECTOR);
}
 
vmcb_mark_all_dirty(svm->vmcb);
@@ -1646,11 +1646,11 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   clr_exception_intercept(svm, INTERCEPT_BP_VECTOR);
+   svm_clr_intercept(svm, INTERCEPT_BP_VECTOR);
 
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
-   set_exception_intercept(svm, INTERCEPT_BP_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_BP_VECTOR);
}
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8128bac75fa2..fc4bfea3f555 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -261,24 +261,6 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)
recalc_intercepts(svm);
 }
 
-static inline void set_exception_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_set_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
-static inline void clr_exception_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_clr_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
 static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
 {
struct vmcb *vmcb = get_host_vmcb(svm);



[PATCH v6 05/12] KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors

2020-09-11 Thread Babu Moger
Convert all the intercepts to one array of 32 bit vectors in
vmcb_control_area. This makes it easy for future intercept vector
additions. Also update trace functions.
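
One non-obvious detail: the padding after the vectors is sized as
"15 - MAX_VECTORS" u32 words so that the fields following the intercepts
keep their architectural byte offsets. A sketch of the invariant, assuming
the APM layout where pause_filter_thresh sits at byte offset 03Ch:

/* Intercept words plus padding must always total 15 u32s (60 bytes),
 * however many of them MAX_VECTORS covers. */
_Static_assert(sizeof(u32) * MAX_VECTORS +
	       sizeof(u32) * (15 - MAX_VECTORS) == 0x3c,
	       "VMCB intercept region must stay 60 bytes");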

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |   14 +++---
 arch/x86/kvm/svm/nested.c  |   25 ++---
 arch/x86/kvm/svm/svm.c |   16 ++--
 arch/x86/kvm/svm/svm.h |   12 ++--
 arch/x86/kvm/trace.h   |   18 +++---
 5 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 51833a611eba..9f0fa02fc838 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -14,6 +14,8 @@ enum vector_offset {
CR_VECTOR = 0,
DR_VECTOR,
EXCEPTION_VECTOR,
+   INTERCEPT_VECTOR_3,
+   INTERCEPT_VECTOR_4,
MAX_VECTORS,
 };
 
@@ -73,10 +75,8 @@ enum {
INTERCEPT_MC_VECTOR = 64 + MC_VECTOR,
INTERCEPT_XM_VECTOR = 64 + XM_VECTOR,
INTERCEPT_VE_VECTOR = 64 + VE_VECTOR,
-};
-
-enum {
-   INTERCEPT_INTR,
+   /* Byte offset 00Ch (Vector 3) */
+   INTERCEPT_INTR = 96,
INTERCEPT_NMI,
INTERCEPT_SMI,
INTERCEPT_INIT,
@@ -108,7 +108,8 @@ enum {
INTERCEPT_TASK_SWITCH,
INTERCEPT_FERR_FREEZE,
INTERCEPT_SHUTDOWN,
-   INTERCEPT_VMRUN,
+   /* Byte offset 010h (Vector 4) */
+   INTERCEPT_VMRUN = 128,
INTERCEPT_VMMCALL,
INTERCEPT_VMLOAD,
INTERCEPT_VMSAVE,
@@ -128,8 +129,7 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
u32 intercepts[MAX_VECTORS];
-   u64 intercept;
-   u8 reserved_1[40];
+   u32 reserved_1[15 - MAX_VECTORS];
u16 pause_filter_thresh;
u16 pause_filter_count;
u64 iopm_base_pa;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c161bd38f401..353d550a5bb7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -114,8 +114,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
-   c->intercept = h->intercept;
-
if (g->int_ctl & V_INTR_MASKING_MASK) {
/* We only want the cr8 intercept bits of L1 */
vmcb_clr_intercept(c, INTERCEPT_CR8_READ);
@@ -126,16 +124,14 @@ void recalc_intercepts(struct vcpu_svm *svm)
 * affect any interrupt we may want to inject; therefore,
 * interrupt window vmexits are irrelevant to L0.
 */
-   c->intercept &= ~(1ULL << INTERCEPT_VINTR);
+   vmcb_clr_intercept(c, INTERCEPT_VINTR);
}
 
/* We don't want to see VMMCALLs from a nested guest */
-   c->intercept &= ~(1ULL << INTERCEPT_VMMCALL);
+   vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
 
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] |= g->intercepts[i];
-
-   c->intercept |= g->intercept;
 }
 
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
@@ -146,7 +142,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
for (i = 0; i < MAX_VECTORS; i++)
dst->intercepts[i] = from->intercepts[i];
 
-   dst->intercept = from->intercept;
dst->iopm_base_pa = from->iopm_base_pa;
dst->msrpm_base_pa = from->msrpm_base_pa;
dst->tsc_offset   = from->tsc_offset;
@@ -179,7 +174,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 */
int i;
 
-   if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+   if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
return true;
 
for (i = 0; i < MSRPM_OFFSETS; i++) {
@@ -205,7 +200,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 
 static bool nested_vmcb_check_controls(struct vmcb_control_area *control)
 {
-   if ((control->intercept & (1ULL << INTERCEPT_VMRUN)) == 0)
+   if ((vmcb_is_intercept(control, INTERCEPT_VMRUN)) == 0)
return false;
 
if (control->asid == 0)
@@ -493,7 +488,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
trace_kvm_nested_intercepts(nested_vmcb->control.intercepts[CR_VECTOR] & 0xffff,
nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
-   nested_vmcb->control.intercept);
+   nested_vmcb->control.intercepts[INTERCEPT_VECTOR_3],
+   nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4]);
 
/* Clear internal status */
   

[PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts

2020-09-11 Thread Babu Moger
Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to
set/clear/test the intercept_exceptions bits.
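
For reference, the new enum values keep each exception at a bit position
equal to its x86 vector number within word 2 of the array, so the encoding
stays bit-compatible with the old intercept_exceptions field. A quick
sketch of the arithmetic (example_excp_intercepted() is illustrative, not
part of the patch):

/* INTERCEPT_PF_VECTOR = 64 + PF_VECTOR(14) = 78:
 *   word = 78 / 32 = 2  -> intercepts[EXCEPTION_VECTOR]
 *   bit  = 78 % 32 = 14 -> the bit intercept_exceptions used
 */
static inline bool example_excp_intercepted(struct vmcb_control_area *c,
					    unsigned int nr)
{
	return vmcb_is_intercept(c, 64 + nr);
}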

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |   22 +-
 arch/x86/kvm/svm/nested.c  |   12 +---
 arch/x86/kvm/svm/svm.c |   22 +++---
 arch/x86/kvm/svm/svm.h |4 ++--
 4 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ffc89d8e4fcb..51833a611eba 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -3,6 +3,7 @@
 #define __SVM_H
 
 #include 
+#include 
 
 /*
  * VMCB Control Area intercept bits starting
@@ -12,6 +13,7 @@
 enum vector_offset {
CR_VECTOR = 0,
DR_VECTOR,
+   EXCEPTION_VECTOR,
MAX_VECTORS,
 };
 
@@ -52,6 +54,25 @@ enum {
INTERCEPT_DR5_WRITE,
INTERCEPT_DR6_WRITE,
INTERCEPT_DR7_WRITE,
+   /* Byte offset 008h (Vector 2) */
+   INTERCEPT_DE_VECTOR = 64 + DE_VECTOR,
+   INTERCEPT_DB_VECTOR = 64 + DB_VECTOR,
+   INTERCEPT_BP_VECTOR = 64 + BP_VECTOR,
+   INTERCEPT_OF_VECTOR = 64 + OF_VECTOR,
+   INTERCEPT_BR_VECTOR = 64 + BR_VECTOR,
+   INTERCEPT_UD_VECTOR = 64 + UD_VECTOR,
+   INTERCEPT_NM_VECTOR = 64 + NM_VECTOR,
+   INTERCEPT_DF_VECTOR = 64 + DF_VECTOR,
+   INTERCEPT_TS_VECTOR = 64 + TS_VECTOR,
+   INTERCEPT_NP_VECTOR = 64 + NP_VECTOR,
+   INTERCEPT_SS_VECTOR = 64 + SS_VECTOR,
+   INTERCEPT_GP_VECTOR = 64 + GP_VECTOR,
+   INTERCEPT_PF_VECTOR = 64 + PF_VECTOR,
+   INTERCEPT_MF_VECTOR = 64 + MF_VECTOR,
+   INTERCEPT_AC_VECTOR = 64 + AC_VECTOR,
+   INTERCEPT_MC_VECTOR = 64 + MC_VECTOR,
+   INTERCEPT_XM_VECTOR = 64 + XM_VECTOR,
+   INTERCEPT_VE_VECTOR = 64 + VE_VECTOR,
 };
 
 enum {
@@ -107,7 +128,6 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
u32 intercepts[MAX_VECTORS];
-   u32 intercept_exceptions;
u64 intercept;
u8 reserved_1[40];
u16 pause_filter_thresh;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ba11fc3bf843..c161bd38f401 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -109,12 +109,11 @@ void recalc_intercepts(struct vcpu_svm *svm)
h = &svm->nested.hsave->control;
g = &svm->nested.ctl;
 
-   svm->nested.host_intercept_exceptions = h->intercept_exceptions;
+   svm->nested.host_intercept_exceptions = h->intercepts[EXCEPTION_VECTOR];
 
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
-   c->intercept_exceptions = h->intercept_exceptions;
c->intercept = h->intercept;
 
if (g->int_ctl & V_INTR_MASKING_MASK) {
@@ -136,7 +135,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] |= g->intercepts[i];
 
-   c->intercept_exceptions |= g->intercept_exceptions;
c->intercept |= g->intercept;
 }
 
@@ -148,7 +146,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
for (i = 0; i < MAX_VECTORS; i++)
dst->intercepts[i] = from->intercepts[i];
 
-   dst->intercept_exceptions = from->intercept_exceptions;
dst->intercept = from->intercept;
dst->iopm_base_pa = from->iopm_base_pa;
dst->msrpm_base_pa = from->msrpm_base_pa;
@@ -495,7 +492,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 
trace_kvm_nested_intercepts(nested_vmcb->control.intercepts[CR_VECTOR] & 0xffff,
nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
-   nested_vmcb->control.intercept_exceptions,
+   nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
nested_vmcb->control.intercept);
 
/* Clear internal status */
@@ -835,7 +832,7 @@ static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
unsigned int nr = svm->vcpu.arch.exception.nr;
 
-   return (svm->nested.ctl.intercept_exceptions & (1 << nr));
+   return (svm->nested.ctl.intercepts[EXCEPTION_VECTOR] & BIT(nr));
 }
 
 static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
@@ -984,7 +981,8 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
 
-   if (get_host_vmcb(svm)->control.intercept_exceptions & excp_bits)
+   if (get_host_vmcb(svm)->control.intercepts[EXCEPTION_VECTOR] &
+

[PATCH v6 10/12] KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c

2020-09-11 Thread Babu Moger
Handling of kvm_read/write_guest_virt*() errors can be moved to common
code. The same code can be used by both VMX and SVM.
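
The helper keeps the VMX convention for its return value: 1 means the
fault was injected and the vCPU can continue, 0 means KVM must exit to
userspace with KVM_EXIT_INTERNAL_ERROR. The resulting call pattern on
both vendors looks like:

	r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
	if (r != X86EMUL_CONTINUE)
		return kvm_handle_memory_failure(vcpu, r, &e);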

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/vmx/nested.c |   12 ++--
 arch/x86/kvm/vmx/vmx.c|   29 +
 arch/x86/kvm/vmx/vmx.h|2 --
 arch/x86/kvm/x86.c|   28 
 arch/x86/kvm/x86.h|2 ++
 5 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 23b58c28a1c9..28becd22d9d9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4688,7 +4688,7 @@ static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer,
 
r = kvm_read_guest_virt(vcpu, gva, vmpointer, sizeof(*vmpointer), &e);
if (r != X86EMUL_CONTINUE) {
-   *ret = vmx_handle_memory_failure(vcpu, r, &e);
+   *ret = kvm_handle_memory_failure(vcpu, r, &e);
return -EINVAL;
}
 
@@ -4995,7 +4995,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
/* _system ok, nested_vmx_check_permission has verified cpl=0 */
r = kvm_write_guest_virt_system(vcpu, gva, &value, len, &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
}
 
return nested_vmx_succeed(vcpu);
@@ -5068,7 +5068,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &value, len, &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
}
 
field = kvm_register_readl(vcpu, (((instr_info) >> 28) & 0xf));
@@ -5230,7 +5230,7 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
r = kvm_write_guest_virt_system(vcpu, gva, (void *)&current_vmptr,
sizeof(gpa_t), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
return nested_vmx_succeed(vcpu);
 }
@@ -5283,7 +5283,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
/*
 * Nested EPT roots are always held through guest_mmu,
@@ -5365,7 +5365,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
if (operand.vpid >> 16)
return nested_vmx_fail(vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..b15b4c6e3b46 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1598,33 +1598,6 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
return 1;
 }
 
-/*
- * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns
- * KVM_EXIT_INTERNAL_ERROR for cases not currently handled by KVM. Return value
- * indicates whether exit to userspace is needed.
- */
-int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
- struct x86_exception *e)
-{
-   if (r == X86EMUL_PROPAGATE_FAULT) {
-   kvm_inject_emulated_page_fault(vcpu, e);
-   return 1;
-   }
-
-   /*
-* In case kvm_read/write_guest_virt*() failed with X86EMUL_IO_NEEDED
-* while handling a VMX instruction KVM could've handled the request
-* correctly by exiting to userspace and performing I/O but there
-* doesn't seem to be a real use-case behind such requests, just return
-* KVM_EXIT_INTERNAL_ERROR for now.
-*/
-   vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-   vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-   vcpu->run->internal.ndata = 0;
-
-   return 0;
-}
-
 /*
  * Recognizes a pending MTF VM-exit and records the nested state for later
  * delivery.
@@ -5558,7 +5531,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
if (operand.pcid >> 12 != 0) {
kvm_inject_gp(vcpu, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h

RE: [PATCH v5 00/12] SVM cleanup and INVPCID feature support

2020-09-09 Thread Babu Moger
Hi Paolo,
Let me know if you have any feedback on this series. I was thinking of
refreshing the series. There is one minor comment on PATCH v5 04/12 (from Jim).
Thanks
Babu

> -Original Message-
> From: Moger, Babu 
> Sent: Wednesday, August 26, 2020 2:14 PM
> To: pbonz...@redhat.com; vkuzn...@redhat.com;
> sean.j.christopher...@intel.com; jmatt...@google.com
> Cc: wanpen...@tencent.com; k...@vger.kernel.org; j...@8bytes.org;
> x...@kernel.org; linux-kernel@vger.kernel.org; Moger, Babu
> ; mi...@redhat.com; b...@alien8.de;
> h...@zytor.com; t...@linutronix.de
> Subject: [PATCH v5 00/12] SVM cleanup and INVPCID feature support
> 
> The following series adds the support for PCID/INVPCID on AMD guests.
> While doing it re-structured the vmcb_control_area data structure to combine
> all the intercept vectors into one 32 bit array. Makes it easy for future 
> additions.
> Re-arranged few pcid related code to make it common between SVM and VMX.
> 
> INVPCID interceptions are added only when the guest is running with shadow
> page table enabled. In this case the hypervisor needs to handle the tlbflush
> based on the type of invpcid instruction.
> 
> For the guests with nested page table (NPT) support, the INVPCID feature works
> as running it natively. KVM does not need to do any special handling.
> 
> AMD documentation for INVPCID feature is available at "AMD64 Architecture
> Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or
> later)"
> 
> The documentation can be obtained at the links below:
> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v5:
>  All the changes are related to rebase.
>  Applies cleanly on mainline and kvm (master) tree.
>  Resending it to get some attention.
> 
> v4:
> 
> https://lore.kernel.org/lkml/159676101387.12805.18038347880482984693.stgit@bmoger-ubuntu/
>  1. Changed the functions __set_intercept/__clr_intercept/__is_intercept
> to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
> vmcb_control_area structure(Suggested by Paolo).
>  2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle
> kvm_read/write_guest_virt*())
> to make it common across both SVM/VMX(Suggested by Jim Mattson).
>  3. Took care of a few other comments from Jim Mattson. Dropped
> "Reviewed-by" on a few patches which I have changed since v3.
> 
> v3:
> 
> https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
>  1. Addressing the comments from Jim Mattson. Follow the v2 link below
> for the context.
>  2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
> using native __set_bit, clear_bit and test_bit.
>  3. Combined all the intercepts vectors into single 32 bit array.
>  4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
> clr_exception_intercept etc. Used the generic set_intercept and
> clr_intercept where applicable.
>  5. Tested both L1 guest and l2 nested guests.
> 
> v2:
> 
> https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
>   - Taken care of few comments from Jim Mattson.
>   - KVM interceptions added only when tdp is off. No interceptions
> when tdp is on.
>   - Reverted the fault priority to original order in VMX.
> 
> v1:
> 
> https://lore.kernel.org/lkml/159191202523.31436.11959784252237488867.stgit@bmoger-ubuntu/
> 
> Babu Moger (12):
>   KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
>   KVM: SVM: Change intercept_cr to generic intercepts
>   KVM: SVM: Change intercept_dr to generic intercepts
>   KVM: SVM: Modify intercept_exceptions to generic intercepts
>   KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
>   KVM: SVM: Add new intercept vector in vmcb_control_area
>   KVM: nSVM: Cleanup nested_state data structure
>   KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
>   KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
>   KVM: X86: Rename and move the function vmx_handle_memory_failure to
> x86.c
>   KVM: X86: Move handling of INVPCID types to x86
>   KVM:SVM: Enable INVPCID feature on AMD
> 
> 
>  arch/x86/include/asm/svm.h  |  117 +--
>  arch/x86/include/uapi/asm/svm.h |2 +
>  arch/x86/kvm/svm/nested.c   |   66 +---
>  arch/x86/kvm/svm/svm.c  |  131 ++
> -
>  arch/x86/kvm/svm/svm.h  |   87 +-
>  arch/x86/kvm/trace.h

Re: [PATCH v5 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts

2020-08-26 Thread Babu Moger



On 8/26/20 3:55 PM, Jim Mattson wrote:
> On Wed, Aug 26, 2020 at 12:14 PM Babu Moger  wrote:
>>
>> Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use
>> the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to
>> set/clear/test the intercept_exceptions bits.
>>
>> Signed-off-by: Babu Moger 
>> Reviewed-by: Jim Mattson 
>> ---
> 
>> @@ -835,7 +832,7 @@ static bool nested_exit_on_exception(struct vcpu_svm *svm)
>>  {
>> unsigned int nr = svm->vcpu.arch.exception.nr;
>>
>> -   return (svm->nested.ctl.intercept_exceptions & (1 << nr));
>> +   return (svm->nested.ctl.intercepts[EXCEPTION_VECTOR] & (1 << nr));
> Nit: BIT(nr) rather than (1 << nr).

Sure, will change it. Thanks.
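
For readers following along: BIT(nr) is the kernel macro from
<linux/bits.h> and expands to (1UL << (nr)), so the change is purely
idiomatic; the unsigned long type also avoids shifting into the sign bit
of a plain int. A minimal sketch of the resulting expression:

	return svm->nested.ctl.intercepts[EXCEPTION_VECTOR] & BIT(nr);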



[PATCH v5 12/12] KVM:SVM: Enable INVPCID feature on AMD

2020-08-26 Thread Babu Moger
The following intercept bit has been added to support VMEXIT
for INVPCID instruction:
Code    Name            Cause
A2h     VMEXIT_INVPCID  INVPCID instruction

The following bit has been added to the VMCB layout control area
to control intercept of INVPCID:
Byte Offset    Bit(s)    Function
14h            2         intercept INVPCID

Enable the interception when the guest is running with shadow
page table enabled, and handle the TLB flush based on the INVPCID
instruction type.
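
For reference, the four architectural INVPCID types the handler has to
distinguish (the INVPCID_TYPE_* names are the ones KVM already defines;
anything above 3 raises #GP, as in the interception code below):

/*  0  INVPCID_TYPE_INDIV_ADDR       flush one address for a given PCID
 *  1  INVPCID_TYPE_SINGLE_CTXT      flush all entries for a given PCID
 *  2  INVPCID_TYPE_ALL_INCL_GLOBAL  flush all entries, including globals
 *  3  INVPCID_TYPE_ALL_NON_GLOBAL   flush all entries except globals
 */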

For the guests with nested page table (NPT) support, the INVPCID
feature works as running it natively. KVM does not need to do any
special handling in this case.

AMD documentation for INVPCID feature is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/uapi/asm/svm.h |2 ++
 arch/x86/kvm/svm/svm.c  |   51 +++
 2 files changed, 53 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 2e8a30f06c74..522d42dfc28c 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -76,6 +76,7 @@
 #define SVM_EXIT_MWAIT_COND 0x08c
 #define SVM_EXIT_XSETBV 0x08d
 #define SVM_EXIT_RDPRU 0x08e
+#define SVM_EXIT_INVPCID   0x0a2
 #define SVM_EXIT_NPF   0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI   0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402
@@ -171,6 +172,7 @@
{ SVM_EXIT_MONITOR, "monitor" }, \
{ SVM_EXIT_MWAIT,   "mwait" }, \
{ SVM_EXIT_XSETBV,  "xsetbv" }, \
+   { SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 96617b61e531..5c6b8d0f7628 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -813,6 +813,9 @@ static __init void svm_set_cpu_caps(void)
if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
boot_cpu_has(X86_FEATURE_AMD_SSBD))
kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+
+   /* Enable INVPCID feature */
+   kvm_cpu_cap_check_and_set(X86_FEATURE_INVPCID);
 }
 
 static __init int svm_hardware_setup(void)
@@ -985,6 +988,21 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
return svm->vmcb->control.tsc_offset;
 }
 
+static void svm_check_invpcid(struct vcpu_svm *svm)
+{
+   /*
+* Intercept INVPCID instruction only if shadow page table is
+* enabled. Interception is not required with nested page table
+* enabled.
+*/
+   if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
+   if (!npt_enabled)
+   svm_set_intercept(svm, INTERCEPT_INVPCID);
+   else
+   svm_clr_intercept(svm, INTERCEPT_INVPCID);
+   }
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1114,6 +1132,8 @@ static void init_vmcb(struct vcpu_svm *svm)
svm_clr_intercept(svm, INTERCEPT_PAUSE);
}
 
+   svm_check_invpcid(svm);
+
if (kvm_vcpu_apicv_active(>vcpu))
avic_init_vmcb(svm);
 
@@ -2730,6 +2750,33 @@ static int mwait_interception(struct vcpu_svm *svm)
return nop_interception(svm);
 }
 
+static int invpcid_interception(struct vcpu_svm *svm)
+{
+   struct kvm_vcpu *vcpu = &svm->vcpu;
+   unsigned long type;
+   gva_t gva;
+
+   if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   /*
+* For an INVPCID intercept:
+* EXITINFO1 provides the linear address of the memory operand.
+* EXITINFO2 provides the contents of the register operand.
+*/
+   type = svm->vmcb->control.exit_info_2;
+   gva = svm->vmcb->control.exit_info_1;
+
+   if (type > 3) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
+   return kvm_handle_invpcid(vcpu, type, gva);
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = cr_interception,
[SVM_EXIT_READ_CR3] = cr_interception,
@@ -2792,6 +2839,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_MWAIT]= mwait_interception,
[SVM_EXIT_XSETBV] 

[PATCH v5 11/12] KVM: X86: Move handling of INVPCID types to x86

2020-08-26 Thread Babu Moger
INVPCID instruction handling is mostly the same across both VMX and
SVM. So, move the code to common x86.c.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/vmx/vmx.c |   68 +-
 arch/x86/kvm/x86.c |   78 
 arch/x86/kvm/x86.h |1 +
 3 files changed, 80 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b15b4c6e3b46..ff42d27f641f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5497,16 +5497,11 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 {
u32 vmx_instruction_info;
unsigned long type;
-   bool pcid_enabled;
gva_t gva;
-   struct x86_exception e;
-   unsigned i;
-   unsigned long roots_to_free = 0;
struct {
u64 pcid;
u64 gla;
} operand;
-   int r;
 
if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
kvm_queue_exception(vcpu, UD_VECTOR);
@@ -5529,68 +5524,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
sizeof(operand), &gva))
return 1;
 
-   r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
-   if (r != X86EMUL_CONTINUE)
-   return kvm_handle_memory_failure(vcpu, r, &e);
-
-   if (operand.pcid >> 12 != 0) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
-
-   pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
-
-   switch (type) {
-   case INVPCID_TYPE_INDIV_ADDR:
-   if ((!pcid_enabled && (operand.pcid != 0)) ||
-   is_noncanonical_address(operand.gla, vcpu)) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
-   kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
-   return kvm_skip_emulated_instruction(vcpu);
-
-   case INVPCID_TYPE_SINGLE_CTXT:
-   if (!pcid_enabled && (operand.pcid != 0)) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
-
-   if (kvm_get_active_pcid(vcpu) == operand.pcid) {
-   kvm_mmu_sync_roots(vcpu);
-   kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-   }
-
-   for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-   if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
-   == operand.pcid)
-   roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
-
-   kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
-   /*
-* If neither the current cr3 nor any of the prev_roots use the
-* given PCID, then nothing needs to be done here because a
-* resync will happen anyway before switching to any other CR3.
-*/
-
-   return kvm_skip_emulated_instruction(vcpu);
-
-   case INVPCID_TYPE_ALL_NON_GLOBAL:
-   /*
-* Currently, KVM doesn't mark global entries in the shadow
-* page tables, so a non-global flush just degenerates to a
-* global flush. If needed, we could optimize this later by
-* keeping track of global entries in shadow page tables.
-*/
-
-   /* fall-through */
-   case INVPCID_TYPE_ALL_INCL_GLOBAL:
-   kvm_mmu_unload(vcpu);
-   return kvm_skip_emulated_instruction(vcpu);
-
-   default:
-   BUG(); /* We have already checked above that type <= 3 */
-   }
+   return kvm_handle_invpcid(vcpu, type, gva);
 }
 
 static int handle_pml_full(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d7930ecdddc..39ca22e0f8b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -10791,6 +10792,83 @@ int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 }
 EXPORT_SYMBOL_GPL(kvm_handle_memory_failure);
 
+int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
+{
+   bool pcid_enabled;
+   struct x86_exception e;
+   unsigned i;
+   unsigned long roots_to_free = 0;
+   struct {
+   u64 pcid;
+   u64 gla;
+   } operand;
+   int r;
+
+   r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
+   if (r != X86EMUL_CONTINUE)
+   return kvm_handle_memory_failure(vcpu, r, &e);
+
+   if (operand.pcid >> 12 != 0) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
+   pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+
+   switch (type) {
+   case INVPCID_TYPE_INDIV_ADDR:
+

[PATCH v5 09/12] KVM: SVM: Remove set_exception_intercept and clr_exception_intercept

2020-08-26 Thread Babu Moger
Remove set_exception_intercept and clr_exception_intercept.
Replace them with the generic svm_set_intercept and svm_clr_intercept
for these calls.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/svm/svm.c |   20 ++--
 arch/x86/kvm/svm/svm.h |   18 --
 2 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0d7397f4a4f7..96617b61e531 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1003,11 +1003,11 @@ static void init_vmcb(struct vcpu_svm *svm)
 
set_dr_intercepts(svm);
 
-   set_exception_intercept(svm, INTERCEPT_PF_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_UD_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_MC_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_AC_VECTOR);
-   set_exception_intercept(svm, INTERCEPT_DB_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_PF_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_UD_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_MC_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_AC_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_DB_VECTOR);
/*
 * Guest access to VMware backdoor ports could legitimately
 * trigger #GP because of TSS I/O permission bitmap.
@@ -1015,7 +1015,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 * as VMware does.
 */
if (enable_vmware_backdoor)
-   set_exception_intercept(svm, INTERCEPT_GP_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_GP_VECTOR);
 
svm_set_intercept(svm, INTERCEPT_INTR);
svm_set_intercept(svm, INTERCEPT_NMI);
@@ -1093,7 +1093,7 @@ static void init_vmcb(struct vcpu_svm *svm)
/* Setup VMCB for Nested Paging */
control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
svm_clr_intercept(svm, INTERCEPT_INVLPG);
-   clr_exception_intercept(svm, INTERCEPT_PF_VECTOR);
+   svm_clr_intercept(svm, INTERCEPT_PF_VECTOR);
svm_clr_intercept(svm, INTERCEPT_CR3_READ);
svm_clr_intercept(svm, INTERCEPT_CR3_WRITE);
save->g_pat = svm->vcpu.arch.pat;
@@ -1135,7 +1135,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
if (sev_guest(svm->vcpu.kvm)) {
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
-   clr_exception_intercept(svm, INTERCEPT_UD_VECTOR);
+   svm_clr_intercept(svm, INTERCEPT_UD_VECTOR);
}
 
vmcb_mark_all_dirty(svm->vmcb);
@@ -1646,11 +1646,11 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   clr_exception_intercept(svm, INTERCEPT_BP_VECTOR);
+   svm_clr_intercept(svm, INTERCEPT_BP_VECTOR);
 
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
-   set_exception_intercept(svm, INTERCEPT_BP_VECTOR);
+   svm_set_intercept(svm, INTERCEPT_BP_VECTOR);
}
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8128bac75fa2..fc4bfea3f555 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -261,24 +261,6 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)
recalc_intercepts(svm);
 }
 
-static inline void set_exception_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_set_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
-static inline void clr_exception_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_clr_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
 static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
 {
struct vmcb *vmcb = get_host_vmcb(svm);



[PATCH v5 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts

2020-08-26 Thread Babu Moger
Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to
set/clear/test the intercept_exceptions bits.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |   22 +-
 arch/x86/kvm/svm/nested.c  |   12 +---
 arch/x86/kvm/svm/svm.c |   22 +++---
 arch/x86/kvm/svm/svm.h |4 ++--
 4 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ffc89d8e4fcb..51833a611eba 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -3,6 +3,7 @@
 #define __SVM_H
 
 #include 
+#include 
 
 /*
  * VMCB Control Area intercept bits starting
@@ -12,6 +13,7 @@
 enum vector_offset {
CR_VECTOR = 0,
DR_VECTOR,
+   EXCEPTION_VECTOR,
MAX_VECTORS,
 };
 
@@ -52,6 +54,25 @@ enum {
INTERCEPT_DR5_WRITE,
INTERCEPT_DR6_WRITE,
INTERCEPT_DR7_WRITE,
+   /* Byte offset 008h (Vector 2) */
+   INTERCEPT_DE_VECTOR = 64 + DE_VECTOR,
+   INTERCEPT_DB_VECTOR = 64 + DB_VECTOR,
+   INTERCEPT_BP_VECTOR = 64 + BP_VECTOR,
+   INTERCEPT_OF_VECTOR = 64 + OF_VECTOR,
+   INTERCEPT_BR_VECTOR = 64 + BR_VECTOR,
+   INTERCEPT_UD_VECTOR = 64 + UD_VECTOR,
+   INTERCEPT_NM_VECTOR = 64 + NM_VECTOR,
+   INTERCEPT_DF_VECTOR = 64 + DF_VECTOR,
+   INTERCEPT_TS_VECTOR = 64 + TS_VECTOR,
+   INTERCEPT_NP_VECTOR = 64 + NP_VECTOR,
+   INTERCEPT_SS_VECTOR = 64 + SS_VECTOR,
+   INTERCEPT_GP_VECTOR = 64 + GP_VECTOR,
+   INTERCEPT_PF_VECTOR = 64 + PF_VECTOR,
+   INTERCEPT_MF_VECTOR = 64 + MF_VECTOR,
+   INTERCEPT_AC_VECTOR = 64 + AC_VECTOR,
+   INTERCEPT_MC_VECTOR = 64 + MC_VECTOR,
+   INTERCEPT_XM_VECTOR = 64 + XM_VECTOR,
+   INTERCEPT_VE_VECTOR = 64 + VE_VECTOR,
 };
 
 enum {
@@ -107,7 +128,6 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
u32 intercepts[MAX_VECTORS];
-   u32 intercept_exceptions;
u64 intercept;
u8 reserved_1[40];
u16 pause_filter_thresh;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ba11fc3bf843..798ae2fabc74 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -109,12 +109,11 @@ void recalc_intercepts(struct vcpu_svm *svm)
h = &svm->nested.hsave->control;
g = &svm->nested.ctl;
 
-   svm->nested.host_intercept_exceptions = h->intercept_exceptions;
+   svm->nested.host_intercept_exceptions = h->intercepts[EXCEPTION_VECTOR];
 
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
-   c->intercept_exceptions = h->intercept_exceptions;
c->intercept = h->intercept;
 
if (g->int_ctl & V_INTR_MASKING_MASK) {
@@ -136,7 +135,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] |= g->intercepts[i];
 
-   c->intercept_exceptions |= g->intercept_exceptions;
c->intercept |= g->intercept;
 }
 
@@ -148,7 +146,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
for (i = 0; i < MAX_VECTORS; i++)
dst->intercepts[i] = from->intercepts[i];
 
-   dst->intercept_exceptions = from->intercept_exceptions;
dst->intercept = from->intercept;
dst->iopm_base_pa = from->iopm_base_pa;
dst->msrpm_base_pa = from->msrpm_base_pa;
@@ -495,7 +492,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 
trace_kvm_nested_intercepts(nested_vmcb->control.intercepts[CR_VECTOR] & 0xffff,
nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
-   nested_vmcb->control.intercept_exceptions,
+   nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
nested_vmcb->control.intercept);
 
/* Clear internal status */
@@ -835,7 +832,7 @@ static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
unsigned int nr = svm->vcpu.arch.exception.nr;
 
-   return (svm->nested.ctl.intercept_exceptions & (1 << nr));
+   return (svm->nested.ctl.intercepts[EXCEPTION_VECTOR] & (1 << nr));
 }
 
 static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
@@ -984,7 +981,8 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
 
-   if (get_host_vmcb(svm)->control.intercept_exceptions & excp_bits)
+   if (get_host_vmcb(svm)->control.intercepts[EXCEPTION_VECTOR] &
+  

[PATCH v5 10/12] KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c

2020-08-26 Thread Babu Moger
Handling of kvm_read/write_guest_virt*() errors can be moved to common
code. The same code can be used by both VMX and SVM.

Signed-off-by: Babu Moger 
---
 arch/x86/kvm/vmx/nested.c |   12 ++--
 arch/x86/kvm/vmx/vmx.c|   29 +
 arch/x86/kvm/vmx/vmx.h|2 --
 arch/x86/kvm/x86.c|   28 
 arch/x86/kvm/x86.h|2 ++
 5 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 23b58c28a1c9..28becd22d9d9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4688,7 +4688,7 @@ static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer,
 
r = kvm_read_guest_virt(vcpu, gva, vmpointer, sizeof(*vmpointer), &e);
if (r != X86EMUL_CONTINUE) {
-   *ret = vmx_handle_memory_failure(vcpu, r, &e);
+   *ret = kvm_handle_memory_failure(vcpu, r, &e);
return -EINVAL;
}
 
@@ -4995,7 +4995,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
/* _system ok, nested_vmx_check_permission has verified cpl=0 */
r = kvm_write_guest_virt_system(vcpu, gva, &value, len, &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
}
 
return nested_vmx_succeed(vcpu);
@@ -5068,7 +5068,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &value, len, &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
}
 
field = kvm_register_readl(vcpu, (((instr_info) >> 28) & 0xf));
@@ -5230,7 +5230,7 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
r = kvm_write_guest_virt_system(vcpu, gva, (void *)&current_vmptr,
sizeof(gpa_t), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
return nested_vmx_succeed(vcpu);
 }
@@ -5283,7 +5283,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
/*
 * Nested EPT roots are always held through guest_mmu,
@@ -5365,7 +5365,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
if (operand.vpid >> 16)
return nested_vmx_fail(vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..b15b4c6e3b46 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1598,33 +1598,6 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
return 1;
 }
 
-/*
- * Handles kvm_read/write_guest_virt*() result and either injects #PF or 
returns
- * KVM_EXIT_INTERNAL_ERROR for cases not currently handled by KVM. Return value
- * indicates whether exit to userspace is needed.
- */
-int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
- struct x86_exception *e)
-{
-   if (r == X86EMUL_PROPAGATE_FAULT) {
-   kvm_inject_emulated_page_fault(vcpu, e);
-   return 1;
-   }
-
-   /*
-* In case kvm_read/write_guest_virt*() failed with X86EMUL_IO_NEEDED
-* while handling a VMX instruction KVM could've handled the request
-* correctly by exiting to userspace and performing I/O but there
-* doesn't seem to be a real use-case behind such requests, just return
-* KVM_EXIT_INTERNAL_ERROR for now.
-*/
-   vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-   vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-   vcpu->run->internal.ndata = 0;
-
-   return 0;
-}
-
 /*
  * Recognizes a pending MTF VM-exit and records the nested state for later
  * delivery.
@@ -5558,7 +5531,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
if (operand.pcid >> 12 != 0) {
kvm_inject_gp(vcpu, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 26175a4759fa..7c578564

[PATCH v5 08/12] KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept

2020-08-26 Thread Babu Moger
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead
call the generic svm_set_intercept, svm_clr_intercept and svm_is_intercept
for all cr intercepts.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/svm/svm.c |   34 +-
 arch/x86/kvm/svm/svm.h |   25 -
 2 files changed, 17 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 17bfa34033ac..0d7397f4a4f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -992,14 +992,14 @@ static void init_vmcb(struct vcpu_svm *svm)
 
svm->vcpu.arch.hflags = 0;
 
-   set_cr_intercept(svm, INTERCEPT_CR0_READ);
-   set_cr_intercept(svm, INTERCEPT_CR3_READ);
-   set_cr_intercept(svm, INTERCEPT_CR4_READ);
-   set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
-   set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
-   set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR0_READ);
+   svm_set_intercept(svm, INTERCEPT_CR3_READ);
+   svm_set_intercept(svm, INTERCEPT_CR4_READ);
+   svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
if (!kvm_vcpu_apicv_active(>vcpu))
-   set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 
set_dr_intercepts(svm);
 
@@ -1094,8 +1094,8 @@ static void init_vmcb(struct vcpu_svm *svm)
control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
svm_clr_intercept(svm, INTERCEPT_INVLPG);
clr_exception_intercept(svm, INTERCEPT_PF_VECTOR);
-   clr_cr_intercept(svm, INTERCEPT_CR3_READ);
-   clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
+   svm_clr_intercept(svm, INTERCEPT_CR3_READ);
+   svm_clr_intercept(svm, INTERCEPT_CR3_WRITE);
save->g_pat = svm->vcpu.arch.pat;
save->cr3 = 0;
save->cr4 = 0;
@@ -1549,11 +1549,11 @@ static void update_cr0_intercept(struct vcpu_svm *svm)
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 
if (gcr0 == *hcr0) {
-   clr_cr_intercept(svm, INTERCEPT_CR0_READ);
-   clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
+   svm_clr_intercept(svm, INTERCEPT_CR0_READ);
+   svm_clr_intercept(svm, INTERCEPT_CR0_WRITE);
} else {
-   set_cr_intercept(svm, INTERCEPT_CR0_READ);
-   set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR0_READ);
+   svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
}
 }
 
@@ -2931,7 +2931,7 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 
trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
-   if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
+   if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
vcpu->arch.cr0 = svm->vmcb->save.cr0;
if (npt_enabled)
vcpu->arch.cr3 = svm->vmcb->save.cr3;
@@ -3056,13 +3056,13 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
if (nested_svm_virtualize_tpr(vcpu))
return;
 
-   clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
 
if (irr == -1)
return;
 
if (tpr >= irr)
-   set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
 bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
@@ -3250,7 +3250,7 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
if (nested_svm_virtualize_tpr(vcpu))
return;
 
-   if (!is_cr_intercept(svm, INTERCEPT_CR8_WRITE)) {
+   if (!svm_is_intercept(svm, INTERCEPT_CR8_WRITE)) {
int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
kvm_set_cr8(vcpu, cr8);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ffb35a83048f..8128bac75fa2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -228,31 +228,6 @@ static inline bool vmcb_is_intercept(struct vmcb_control_area *control, int bit)
return test_bit(bit, (unsigned long *)>intercepts);
 }
 
-static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_set_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
-static inline void clr_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   vmcb_clr_intercept(&vmcb->control, bit);
-
-   recalc_intercepts(svm);
-}
-
-static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-   struct vmcb *vmcb = get_host_vmcb(svm);
-
-   r

[PATCH v5 07/12] KVM: nSVM: Cleanup nested_state data structure

2020-08-26 Thread Babu Moger
host_intercept_exceptions is not used anywhere. Clean it up.

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/kvm/svm/nested.c |2 --
 arch/x86/kvm/svm/svm.h|1 -
 2 files changed, 3 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a04c9909386a..9595c1a1a039 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -109,8 +109,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
h = &svm->nested.hsave->control;
g = &svm->nested.ctl;
 
-   svm->nested.host_intercept_exceptions = h->intercepts[EXCEPTION_VECTOR];
-
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2cde5091775a..ffb35a83048f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -86,7 +86,6 @@ struct svm_nested_state {
u64 hsave_msr;
u64 vm_cr_msr;
u64 vmcb;
-   u32 host_intercept_exceptions;
 
/* These are the merged vectors */
u32 *msrpm;



[PATCH v5 06/12] KVM: SVM: Add new intercept vector in vmcb_control_area

2020-08-26 Thread Babu Moger
New intercept bits have been added to the vmcb control area to support
a few more interceptions. Here are some of them:
 - INTERCEPT_INVLPGB,
 - INTERCEPT_INVLPGB_ILLEGAL,
 - INTERCEPT_INVPCID,
 - INTERCEPT_MCOMMIT,
 - INTERCEPT_TLBSYNC,

Add new intercept vector in vmcb_control_area to support these instructions.
Also update the kvm_nested_intercepts trace function to cover the new vector.
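
Because the intercept words are addressed as one flat bit array, the new
Vector 5 enum values start at 160 (5 * 32). A worked example of the
mapping, matching the APM byte offsets:

/* INTERCEPT_INVPCID = 162:
 *   word = 162 / 32 = 5 -> intercepts[INTERCEPT_VECTOR_5],
 *                          byte offset 5 * 4 = 014h in the control area
 *   bit  = 162 % 32 = 2 -> "Byte Offset 14h, Bit 2" in the APM
 */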

AMD documentation for these instructions is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593
Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |7 +++
 arch/x86/kvm/svm/nested.c  |3 ++-
 arch/x86/kvm/trace.h   |   13 -
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 9f0fa02fc838..623c392a55ac 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -16,6 +16,7 @@ enum vector_offset {
EXCEPTION_VECTOR,
INTERCEPT_VECTOR_3,
INTERCEPT_VECTOR_4,
+   INTERCEPT_VECTOR_5,
MAX_VECTORS,
 };
 
@@ -124,6 +125,12 @@ enum {
INTERCEPT_MWAIT_COND,
INTERCEPT_XSETBV,
INTERCEPT_RDPRU,
+   /* Byte offset 014h (Vector 5) */
+   INTERCEPT_INVLPGB = 160,
+   INTERCEPT_INVLPGB_ILLEGAL,
+   INTERCEPT_INVPCID,
+   INTERCEPT_MCOMMIT,
+   INTERCEPT_TLBSYNC,
 };
 
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 772e6d8e6459..a04c9909386a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -489,7 +489,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
nested_vmcb->control.intercepts[INTERCEPT_VECTOR_3],
-   nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4]);
+   nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4],
+   nested_vmcb->control.intercepts[INTERCEPT_VECTOR_5]);
 
/* Clear internal status */
kvm_clear_exception_queue(>vcpu);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 6e7262229e6a..11046171b5d9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -544,9 +544,10 @@ TRACE_EVENT(kvm_nested_vmrun,
 );
 
 TRACE_EVENT(kvm_nested_intercepts,
-   TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u32 intercept1,
-__u32 intercept2),
-   TP_ARGS(cr_read, cr_write, exceptions, intercept1, intercept2),
+   TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
+__u32 intercept1, __u32 intercept2, __u32 intercept3),
+   TP_ARGS(cr_read, cr_write, exceptions, intercept1,
+   intercept2, intercept3),
 
TP_STRUCT__entry(
__field(__u16,  cr_read )
@@ -554,6 +555,7 @@ TRACE_EVENT(kvm_nested_intercepts,
__field(__u32,  exceptions  )
__field(__u32,  intercept1  )
__field(__u32,  intercept2  )
+   __field(__u32,  intercept3  )
),
 
TP_fast_assign(
@@ -562,12 +564,13 @@ TRACE_EVENT(kvm_nested_intercepts,
__entry->exceptions = exceptions;
__entry->intercept1 = intercept1;
__entry->intercept2 = intercept2;
+   __entry->intercept3 = intercept3;
),
 
TP_printk("cr_read: %04x cr_write: %04x excp: %08x "
- "intercept1: %08x intercept2: %08x",
+ "intercept1: %08x intercept2: %08x  intercept3: %08x",
  __entry->cr_read, __entry->cr_write, __entry->exceptions,
- __entry->intercept1, __entry->intercept2)
+ __entry->intercept1, __entry->intercept2, __entry->intercept3)
 );
 /*
  * Tracepoint for #VMEXIT while nested



[PATCH v5 03/12] KVM: SVM: Change intercept_dr to generic intercepts

2020-08-26 Thread Babu Moger
Modify intercept_dr to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
to set/clear/test the intercept_dr bits.
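
As with the CR word, DR reads and writes share a single 32-bit vector; a
sketch of the resulting layout (word 1, byte offset 004h):

/* intercepts[DR_VECTOR]:
 *   bits  0..7   DR0..DR7 read   (enum value 32 + n)
 *   bits 16..23  DR0..DR7 write  (enum value 48 + n)
 * e.g. INTERCEPT_DR7_WRITE = 55 -> word 55 / 32 = 1, bit 55 % 32 = 23,
 * the same bit the old (16 + 7) define selected within intercept_dr.
 */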

Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/svm.h |   36 ++--
 arch/x86/kvm/svm/nested.c  |6 +-
 arch/x86/kvm/svm/svm.c |4 ++--
 arch/x86/kvm/svm/svm.h |   34 +-
 4 files changed, 38 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index d4739f4eae63..ffc89d8e4fcb 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -11,6 +11,7 @@
 
 enum vector_offset {
CR_VECTOR = 0,
+   DR_VECTOR,
MAX_VECTORS,
 };
 
@@ -34,6 +35,23 @@ enum {
INTERCEPT_CR6_WRITE,
INTERCEPT_CR7_WRITE,
INTERCEPT_CR8_WRITE,
+   /* Byte offset 004h (Vector 1) */
+   INTERCEPT_DR0_READ = 32,
+   INTERCEPT_DR1_READ,
+   INTERCEPT_DR2_READ,
+   INTERCEPT_DR3_READ,
+   INTERCEPT_DR4_READ,
+   INTERCEPT_DR5_READ,
+   INTERCEPT_DR6_READ,
+   INTERCEPT_DR7_READ,
+   INTERCEPT_DR0_WRITE = 48,
+   INTERCEPT_DR1_WRITE,
+   INTERCEPT_DR2_WRITE,
+   INTERCEPT_DR3_WRITE,
+   INTERCEPT_DR4_WRITE,
+   INTERCEPT_DR5_WRITE,
+   INTERCEPT_DR6_WRITE,
+   INTERCEPT_DR7_WRITE,
 };
 
 enum {
@@ -89,7 +107,6 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
u32 intercepts[MAX_VECTORS];
-   u32 intercept_dr;
u32 intercept_exceptions;
u64 intercept;
u8 reserved_1[40];
@@ -271,23 +288,6 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
 #define SVM_SELECTOR_CODE_MASK (1 << 3)
 
-#define INTERCEPT_DR0_READ 0
-#define INTERCEPT_DR1_READ 1
-#define INTERCEPT_DR2_READ 2
-#define INTERCEPT_DR3_READ 3
-#define INTERCEPT_DR4_READ 4
-#define INTERCEPT_DR5_READ 5
-#define INTERCEPT_DR6_READ 6
-#define INTERCEPT_DR7_READ 7
-#define INTERCEPT_DR0_WRITE (16 + 0)
-#define INTERCEPT_DR1_WRITE (16 + 1)
-#define INTERCEPT_DR2_WRITE (16 + 2)
-#define INTERCEPT_DR3_WRITE (16 + 3)
-#define INTERCEPT_DR4_WRITE (16 + 4)
-#define INTERCEPT_DR5_WRITE (16 + 5)
-#define INTERCEPT_DR6_WRITE (16 + 6)
-#define INTERCEPT_DR7_WRITE (16 + 7)
-
 #define SVM_EVTINJ_VEC_MASK 0xff
 
 #define SVM_EVTINJ_TYPE_SHIFT 8
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5f65b759abcb..ba11fc3bf843 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -114,7 +114,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
-   c->intercept_dr = h->intercept_dr;
c->intercept_exceptions = h->intercept_exceptions;
c->intercept = h->intercept;
 
@@ -137,7 +136,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] |= g->intercepts[i];
 
-   c->intercept_dr |= g->intercept_dr;
c->intercept_exceptions |= g->intercept_exceptions;
c->intercept |= g->intercept;
 }
@@ -150,7 +148,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
for (i = 0; i < MAX_VECTORS; i++)
dst->intercepts[i] = from->intercepts[i];
 
-   dst->intercept_dr = from->intercept_dr;
dst->intercept_exceptions = from->intercept_exceptions;
dst->intercept = from->intercept;
dst->iopm_base_pa = from->iopm_base_pa;
@@ -779,8 +776,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
break;
}
case SVM_EXIT_READ_DR0 ... SVM_EXIT_WRITE_DR7: {
-   u32 bit = 1U << (exit_code - SVM_EXIT_READ_DR0);
-   if (svm->nested.ctl.intercept_dr & bit)
+   if (vmcb_is_intercept(>nested.ctl, exit_code))
vmexit = NESTED_EXIT_DONE;
break;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 523936b80dda..1a5f3908b388 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2815,8 +2815,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("VMCB Control Area:\n");
pr_err("%-20s%04x\n", "cr_read:", control->intercepts[CR_VECTOR] & 
0x);
pr_err("%-20s%04x\n", "cr_write:", control->intercepts[CR_VECTOR] >> 
16);
-   pr_err("%-20s%04x\n", "dr_read:", control->intercept_dr & 0x);
-   pr_err("%-20s%04x\n", "dr_write:", control->intercept_dr >> 16);
+   pr_err("%-20s%04x\n", "dr_read:", control

[PATCH v5 05/12] KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors

2020-08-26 Thread Babu Moger
Convert all the intercepts to one array of 32-bit vectors in
vmcb_control_area. This makes it easier to add intercept vectors
in the future. Also update the trace functions.
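
For illustration, a minimal stand-alone C sketch of how a flat intercept
number maps onto the u32 vector array after this change (the enum values
are copied from the hunks below; the arithmetic mirrors what __set_bit
effectively does on this layout):

#include <stdio.h>

enum {
    INTERCEPT_INTR  = 96,   /* byte offset 00Ch, vector 3, bit 0 */
    INTERCEPT_VMRUN = 128,  /* byte offset 010h, vector 4, bit 0 */
};

int main(void)
{
    int bit = INTERCEPT_VMRUN;

    /* word index and bit position within that word */
    printf("intercepts[%d], bit %d\n", bit / 32, bit % 32);
    return 0;
}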

Signed-off-by: Babu Moger 
Reviewed-by: Jim Mattson 
---
 arch/x86/include/asm/svm.h |   14 +++---
 arch/x86/kvm/svm/nested.c  |   25 ++---
 arch/x86/kvm/svm/svm.c |   16 ++--
 arch/x86/kvm/svm/svm.h |   12 ++--
 arch/x86/kvm/trace.h   |   18 +++---
 5 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 51833a611eba..9f0fa02fc838 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -14,6 +14,8 @@ enum vector_offset {
CR_VECTOR = 0,
DR_VECTOR,
EXCEPTION_VECTOR,
+   INTERCEPT_VECTOR_3,
+   INTERCEPT_VECTOR_4,
MAX_VECTORS,
 };
 
@@ -73,10 +75,8 @@ enum {
INTERCEPT_MC_VECTOR = 64 + MC_VECTOR,
INTERCEPT_XM_VECTOR = 64 + XM_VECTOR,
INTERCEPT_VE_VECTOR = 64 + VE_VECTOR,
-};
-
-enum {
-   INTERCEPT_INTR,
+   /* Byte offset 00Ch (Vector 3) */
+   INTERCEPT_INTR = 96,
INTERCEPT_NMI,
INTERCEPT_SMI,
INTERCEPT_INIT,
@@ -108,7 +108,8 @@ enum {
INTERCEPT_TASK_SWITCH,
INTERCEPT_FERR_FREEZE,
INTERCEPT_SHUTDOWN,
-   INTERCEPT_VMRUN,
+   /* Byte offset 010h (Vector 4) */
+   INTERCEPT_VMRUN = 128,
INTERCEPT_VMMCALL,
INTERCEPT_VMLOAD,
INTERCEPT_VMSAVE,
@@ -128,8 +129,7 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
u32 intercepts[MAX_VECTORS];
-   u64 intercept;
-   u8 reserved_1[40];
+   u32 reserved_1[15 - MAX_VECTORS];
u16 pause_filter_thresh;
u16 pause_filter_count;
u64 iopm_base_pa;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 798ae2fabc74..772e6d8e6459 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -114,8 +114,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] = h->intercepts[i];
 
-   c->intercept = h->intercept;
-
if (g->int_ctl & V_INTR_MASKING_MASK) {
/* We only want the cr8 intercept bits of L1 */
vmcb_clr_intercept(c, INTERCEPT_CR8_READ);
@@ -126,16 +124,14 @@ void recalc_intercepts(struct vcpu_svm *svm)
 * affect any interrupt we may want to inject; therefore,
 * interrupt window vmexits are irrelevant to L0.
 */
-   c->intercept &= ~(1ULL << INTERCEPT_VINTR);
+   vmcb_clr_intercept(c, INTERCEPT_VINTR);
}
 
/* We don't want to see VMMCALLs from a nested guest */
-   c->intercept &= ~(1ULL << INTERCEPT_VMMCALL);
+   vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
 
for (i = 0; i < MAX_VECTORS; i++)
c->intercepts[i] |= g->intercepts[i];
-
-   c->intercept |= g->intercept;
 }
 
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
@@ -146,7 +142,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
for (i = 0; i < MAX_VECTORS; i++)
dst->intercepts[i] = from->intercepts[i];
 
-   dst->intercept = from->intercept;
dst->iopm_base_pa = from->iopm_base_pa;
dst->msrpm_base_pa= from->msrpm_base_pa;
dst->tsc_offset   = from->tsc_offset;
@@ -179,7 +174,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 */
int i;
 
-   if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+   if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
return true;
 
for (i = 0; i < MSRPM_OFFSETS; i++) {
@@ -205,7 +200,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 
 static bool nested_vmcb_check_controls(struct vmcb_control_area *control)
 {
-   if ((control->intercept & (1ULL << INTERCEPT_VMRUN)) == 0)
+   if ((vmcb_is_intercept(control, INTERCEPT_VMRUN)) == 0)
return false;
 
if (control->asid == 0)
@@ -493,7 +488,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
trace_kvm_nested_intercepts(nested_vmcb->control.intercepts[CR_VECTOR] & 0xffff,
                            nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
                            nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
-                           nested_vmcb->control.intercept);
+                           nested_vmcb->control.intercepts[INTERCEPT_VECTOR_3],
+                           nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4]);
 
/* Clear internal status */
   

[PATCH v5 01/12] KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)

2020-08-26 Thread Babu Moger
This is in preparation for the future intercept vector additions.

Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
using the kernel APIs __set_bit, __clear_bit and test_bit respectively.
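
As a rough user-space analogue of what these helpers do (illustrative
only: the kernel helpers operate on the intercept words through a cast
to unsigned long *, which is equivalent to the open-coded arithmetic
below on little-endian x86):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct control_area {
    uint32_t intercepts[5];     /* stand-in for the VMCB intercept words */
};

static void ctl_set(struct control_area *c, int bit)
{
    c->intercepts[bit / 32] |= 1u << (bit % 32);
}

static void ctl_clr(struct control_area *c, int bit)
{
    c->intercepts[bit / 32] &= ~(1u << (bit % 32));
}

static bool ctl_test(const struct control_area *c, int bit)
{
    return c->intercepts[bit / 32] & (1u << (bit % 32));
}

int main(void)
{
    struct control_area c = { { 0 } };

    ctl_set(&c, 96);                    /* INTERCEPT_INTR, once the later
                                           patches flatten the enums */
    printf("%d\n", ctl_test(&c, 96));   /* 1 */
    ctl_clr(&c, 96);
    printf("%d\n", ctl_test(&c, 96));   /* 0 */
    return 0;
}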

Signed-off-by: Babu Moger 
---
 arch/x86/kvm/svm/svm.h |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a798e1731709..1cff7644e70b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -214,6 +214,21 @@ static inline struct vmcb *get_host_vmcb(struct vcpu_svm *svm)
return svm->vmcb;
 }
 
+static inline void vmcb_set_intercept(struct vmcb_control_area *control, int bit)
+{
+   __set_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
+static inline void vmcb_clr_intercept(struct vmcb_control_area *control, int bit)
+{
+   __clear_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
+static inline bool vmcb_is_intercept(struct vmcb_control_area *control, int bit)
+{
+   return test_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
 static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
 {
struct vmcb *vmcb = get_host_vmcb(svm);



[PATCH v5 02/12] KVM: SVM: Change intercept_cr to generic intercepts

2020-08-26 Thread Babu Moger
Change intercept_cr to generic intercepts in vmcb_control_area.
Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
where applicable.

Signed-off-by: Babu Moger 
---
 arch/x86/include/asm/svm.h |   42 --
 arch/x86/kvm/svm/nested.c  |   26 +-
 arch/x86/kvm/svm/svm.c |4 ++--
 arch/x86/kvm/svm/svm.h |   12 ++--
 4 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 8a1f5382a4ea..d4739f4eae63 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -4,6 +4,37 @@
 
 #include <uapi/asm/svm.h>
 
+/*
+ * VMCB Control Area intercept bits starting
+ * at Byte offset 000h (Vector 0).
+ */
+
+enum vector_offset {
+   CR_VECTOR = 0,
+   MAX_VECTORS,
+};
+
+enum {
+   /* Byte offset 000h (Vector 0) */
+   INTERCEPT_CR0_READ = 0,
+   INTERCEPT_CR1_READ,
+   INTERCEPT_CR2_READ,
+   INTERCEPT_CR3_READ,
+   INTERCEPT_CR4_READ,
+   INTERCEPT_CR5_READ,
+   INTERCEPT_CR6_READ,
+   INTERCEPT_CR7_READ,
+   INTERCEPT_CR8_READ,
+   INTERCEPT_CR0_WRITE = 16,
+   INTERCEPT_CR1_WRITE,
+   INTERCEPT_CR2_WRITE,
+   INTERCEPT_CR3_WRITE,
+   INTERCEPT_CR4_WRITE,
+   INTERCEPT_CR5_WRITE,
+   INTERCEPT_CR6_WRITE,
+   INTERCEPT_CR7_WRITE,
+   INTERCEPT_CR8_WRITE,
+};
 
 enum {
INTERCEPT_INTR,
@@ -57,7 +88,7 @@ enum {
 
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
-   u32 intercept_cr;
+   u32 intercepts[MAX_VECTORS];
u32 intercept_dr;
u32 intercept_exceptions;
u64 intercept;
@@ -240,15 +271,6 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
 #define SVM_SELECTOR_CODE_MASK (1 << 3)
 
-#define INTERCEPT_CR0_READ 0
-#define INTERCEPT_CR3_READ 3
-#define INTERCEPT_CR4_READ 4
-#define INTERCEPT_CR8_READ 8
-#define INTERCEPT_CR0_WRITE (16 + 0)
-#define INTERCEPT_CR3_WRITE (16 + 3)
-#define INTERCEPT_CR4_WRITE (16 + 4)
-#define INTERCEPT_CR8_WRITE (16 + 8)
-
 #define INTERCEPT_DR0_READ 0
 #define INTERCEPT_DR1_READ 1
 #define INTERCEPT_DR2_READ 2
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fb68467e6049..5f65b759abcb 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -98,6 +98,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 void recalc_intercepts(struct vcpu_svm *svm)
 {
struct vmcb_control_area *c, *h, *g;
+   unsigned int i;
 
vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 
@@ -110,15 +111,17 @@ void recalc_intercepts(struct vcpu_svm *svm)
 
svm->nested.host_intercept_exceptions = h->intercept_exceptions;
 
-   c->intercept_cr = h->intercept_cr;
+   for (i = 0; i < MAX_VECTORS; i++)
+   c->intercepts[i] = h->intercepts[i];
+
c->intercept_dr = h->intercept_dr;
c->intercept_exceptions = h->intercept_exceptions;
c->intercept = h->intercept;
 
if (g->int_ctl & V_INTR_MASKING_MASK) {
/* We only want the cr8 intercept bits of L1 */
-   c->intercept_cr &= ~(1U << INTERCEPT_CR8_READ);
-   c->intercept_cr &= ~(1U << INTERCEPT_CR8_WRITE);
+   vmcb_clr_intercept(c, INTERCEPT_CR8_READ);
+   vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
 
/*
 * Once running L2 with HF_VINTR_MASK, EFLAGS.IF does not
@@ -131,7 +134,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
/* We don't want to see VMMCALLs from a nested guest */
c->intercept &= ~(1ULL << INTERCEPT_VMMCALL);
 
-   c->intercept_cr |= g->intercept_cr;
+   for (i = 0; i < MAX_VECTORS; i++)
+   c->intercepts[i] |= g->intercepts[i];
+
c->intercept_dr |= g->intercept_dr;
c->intercept_exceptions |= g->intercept_exceptions;
c->intercept |= g->intercept;
@@ -140,7 +145,11 @@ void recalc_intercepts(struct vcpu_svm *svm)
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
   struct vmcb_control_area *from)
 {
-   dst->intercept_cr = from->intercept_cr;
+   unsigned int i;
+
+   for (i = 0; i < MAX_VECTORS; i++)
+   dst->intercepts[i] = from->intercepts[i];
+
dst->intercept_dr = from->intercept_dr;
dst->intercept_exceptions = from->intercept_exceptions;
dst->intercept = from->intercept;
@@ -487,8 +496,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
   nested_vmcb->control.event_inj,
   nested_vmcb->control.nested_ctl);
 
-   trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,

[PATCH v5 00/12] SVM cleanup and INVPCID feature support

2020-08-26 Thread Babu Moger
The following series adds support for PCID/INVPCID on AMD guests. In the
process it re-structures the vmcb_control_area data structure to combine
all the intercept vectors into one array of 32-bit words, which makes
future additions easier. It also re-arranges some PCID-related code so
that it is common between SVM and VMX.

INVPCID interception is enabled only when the guest is running with shadow
page tables. In that case the hypervisor must handle the TLB flush itself,
based on the type of the INVPCID instruction.
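
For reference, the four architectural INVPCID types that the shadow-paging
path has to emulate; the names below mirror the kernel's INVPCID_TYPE_*
constants, and the toy main() only prints the decision the interception
handler makes (anything above 3 is reserved and gets #GP):

#include <stdio.h>

enum {
    INVPCID_TYPE_INDIV_ADDR      = 0,   /* one linear address, one PCID */
    INVPCID_TYPE_SINGLE_CTXT     = 1,   /* all mappings for one PCID */
    INVPCID_TYPE_ALL_INCL_GLOBAL = 2,   /* every PCID, including globals */
    INVPCID_TYPE_ALL_NON_GLOBAL  = 3,   /* every PCID, except globals */
};

int main(void)
{
    for (unsigned long type = 0; type <= 4; type++) {
        if (type > 3)
            printf("type %lu: reserved, inject #GP\n", type);
        else
            printf("type %lu: emulate the TLB flush\n", type);
    }
    return 0;
}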

For guests with nested page table (NPT) support, the INVPCID feature works
as if it were running natively, and KVM does not need to do any special
handling.

AMD documentation for INVPCID feature is available at "AMD64 Architecture
Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34 (or
later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v5:
 All the changes are related to rebase.
 Applies cleanly on the mainline and kvm (master) trees. 
 Resending it to get some attention.

v4:
 
https://lore.kernel.org/lkml/159676101387.12805.18038347880482984693.stgit@bmoger-ubuntu/
 1. Changed the functions __set_intercept/__clr_intercept/__is_intercept
to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
the vmcb_control_area structure (suggested by Paolo).
 2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle
kvm_read/write_guest_virt*()") to make it common across both SVM/VMX
(suggested by Jim Mattson).
 3. Took care of a few other comments from Jim Mattson. Dropped "Reviewed-by"
on a few patches which have changed since v3.

v3:
 
https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
 1. Addressed the comments from Jim Mattson. Follow the v2 link below
for the context.
 2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
using native __set_bit, clear_bit and test_bit.
 3. Combined all the intercept vectors into a single 32-bit array.
 4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
clr_exception_intercept etc. Used the generic set_intercept and
clr_intercept where applicable.
 5. Tested with both L1 guests and L2 nested guests. 

v2:
  
https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
  - Taken care of a few comments from Jim Mattson.
  - KVM interceptions added only when tdp is off. No interceptions
when tdp is on.
  - Reverted the fault priority to original order in VMX. 
  
v1:
  
https://lore.kernel.org/lkml/159191202523.31436.11959784252237488867.stgit@bmoger-ubuntu/

Babu Moger (12):
  KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
  KVM: SVM: Change intercept_cr to generic intercepts
  KVM: SVM: Change intercept_dr to generic intercepts
  KVM: SVM: Modify intercept_exceptions to generic intercepts
  KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
  KVM: SVM: Add new intercept vector in vmcb_control_area
  KVM: nSVM: Cleanup nested_state data structure
  KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
  KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
  KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
  KVM: X86: Move handling of INVPCID types to x86
  KVM:SVM: Enable INVPCID feature on AMD


 arch/x86/include/asm/svm.h  |  117 +--
 arch/x86/include/uapi/asm/svm.h |2 +
 arch/x86/kvm/svm/nested.c   |   66 +---
 arch/x86/kvm/svm/svm.c  |  131 ++-
 arch/x86/kvm/svm/svm.h  |   87 +-
 arch/x86/kvm/trace.h|   21 --
 arch/x86/kvm/vmx/nested.c   |   12 ++--
 arch/x86/kvm/vmx/vmx.c  |   95 
 arch/x86/kvm/vmx/vmx.h  |2 -
 arch/x86/kvm/x86.c  |  106 
 arch/x86/kvm/x86.h  |3 +
 11 files changed, 364 insertions(+), 278 deletions(-)

--
Signature


RE: [PATCH v4 00/12] SVM cleanup and INVPCID support for the AMD guests

2020-08-17 Thread Babu Moger
Paolo and others, any comments on this series?

> -Original Message-
> From: Moger, Babu 
> Sent: Thursday, August 6, 2020 7:46 PM
> To: pbonz...@redhat.com; vkuzn...@redhat.com; wanpen...@tencent.com;
> sean.j.christopher...@intel.com; jmatt...@google.com
> Cc: k...@vger.kernel.org; j...@8bytes.org; x...@kernel.org; linux-
> ker...@vger.kernel.org; mi...@redhat.com; b...@alien8.de;
> h...@zytor.com; t...@linutronix.de
> Subject: [PATCH v4 00/12] SVM cleanup and INVPCID support for the AMD
> guests
> 
> The following series adds the support for PCID/INVPCID on AMD guests.
> While doing it re-structured the vmcb_control_area data structure to
> combine all the intercept vectors into one 32 bit array. Makes it easy for
> future additions. Re-arranged few pcid related code to make it common
> between SVM and VMX.
> 
> INVPCID interceptions are added only when the guest is running with
> shadow page table enabled. In this case the hypervisor needs to handle the
> tlbflush based on the type of invpcid instruction.
> 
> For the guests with nested page table (NPT) support, the INVPCID feature
> works as running it natively. KVM does not need to do any special handling.
> 
> AMD documentation for INVPCID feature is available at "AMD64 Architecture
> Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev.
> 3.34(or later)"
> 
> The documentation can be obtained at the links below:
> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v4:
>  1. Changed the functions __set_intercept/__clr_intercept/__is_intercept to
> to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
> vmcb_control_area structure(Suggested by Paolo).
>  2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle
> kvm_read/write_guest_virt*())
> to make it common across both SVM/VMX(Suggested by Jim Mattson).
>  3. Took care of few other comments from Jim Mattson. Dropped
> "Reviewed-by"
> on few patches which I have changed since v3.
> 
> v3:
> 
> https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
>  1. Addressing the comments from Jim Mattson. Follow the v2 link below
> for the context.
>  2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
> using native __set_bit, clear_bit and test_bit.
>  3. Combined all the intercepts vectors into single 32 bit array.
>  4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
> clr_exception_intercept etc. Used the generic set_intercept and
> clr_intercept where applicable.
>  5. Tested both L1 guest and l2 nested guests.
> 
> v2:
> 
> https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
>   - Taken care of few comments from Jim Mattson.
>   - KVM interceptions added only when tdp is off. No interceptions
> when tdp is on.
>   - Reverted the fault priority to original order in VMX.
> 
> v1:
> 
> https://lore.kernel.org/lkml/159191202523.31436.11959784252237488867.stgit@bmoger-ubuntu/
> 
> Babu Moger (12):
>   KVM: SVM: Introduce vmcb_set_intercept, vmcb_clr_intercept and
> vmcb_is_intercept
>   KVM: SVM: Change intercept_cr to generic intercepts
>   KVM: SVM: Change intercept_dr to generic intercepts
>   KVM: SVM: Modify intercept_exceptions to generic intercepts
>   KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
>   KVM: SVM: Add new intercept vector in vmcb_control_area
>   KVM: nSVM: Cleanup nested_state data structure
>   KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and
> is_cr_intercept
>   KVM: SVM: Remove set_exception_intercept and
> clr_exception_intercept
>   KVM: X86: Rename and move the function vmx_handle_memory_failure
> to x86.c
>   KVM: X86: Move handling of INVPCID types to x86
>   KVM:SVM: Enable INVPCID feature on AMD
> 
> 
>  arch/x86/include/asm/svm.h  |  117 +---
> ---
>  arch/x86/include/uapi/asm/svm.h |2 +
>  arch/x86/kvm/svm/nested.c   |   66 +---
>  arch/x86/kvm/svm/svm.c  |  131 ++---
> --
>  arch/x86/kvm/svm/svm.h  |   87 +-
>  arch/x86/kvm/trace.h|   21 --
>  arch/x86/kvm/vmx/nested.c   |   12 ++--
>  arch/x86/kvm/vmx/vmx.c  |   95 
>  arch/x86/kvm/vmx/vmx.h  |2 -
>  arch/x86/kvm/x86.c  |  106 
>  arch/x86/kvm/x86.h  |3 +
>  11 files changed, 364 insertions(+), 278 deletions(-)
> 
> --


[PATCH v4 12/12] KVM:SVM: Enable INVPCID feature on AMD

2020-08-06 Thread Babu Moger
The following intercept bit has been added to support VMEXIT
for the INVPCID instruction:

Code    Name            Cause
A2h     VMEXIT_INVPCID  INVPCID instruction

The following bit has been added to the VMCB layout control area
to control interception of INVPCID:

Byte Offset     Bit(s)      Function
14h             2           intercept INVPCID

Enable the interception when the guest is running with shadow page
tables enabled, and handle the TLB flush based on the INVPCID
instruction type.

For guests with nested page table (NPT) support, the INVPCID feature
works as if it were running natively; KVM does not need to do any
special handling in this case.
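
For context, this is roughly what the guest executes to trigger the
intercept; a hypothetical helper modeled on the architectural operand
layout (a 128-bit memory descriptor with the PCID in bits 11:0 and the
linear address in the upper quadword, plus the type in a register).
INVPCID is a CPL0 instruction, so this compiles with GCC on x86-64 but
is not a runnable user-space demo:

#include <stdint.h>

static inline void invpcid(unsigned long type, uint64_t pcid, uint64_t addr)
{
    /* Under interception this traps with exit code 0xa2; EXITINFO1
     * carries the linear address of desc (the memory operand) and
     * EXITINFO2 the register operand, i.e. the type. */
    struct { uint64_t d[2]; } desc = { { pcid, addr } };

    asm volatile("invpcid %0, %1" : : "m" (desc), "r" (type) : "memory");
}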

AMD documentation for INVPCID feature is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34 (or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger 
---
 arch/x86/include/uapi/asm/svm.h |2 ++
 arch/x86/kvm/svm/svm.c  |   51 +++
 2 files changed, 53 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 2e8a30f06c74..522d42dfc28c 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -76,6 +76,7 @@
 #define SVM_EXIT_MWAIT_COND0x08c
 #define SVM_EXIT_XSETBV0x08d
 #define SVM_EXIT_RDPRU 0x08e
+#define SVM_EXIT_INVPCID   0x0a2
 #define SVM_EXIT_NPF   0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI   0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402
@@ -171,6 +172,7 @@
{ SVM_EXIT_MONITOR, "monitor" }, \
{ SVM_EXIT_MWAIT,   "mwait" }, \
{ SVM_EXIT_XSETBV,  "xsetbv" }, \
+   { SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3c718caa3b99..053dfd00efa1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -813,6 +813,9 @@ static __init void svm_set_cpu_caps(void)
if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
boot_cpu_has(X86_FEATURE_AMD_SSBD))
kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+
+   /* Enable INVPCID feature */
+   kvm_cpu_cap_check_and_set(X86_FEATURE_INVPCID);
 }
 
 static __init int svm_hardware_setup(void)
@@ -970,6 +973,21 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
return svm->vmcb->control.tsc_offset;
 }
 
+static void svm_check_invpcid(struct vcpu_svm *svm)
+{
+   /*
+* Intercept INVPCID instruction only if shadow page table is
+* enabled. Interception is not required with nested page table
+* enabled.
+*/
+   if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
+   if (!npt_enabled)
+   set_intercept(svm, INTERCEPT_INVPCID);
+   else
+   clr_intercept(svm, INTERCEPT_INVPCID);
+   }
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1099,6 +1117,8 @@ static void init_vmcb(struct vcpu_svm *svm)
clr_intercept(svm, INTERCEPT_PAUSE);
}
 
+   svm_check_invpcid(svm);
+
if (kvm_vcpu_apicv_active(>vcpu))
avic_init_vmcb(svm);
 
@@ -2714,6 +2734,33 @@ static int mwait_interception(struct vcpu_svm *svm)
return nop_interception(svm);
 }
 
+static int invpcid_interception(struct vcpu_svm *svm)
+{
+   struct kvm_vcpu *vcpu = &svm->vcpu;
+   unsigned long type;
+   gva_t gva;
+
+   if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   /*
+* For an INVPCID intercept:
+* EXITINFO1 provides the linear address of the memory operand.
+* EXITINFO2 provides the contents of the register operand.
+*/
+   type = svm->vmcb->control.exit_info_2;
+   gva = svm->vmcb->control.exit_info_1;
+
+   if (type > 3) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
+   return kvm_handle_invpcid(vcpu, type, gva);
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = cr_interception,
[SVM_EXIT_READ_CR3] = cr_interception,
@@ -2776,6 +2823,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_MWAIT]= mwait_interception,
[SVM_EXIT_XSETBV]   = xsetbv_interception,

[PATCH v4 10/12] KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c

2020-08-06 Thread Babu Moger
Handling of kvm_read/write_guest_virt*() errors can be moved to common
code. The same code can be used by both VMX and SVM.
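
As a stand-alone sketch of the behaviour being centralized (simplified
constants standing in for KVM's; the real helper also injects the page
fault and fills in vcpu->run before exiting to user space):

#include <stdio.h>

enum { X86EMUL_CONTINUE, X86EMUL_PROPAGATE_FAULT, X86EMUL_IO_NEEDED };

/* Returns 1 to resume the guest, 0 to exit to user space; the same
 * contract the shared helper keeps for both VMX and SVM callers. */
static int handle_memory_failure(int r)
{
    if (r == X86EMUL_PROPAGATE_FAULT) {
        puts("inject #PF into the guest");
        return 1;
    }
    puts("report KVM_EXIT_INTERNAL_ERROR to user space");
    return 0;
}

int main(void)
{
    handle_memory_failure(X86EMUL_PROPAGATE_FAULT);
    handle_memory_failure(X86EMUL_IO_NEEDED);
    return 0;
}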

Signed-off-by: Babu Moger 
---
 arch/x86/kvm/vmx/nested.c |   12 ++--
 arch/x86/kvm/vmx/vmx.c|   29 +
 arch/x86/kvm/vmx/vmx.h|2 --
 arch/x86/kvm/x86.c|   28 
 arch/x86/kvm/x86.h|2 ++
 5 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d4a4cec034d0..32b7d9c07645 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4640,7 +4640,7 @@ static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer,
 
r = kvm_read_guest_virt(vcpu, gva, vmpointer, sizeof(*vmpointer), &e);
if (r != X86EMUL_CONTINUE) {
-   *ret = vmx_handle_memory_failure(vcpu, r, &e);
+   *ret = kvm_handle_memory_failure(vcpu, r, &e);
return -EINVAL;
}
 
@@ -4951,7 +4951,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
/* _system ok, nested_vmx_check_permission has verified cpl=0 */
r = kvm_write_guest_virt_system(vcpu, gva, &value, len, &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
}
 
return nested_vmx_succeed(vcpu);
@@ -5024,7 +5024,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &value, len, &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
}
 
field = kvm_register_readl(vcpu, (((instr_info) >> 28) & 0xf));
@@ -5190,7 +5190,7 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
r = kvm_write_guest_virt_system(vcpu, gva, (void *)&current_vmptr,
sizeof(gpa_t), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
return nested_vmx_succeed(vcpu);
 }
@@ -5244,7 +5244,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
/*
 * Nested EPT roots are always held through guest_mmu,
@@ -5326,7 +5326,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
if (operand.vpid >> 16)
return nested_vmx_failValid(vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 13745f2a5ecd..ff7920844702 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1597,33 +1597,6 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
return 1;
 }
 
-/*
- * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns
- * KVM_EXIT_INTERNAL_ERROR for cases not currently handled by KVM. Return value
- * indicates whether exit to userspace is needed.
- */
-int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
- struct x86_exception *e)
-{
-   if (r == X86EMUL_PROPAGATE_FAULT) {
-   kvm_inject_emulated_page_fault(vcpu, e);
-   return 1;
-   }
-
-   /*
-* In case kvm_read/write_guest_virt*() failed with X86EMUL_IO_NEEDED
-* while handling a VMX instruction KVM could've handled the request
-* correctly by exiting to userspace and performing I/O but there
-* doesn't seem to be a real use-case behind such requests, just return
-* KVM_EXIT_INTERNAL_ERROR for now.
-*/
-   vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-   vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-   vcpu->run->internal.ndata = 0;
-
-   return 0;
-}
-
 /*
  * Recognizes a pending MTF VM-exit and records the nested state for later
  * delivery.
@@ -5534,7 +5507,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
-   return vmx_handle_memory_failure(vcpu, r, &e);
+   return kvm_handle_memory_failure(vcpu, r, &e);
 
if (operand.pcid >> 12 != 0) {
kvm_inject_gp(vcpu, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 639798e4a6ca..ad
