Re: [PATCH v2] kvm/selftests: Fix race condition with dirty_log_test

2021-04-14 Thread Andrew Jones
On Tue, Apr 13, 2021 at 05:36:41PM -0400, Peter Xu wrote:
> This fixes a bug that can trigger with e.g. "taskset -c 0 ./dirty_log_test" or
> when the testing host is very busy.
> 
> The issue is when the vcpu thread got the dirty bit set but got preempted by
> other threads _before_ the data is written: we won't be able to see the latest
> data until the vcpu thread does VMENTER. IOW, the guest write operation and
> dirty bit set cannot guarantee atomicity. The race could look like:
> 
>   main thread                          vcpu thread
>   ===========                          ===========
>   iteration=X
>                                        *addr = X
>                                        (so latest data is X)
>   iteration=X+1
>   ...
>   iteration=X+N
>                                        guest executes "*addr = X+N"
>                                          reg=READ_ONCE(iteration)=X+N
>                                          host page fault
>                                            set dirty bit for page "addr"
>                                          (_before_ VMENTER happens...
>                                           so *addr is still X!)
>                                        vcpu thread got preempted
>   get dirty log
>   verify data
>     detected dirty bit set, data is X
>     not X+N nor X+N-1, data too old!
> 
> This patch closes this race by allowing the main thread to give the vcpu thread
> a chance to do a VMENTER to complete that write operation.  It's done by adding a
> vcpu loop counter (must be defined as volatile as main thread will do read
> loop), then the main thread can guarantee the vcpu got at least another VMENTER
> by making sure the guest_vcpu_loops increases by 2.
> 
> Dirty ring does not need this since dirty_ring_last_page would already help
> avoid this specific race condition.
> 
> Cc: Andrew Jones 
> Cc: Paolo Bonzini 
> Cc: Vitaly Kuznetsov 
> Cc: Sean Christopherson 
> Signed-off-by: Peter Xu 
> ---
> v2:
> - drop one unnecessary check on "!matched"
> ---
>  tools/testing/selftests/kvm/dirty_log_test.c | 53 +++-
>  1 file changed, 52 insertions(+), 1 deletion(-)
>

Reviewed-by: Andrew Jones 
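
For readers following along, the "counter increases by 2" wait described in the commit message can be sketched roughly as below. This is only an illustration, not the code from the patch: vcpu_loops, wait_for_two_vcpu_loops(), the relax() callback and fake_vcpu_loop() are all made-up names, and the vcpu thread is replaced by a single-threaded stand-in so the sketch can be run on its own.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the loop counter the commit message
 * describes. It is volatile because the waiter re-reads it in a spin
 * loop.
 */
static volatile uint64_t vcpu_loops;

/*
 * Spin until vcpu_loops has advanced by at least two since entry.
 * relax() is called on every spin; a real waiter might call
 * sched_yield() there. Returns how far the counter moved.
 */
static uint64_t wait_for_two_vcpu_loops(void (*relax)(void))
{
	uint64_t start = vcpu_loops;

	while (vcpu_loops - start < 2)
		relax();

	return vcpu_loops - start;
}

/* Single-threaded stand-in for the vcpu thread finishing one loop. */
static void fake_vcpu_loop(void)
{
	vcpu_loops++;
}
```

Waiting for two increments rather than one is what the commit message relies on to be sure at least one further VMENTER happened after the snapshot was taken.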



Re: [PATCH] KVM: selftests: vgic_init kvm selftests fixup

2021-04-07 Thread Andrew Jones


Hi Eric,

If $SUBJECT started with 'fixup!' and otherwise matched the original
subject then this could be automagically squashed in wherever the
original commit already is.

Anyway,

Reviewed-by: Andrew Jones 

Thanks,
drew

On Wed, Apr 07, 2021 at 03:59:37PM +0200, Eric Auger wrote:
> Bring some improvements/rationalization over the first version
> of the vgic_init selftests:
> 
> - ucall_init is moved into run_vcpu()
> - vcpu_args_set is not called as not needed
> - whenever a helper is supposed to succeed, call the non "_" version
> - helpers do not return -errno, instead errno is checked by the caller
> - vm_gic struct is used whenever possible, as well as vm_gic_destroy
> - _kvm_create_device takes an additional fd parameter
> 
> Signed-off-by: Eric Auger 
> Suggested-by: Andrew Jones 
> 
> ---
> 
> Applies on top of [PATCH v6 0/9] KVM/ARM: Some vgic fixes
> and init sequence KVM selftests, put on kvm-arm64/vgic-5.13
> 
> The whole patchset can be found at
> https://github.com/eauger/linux/tree/vgic_kvmselftests_v7
> ---
>  .../testing/selftests/kvm/aarch64/vgic_init.c | 275 --
>  .../testing/selftests/kvm/include/kvm_util.h  |   2 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c|  30 +-
>  3 files changed, 136 insertions(+), 171 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> index be1a7c0d0527..682e753fdc59 100644
> --- a/tools/testing/selftests/kvm/aarch64/vgic_init.c
> +++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> @@ -27,7 +27,7 @@ struct vm_gic {
>   int gic_fd;
>  };
>  
> -int max_ipa_bits;
> +static int max_ipa_bits;
>  
>  /* helper to access a redistributor register */
>  static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
> @@ -51,12 +51,8 @@ static void guest_code(void)
>  /* we don't want to assert on run execution, hence that helper */
>  static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
>  {
> - int ret;
> -
> - vcpu_args_set(vm, vcpuid, 1);
> - ret = _vcpu_ioctl(vm, vcpuid, KVM_RUN, NULL);
> - get_ucall(vm, vcpuid, NULL);
> -
> + ucall_init(vm, NULL);
> + int ret = _vcpu_ioctl(vm, vcpuid, KVM_RUN, NULL);
>   if (ret)
>   return -errno;
>   return 0;
> @@ -68,7 +64,6 @@ static struct vm_gic vm_gic_create(void)
>  
>   v.vm = vm_create_default_with_vcpus(NR_VCPUS, 0, 0, guest_code, NULL);
>   v.gic_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false);
> - TEST_ASSERT(v.gic_fd > 0, "GICv3 device created");
>  
>   return v;
>  }
> @@ -91,66 +86,62 @@ static void subtest_dist_rdist(struct vm_gic *v)
>   uint64_t addr;
>  
>   /* Check existing group/attributes */
> - ret = _kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> - KVM_VGIC_V3_ADDR_TYPE_DIST);
> - TEST_ASSERT(!ret, "KVM_DEV_ARM_VGIC_GRP_ADDR/KVM_VGIC_V3_ADDR_TYPE_DIST supported");
> + kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> +   KVM_VGIC_V3_ADDR_TYPE_DIST);
>  
> - ret = _kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> - KVM_VGIC_V3_ADDR_TYPE_REDIST);
> - TEST_ASSERT(!ret, "KVM_DEV_ARM_VGIC_GRP_ADDR/KVM_VGIC_V3_ADDR_TYPE_REDIST supported");
> + kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> +   KVM_VGIC_V3_ADDR_TYPE_REDIST);
>  
>   /* check non existing attribute */
>   ret = _kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 0);
> - TEST_ASSERT(ret == -ENXIO, "attribute not supported");
> + TEST_ASSERT(ret && errno == ENXIO, "attribute not supported");
>  
>   /* misaligned DIST and REDIST address settings */
>   addr = 0x1000;
>   ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
>KVM_VGIC_V3_ADDR_TYPE_DIST, &addr, true);
> - TEST_ASSERT(ret == -EINVAL, "GICv3 dist base not 64kB aligned");
> + TEST_ASSERT(ret && errno == EINVAL, "GICv3 dist base not 64kB aligned");
>  
>   ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
>KVM_VGIC_V3_ADDR_TYPE_REDIST, &addr, true);
> - TEST_ASSERT(ret == -EINVAL, "GICv3 redist base not 64kB aligned");
> + TEST_ASSERT(ret && errno == EINVAL, "GICv3 redist base not 64kB aligned");
>  
>   /* out of range address */
>   if (max_ipa_bits) {

Re: [PATCH v6 9/9] KVM: selftests: aarch64/vgic-v3 init sequence tests

2021-04-07 Thread Andrew Jones
On Wed, Apr 07, 2021 at 12:14:29PM +0200, Auger Eric wrote:
> >> +int _kvm_create_device(struct kvm_vm *vm, uint64_t type, bool test)
> >> +{
> >> +  struct kvm_create_device create_dev;
> >> +  int ret;
> >> +
> >> +  create_dev.type = type;
> >> +  create_dev.fd = -1;
> >> +  create_dev.flags = test ? KVM_CREATE_DEVICE_TEST : 0;
> >> +  ret = ioctl(vm_get_fd(vm), KVM_CREATE_DEVICE, &create_dev);
> >> +  if (ret == -1)
> >> +  return -errno;
> >> +  return test ? 0 : create_dev.fd;
> > 
> > Something like this belongs in the non underscore prefixed wrappers.
> I need at least to return the create_dev.fd or do you want me to add an
> extra int *fd parameter?
> What about:
> 
> if (ret < 0)
> return ret;
> return test ? 0 : create_dev.fd;

Maybe the underscore version of kvm_create_device isn't necessary. If
the non-underscore version isn't flexible enough, then just use the
ioctl directly from the test code with its own struct kvm_create_device.
Being able to call ioctls directly from test code is what vm_get_fd()
is for, otherwise you could just use vm->fd.
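
A rough sketch of what calling the ioctl directly from test code could look like, with the caller checking errno instead of getting -errno back. The helper name try_create_device() and its dev_fd out-parameter are hypothetical, not part of the selftests library; vm_fd is assumed to be whatever vm_get_fd(vm) returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical test-local helper, not part of the selftests library.
 * On failure it returns -1 and leaves errno for the caller to check.
 * On success it returns 0 and stores the new device fd in *dev_fd
 * (or -1 for a KVM_CREATE_DEVICE_TEST dry run).
 */
static int try_create_device(int vm_fd, uint32_t type, bool test, int *dev_fd)
{
	struct kvm_create_device create_dev = {
		.type = type,
		.fd = (uint32_t)-1,
		.flags = test ? KVM_CREATE_DEVICE_TEST : 0,
	};

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &create_dev) == -1)
		return -1;

	*dev_fd = test ? -1 : (int)create_dev.fd;
	return 0;
}
```

A negative test can then assert on something like "ret && errno == EINVAL" right at the call site, while positive tests keep using the asserting non-underscore wrapper.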

Thanks,
drew



Re: [PATCH v6 9/9] KVM: selftests: aarch64/vgic-v3 init sequence tests

2021-04-06 Thread Andrew Jones


Hi Eric,

It looks like Marc already picked this patch up, but, FWIW, here are
a few more comments you may consider.

On Mon, Apr 05, 2021 at 06:39:41PM +0200, Eric Auger wrote:
> The tests exercise the VGIC_V3 device creation including the
> associated KVM_DEV_ARM_VGIC_GRP_ADDR group attributes:
> 
> - KVM_VGIC_V3_ADDR_TYPE_DIST/REDIST
> - KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION
> 
> Some other tests dedicate to KVM_DEV_ARM_VGIC_GRP_REDIST_REGS group
> and especially the GICR_TYPER read. The goal was to test the case
> recently fixed by commit 23bde34771f1
> ("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace").
> 
> The API under test can be found at
> Documentation/virt/kvm/devices/arm-vgic-v3.rst
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v4 -> v5:
> - simplify the last bit tests given the simpler interpretation
>   of the spec
> 
> v3 -> v4:
> - update .gitignore
> - More vgic-mmio-v3.c change into the previous patch
> - rename fuzz_dist_rdist into test_dist_rdist
> - cleanup in run_vcpu and guest_code
> - max_ipa_bits is global
> - s/fuzz/subtest
> - added test_kvm_device,
> - moved ucall_init() just before the cpu run
> - use vm_create_default_with_vcpus
> - use vm_gic struct, vm_gic_create, vm_gic_destroy
> - rewrite util.c helpers to comply with the usual style
> ---
>  tools/testing/selftests/kvm/.gitignore|   1 +
>  tools/testing/selftests/kvm/Makefile  |   1 +
>  .../testing/selftests/kvm/aarch64/vgic_init.c | 585 ++
>  .../testing/selftests/kvm/include/kvm_util.h  |   9 +
>  tools/testing/selftests/kvm/lib/kvm_util.c|  77 +++
>  5 files changed, 673 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_init.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 7bd7e776c266..bb862f91f640 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  /aarch64/get-reg-list
>  /aarch64/get-reg-list-sve
> +/aarch64/vgic_init
>  /s390x/memop
>  /s390x/resets
>  /s390x/sync_regs_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 67eebb53235f..2fd4801de9ca 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -78,6 +78,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
>  
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> +TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
>  TEST_GEN_PROGS_aarch64 += demand_paging_test
>  TEST_GEN_PROGS_aarch64 += dirty_log_test
>  TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
> diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> new file mode 100644
> index ..be1a7c0d0527
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> @@ -0,0 +1,585 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * vgic init sequence tests
> + *
> + * Copyright (C) 2020, Red Hat, Inc.
> + */
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +#define NR_VCPUS 4
> +
> +#define REDIST_REGION_ATTR_ADDR(count, base, flags, index) (((uint64_t)(count) << 52) | \
> + ((uint64_t)((base) >> 16) << 16) | ((uint64_t)(flags) << 12) | index)
> +#define REG_OFFSET(vcpu, offset) (((uint64_t)vcpu << 32) | offset)
> +
> +#define GICR_TYPER 0x8
> +
> +struct vm_gic {
> + struct kvm_vm *vm;
> + int gic_fd;
> +};
> +
> +int max_ipa_bits;

static

> +
> +/* helper to access a redistributor register */
> +static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
> +  uint32_t *val, bool write)
> +{
> + uint64_t attr = REG_OFFSET(vcpu, offset);
> +
> + return _kvm_device_access(gicv3_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
> +   attr, val, write);
> +}
> +
> +/* dummy guest code */
> +static void guest_code(void)
> +{
> + GUEST_SYNC(0);
> + GUEST_SYNC(1);
> + GUEST_SYNC(2);
> + GUEST_DONE();
> +}
> +
> +/* we don't want to assert on run execution, hence that helper */
> +static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> + int ret;
> +
> + vcpu_args_set(vm, vcpuid, 1);

You don't need the above vcpu_args_set call since guest_code doesn't take
any arguments.

> + ret = _vcpu_ioctl(vm, vcpuid, KVM_RUN, NULL);
> + get_ucall(vm, vcpuid, NULL);

You're not checking the result of get_ucall, so there's no need for the
call.

> +
> + if (ret)
> + return -errno;
> + return 0;
> +}
> +
> +static struct vm_gic vm_gic_create(void)
> +{
> + struct vm_gic v;
> +
> + v.vm = vm_create_default_with_vcpus(NR_VCPUS, 0, 0, guest_code, NULL);
> + v.gic_fd = kvm_create_device(v.vm, 

Re: [RFC PATCH v5 10/10] KVM: selftests: Add a test for kvm page table code

2021-03-29 Thread Andrew Jones
e
they could be merged into one function and this one could be handled
differently.

> +
> + for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++)
> + pthread_join(vcpu_threads[vcpu_id], NULL);
> +
> + sem_destroy_s(_stage_updated);
> + sem_destroy_s(_stage_completed);
> +
> + free(vcpu_threads);
> + ucall_uninit(vm);
> + kvm_vm_free(vm);
> +}
> +
> +static void help(char *name)
> +{
> + puts("");
> + printf("usage: %s [-h] [-p offset] [-m mode] "
> +"[-b mem-size] [-v vcpus] [-s mem-type]\n", name);
> + puts("");
> + printf(" -p: specify guest physical test memory offset\n"
> +" Warning: a low offset can conflict with the loaded test code.\n");
> + guest_modes_help();
> + printf(" -b: specify size of the memory region for testing. e.g. 10M or 3G.\n"
> +" (default: 1G)\n");
> + printf(" -v: specify the number of vCPUs to run\n"
> +" (default: 1)\n");
> + printf(" -s: specify the type of memory that should be used to\n"
> +" back the guest data region.\n"
> +" (default: anonymous)\n\n");
   ^ is this extra \n needed?
> + backing_src_help();
> + puts("");
> + exit(0);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
> + struct test_params p = {
> + .test_mem_size = DEFAULT_TEST_MEM_SIZE,
> + .src_type = VM_MEM_SRC_ANONYMOUS,
> + };
> + int opt;
> +
> + guest_modes_append_default();
> +
> + while ((opt = getopt(argc, argv, "hp:m:b:v:s:")) != -1) {
> + switch (opt) {
> + case 'p':
> + p.phys_offset = strtoull(optarg, NULL, 0);
> + break;
> + case 'm':
> + guest_modes_cmdline(optarg);
> + break;
> + case 'b':
> + p.test_mem_size = parse_size(optarg);
> + break;
> + case 'v':
> + nr_vcpus = atoi(optarg);
> + TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
> + "Invalid number of vcpus, must be between 1 and %d", max_vcpus);
> + break;
> + case 's':
> + p.src_type = parse_backing_src_type(optarg);
> + break;
> + case 'h':
> + default:
> + help(argv[0]);
> + break;

nit: I'd replace this break with an exit() and not exit from help().

> + }
> + }
> +
> + for_each_guest_mode(run_test, &p);
> +
> + return 0;
> +}
> -- 
> 2.19.1
>

My comments are mainly just a bunch of nits, so

Reviewed-by: Andrew Jones 

Thanks,
drew 



Re: [RFC PATCH v5 08/10] KVM: selftests: List all hugetlb src types specified with page sizes

2021-03-23 Thread Andrew Jones
On Tue, Mar 23, 2021 at 09:52:29PM +0800, Yanan Wang wrote:
> With VM_MEM_SRC_ANONYMOUS_HUGETLB, we currently can only use system
> default hugetlb pages to back the testing guest memory. In order to
> add flexibility, now list all the known hugetlb backing src types with
> different page sizes, so that we can specify use of hugetlb pages of the
> exact granularity that we want. And as all the known hugetlb page sizes
> are listed, it's appropriate for all architectures.
> 
> Besides, the helper get_backing_src_pagesz() is added to get the
> granularity of different backing src types (anonymous, thp, hugetlb).
> 
> Suggested-by: Ben Gardon 
> Signed-off-by: Yanan Wang 
> ---
>  .../testing/selftests/kvm/include/test_util.h |  18 ++-
>  tools/testing/selftests/kvm/lib/kvm_util.c|   2 +-
>  tools/testing/selftests/kvm/lib/test_util.c   | 109 --
>  3 files changed, 116 insertions(+), 13 deletions(-)
>

Reviewed-by: Andrew Jones 



Re: [RFC PATCH v5 06/10] KVM: selftests: Add a helper to get system configured THP page size

2021-03-23 Thread Andrew Jones
On Tue, Mar 23, 2021 at 09:52:27PM +0800, Yanan Wang wrote:
> If we want to have some tests about transparent hugepages, the system
> configured THP hugepage size should better be known by the tests, which
> can be used for kinds of alignment or guest memory accessing of vcpus...
> So it makes sense to add a helper to get the transparent hugepage size.
> 
> With VM_MEM_SRC_ANONYMOUS_THP specified in vm_userspace_mem_region_add(),
> we now stat /sys/kernel/mm/transparent_hugepage to check whether THP is
> configured in the host kernel before madvise(). Based on this, we can also
> read file /sys/kernel/mm/transparent_hugepage/hpage_pmd_size to get THP
> hugepage size.
> 
> Signed-off-by: Yanan Wang 
> Reviewed-by: Ben Gardon 
> ---
>  .../testing/selftests/kvm/include/test_util.h |  2 ++
>  tools/testing/selftests/kvm/lib/test_util.c   | 29 +++
>  2 files changed, 31 insertions(+)


Reviewed-by: Andrew Jones 
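
For reference, reading the THP size from the sysfs file named in the commit message could look roughly like the sketch below. read_thp_pagesz() is a hypothetical name, not the helper added by the patch, and returning 0 when the file is missing is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Returns the THP size in bytes, or 0 if the sysfs file cannot be read
 * (for example when THP is not configured in the host kernel).
 */
static size_t read_thp_pagesz(void)
{
	const char *path = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size";
	unsigned long long size = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fscanf(f, "%llu", &size) != 1)
		size = 0;
	fclose(f);

	return (size_t)size;
}
```

A caller that needs THP would treat a return of 0 as "not available" and skip or fail the test accordingly.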



Re: [RFC PATCH v5 04/10] KVM: selftests: Print the errno besides error-string in TEST_ASSERT

2021-03-23 Thread Andrew Jones
On Tue, Mar 23, 2021 at 09:52:25PM +0800, Yanan Wang wrote:
> Print the errno besides error-string in TEST_ASSERT in the format of
> "errno=%d - %s" will explicitly indicate that the string is an error
> information. Besides, the errno is easier to be used for debugging
> than the error-string.
> 
> Suggested-by: Andrew Jones 
> Signed-off-by: Yanan Wang 
> ---
>  tools/testing/selftests/kvm/lib/assert.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/lib/assert.c b/tools/testing/selftests/kvm/lib/assert.c
> index 5ebbd0d6b472..71ade6100fd3 100644
> --- a/tools/testing/selftests/kvm/lib/assert.c
> +++ b/tools/testing/selftests/kvm/lib/assert.c
> @@ -71,9 +71,9 @@ test_assert(bool exp, const char *exp_str,
>  
>   fprintf(stderr, " Test Assertion Failure \n"
>   "  %s:%u: %s\n"
> - "  pid=%d tid=%d - %s\n",
> + "  pid=%d tid=%d errno=%d - %s\n",
>   file, line, exp_str, getpid(), _gettid(),
> - strerror(errno));
> + errno, strerror(errno));
>   test_dump_stack();
>   if (fmt) {
>   fputs("  ", stderr);
> -- 
> 2.19.1
>

Reviewed-by: Andrew Jones 
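
A small standalone sketch of the "errno=%d - %s" format, for anyone wanting to reuse it outside of TEST_ASSERT. format_errno() is a hypothetical helper; taking the error number as a parameter is a choice made here so that nothing called in between can overwrite errno before it is printed.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats "errno=<n> - <text>" into buf.
 * err is passed in explicitly, so the caller should save errno right
 * after the failing call. Returns snprintf()'s result.
 */
static int format_errno(char *buf, size_t len, int err)
{
	return snprintf(buf, len, "errno=%d - %s", err, strerror(err));
}
```

The number is handy when grepping logs or comparing against errno constants, while the string stays there for humans.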



Re: [RFC PATCH v5 02/10] tools headers: Add a macro to get HUGETLB page sizes for mmap

2021-03-23 Thread Andrew Jones


$SUBJECT says "tools headers", but this is actually changing
a UAPI header and then copying the change to tools.

Thanks,
drew

On Tue, Mar 23, 2021 at 09:52:23PM +0800, Yanan Wang wrote:
> We know that if a system supports multiple hugetlb page sizes,
> the desired hugetlb page size can be specified in bits [26:31]
> of the flag arguments. The value in these 6 bits will be the
> shift of each hugetlb page size.
> 
> So add a macro to get the page size shift and then calculate the
> corresponding hugetlb page size, using flag x.
> 
> Cc: Ben Gardon 
> Cc: Ingo Molnar 
> Cc: Adrian Hunter 
> Cc: Jiri Olsa 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Arnd Bergmann 
> Cc: Michael Kerrisk 
> Cc: Thomas Gleixner 
> Suggested-by: Ben Gardon 
> Signed-off-by: Yanan Wang 
> Reviewed-by: Ben Gardon 
> ---
>  include/uapi/linux/mman.h   | 2 ++
>  tools/include/uapi/linux/mman.h | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> index f55bc680b5b0..d72df73b182d 100644
> --- a/include/uapi/linux/mman.h
> +++ b/include/uapi/linux/mman.h
> @@ -41,4 +41,6 @@
>  #define MAP_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB
>  #define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB
>  
> +#define MAP_HUGE_PAGE_SIZE(x) (1ULL << ((x >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK))
> +
>  #endif /* _UAPI_LINUX_MMAN_H */
> diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h
> index f55bc680b5b0..d72df73b182d 100644
> --- a/tools/include/uapi/linux/mman.h
> +++ b/tools/include/uapi/linux/mman.h
> @@ -41,4 +41,6 @@
>  #define MAP_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB
>  #define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB
>  
> +#define MAP_HUGE_PAGE_SIZE(x) (1ULL << ((x >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK))
> +
>  #endif /* _UAPI_LINUX_MMAN_H */
> -- 
> 2.19.1
> 
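
To make the arithmetic in the commit message concrete: the 2MB flag encodes a shift of 21 in bits [26:31], and 1 << 21 is 2 MiB. Below is a function form of the same calculation as a sketch. HUGE_SHIFT, HUGE_MASK and huge_page_size() are local, hypothetical names chosen so they do not clash with the real header; the values 26 and 0x3f mirror the kernel's encoding.

```c
#include <stdint.h>

/*
 * Local copies of the kernel's encoding constants, renamed so they do
 * not clash with <linux/mman.h>: the log2 of the page size lives in
 * flag bits [26:31].
 */
#define HUGE_SHIFT	26
#define HUGE_MASK	0x3fU

/* Function form of the macro; the argument is evaluated exactly once. */
static uint64_t huge_page_size(uint32_t flags)
{
	return 1ULL << ((flags >> HUGE_SHIFT) & HUGE_MASK);
}
```

Note that flags with no size encoded yield 1, which is not a real page size; a caller would presumably need to treat that case as "use the default hugetlb size".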



Re: [PATCH] selftests: kvm: make hardware_disable_test less verbose

2021-03-23 Thread Andrew Jones
On Tue, Mar 23, 2021 at 09:53:03AM +0100, Vitaly Kuznetsov wrote:
> hardware_disable_test produces 512 snippets like
> ...
>  main: [511] waiting semaphore
>  run_test: [511] start vcpus
>  run_test: [511] all threads launched
>  main: [511] waiting 368us
>  main: [511] killing child
> 
> and this doesn't have much value, let's just drop these fprintf().
> Restoring them for debugging purposes shouldn't be too hard.

Changing them to pr_debug() allows you to keep them and restore
with -DDEBUG
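
Roughly, such a macro is usually built like the sketch below. This is not a copy of the selftests header; no_printf() and report_start() are names made up for this example.

```c
#include <stdio.h>

/* Compiled-out variant: keeps printf-style format checking, prints nothing. */
static inline __attribute__((format(printf, 1, 2)))
int no_printf(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

#ifdef DEBUG
#define pr_debug(...) printf(__VA_ARGS__)
#else
#define pr_debug(...) no_printf(__VA_ARGS__)
#endif

/* Hypothetical caller: returns the number of characters printed. */
static int report_start(const char *who, unsigned int run)
{
	return pr_debug("%s: [%u] start vcpus\n", who, run);
}
```

Built without -DDEBUG the calls cost nothing at runtime and print nothing, but the compiler still checks the format strings, so the messages don't rot.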

Thanks,
drew

> 
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  tools/testing/selftests/kvm/hardware_disable_test.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/hardware_disable_test.c b/tools/testing/selftests/kvm/hardware_disable_test.c
> index 2f2eeb8a1d86..d6d4517c4a8a 100644
> --- a/tools/testing/selftests/kvm/hardware_disable_test.c
> +++ b/tools/testing/selftests/kvm/hardware_disable_test.c
> @@ -108,7 +108,6 @@ static void run_test(uint32_t run)
>   kvm_vm_elf_load(vm, program_invocation_name, 0, 0);
>   vm_create_irqchip(vm);
>  
> - fprintf(stderr, "%s: [%d] start vcpus\n", __func__, run);
>   for (i = 0; i < VCPU_NUM; ++i) {
>   vm_vcpu_add_default(vm, i, guest_code);
>   payloads[i].vm = vm;
> @@ -124,7 +123,6 @@ static void run_test(uint32_t run)
>   check_set_affinity(throw_away, _set);
>   }
>   }
> - fprintf(stderr, "%s: [%d] all threads launched\n", __func__, run);
>   sem_post(sem);
>   for (i = 0; i < VCPU_NUM; ++i)
>   check_join(threads[i], );
> @@ -147,16 +145,13 @@ int main(int argc, char **argv)
>   if (pid == 0)
>   run_test(i); /* This function always exits */
>  
> - fprintf(stderr, "%s: [%d] waiting semaphore\n", __func__, i);
>   sem_wait(sem);
>   r = (rand() % DELAY_US_MAX) + 1;
> - fprintf(stderr, "%s: [%d] waiting %dus\n", __func__, i, r);
>   usleep(r);
>   r = waitpid(pid, &s, WNOHANG);
>   TEST_ASSERT(r != pid,
>   "%s: [%d] child exited unexpectedly status: [%d]",
>   __func__, i, s);
> - fprintf(stderr, "%s: [%d] killing child\n", __func__, i);
>   kill(pid, SIGKILL);
>   }
>  
> -- 
> 2.30.2
> 



Re: [PATCH v3 8/8] KVM: selftests: aarch64/vgic-v3 init sequence tests

2021-03-22 Thread Andrew Jones
On Fri, Mar 12, 2021 at 06:32:02PM +0100, Eric Auger wrote:
> The tests exercise the VGIC_V3 device creation including the
> associated KVM_DEV_ARM_VGIC_GRP_ADDR group attributes:
> 
> - KVM_VGIC_V3_ADDR_TYPE_DIST/REDIST
> - KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION
> 
> Some other tests dedicate to KVM_DEV_ARM_VGIC_GRP_REDIST_REGS group
> and especially the GICR_TYPER read. The goal was to test the case
> recently fixed by commit 23bde34771f1
> ("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace").
> 
> The API under test can be found at
> Documentation/virt/kvm/devices/arm-vgic-v3.rst
> 
> Signed-off-by: Eric Auger 
> ---
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c|   2 +-
>  tools/testing/selftests/kvm/Makefile  |   1 +
>  .../testing/selftests/kvm/aarch64/vgic_init.c | 672 ++
>  .../testing/selftests/kvm/include/kvm_util.h  |   5 +
>  tools/testing/selftests/kvm/lib/kvm_util.c|  51 ++
>  5 files changed, 730 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_init.c
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 652998ed0b55..f6a7eed1d6ad 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -260,7 +260,7 @@ static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
>   if (!rdreg)
>   return false;
>  
> - if (!rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
> + if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {

I guess this is an accidental change?

>   /* check whether there is no other contiguous rdist region */
>   struct list_head *rd_regions = >rd_regions;
>   struct vgic_redist_region *iter;
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index a6d61f451f88..4e548d7ab0ab 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -75,6 +75,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
>  
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> +TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
>  TEST_GEN_PROGS_aarch64 += demand_paging_test
>  TEST_GEN_PROGS_aarch64 += dirty_log_test
>  TEST_GEN_PROGS_aarch64 += dirty_log_perf_test

Missing .gitignore change

> diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> new file mode 100644
> index ..12205ab9fb10
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> @@ -0,0 +1,672 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * vgic init sequence tests
> + *
> + * Copyright (C) 2020, Red Hat, Inc.
> + */
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +#define NR_VCPUS 4
> +
> +#define REDIST_REGION_ATTR_ADDR(count, base, flags, index) (((uint64_t)(count) << 52) | \
> + ((uint64_t)((base) >> 16) << 16) | ((uint64_t)(flags) << 12) | index)
> +#define REG_OFFSET(vcpu, offset) (((uint64_t)vcpu << 32) | offset)
> +
> +#define GICR_TYPER 0x8
> +
> +/* helper to access a redistributor register */
> +static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
> +  uint32_t *val, bool write)
> +{
> + uint64_t attr = REG_OFFSET(vcpu, offset);
> +
> + return kvm_device_access(gicv3_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
> +  attr, val, write);
> +}
> +
> +/* dummy guest code */
> +static void guest_code(int cpu)

cpu is unused, no need for it

> +{
> + GUEST_SYNC(0);
> + GUEST_SYNC(1);
> + GUEST_SYNC(2);
> + GUEST_DONE();
> +}
> +
> +/* we don't want to assert on run execution, hence that helper */
> +static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> + static int run;
> + struct ucall uc;
> + int ret;
> +
> + vcpu_args_set(vm, vcpuid, 1, vcpuid);

The cpu index is unused, so you don't need to pass it in

> + ret = _vcpu_ioctl(vm, vcpuid, KVM_RUN, NULL);
> + get_ucall(vm, vcpuid, &uc);

uc is unused, so you can pass NULL for it

> + run++;

What good is this counter? Nobody reads it.

> +
> + if (ret)
> + return -errno;
> + return 0;
> +}
> +
> +/**
> + * Helper routine that performs KVM device tests in general and
> + * especially ARM_VGIC_V3 ones. Eventually the ARM_VGIC_V3
> + * device gets created, a legacy RDIST region is set at @0x0
> + * and a DIST region is set @0x6
> + */
> +int fuzz_dist_rdist(struct kvm_vm *vm)

Missing static

> +{
> + int ret, gicv3_fd, max_ipa_bits;
> + uint64_t addr;
> +
> + max_ipa_bits = kvm_check_cap(KVM_CAP_ARM_VM_IPA_SIZE);
> +
> + /* check ARM_VGIC_V3 device exists */
> + ret = kvm_create_device(vm, KVM_DEV_TYPE_ARM_VGIC_V3, true);

Re: [PATCH v2 2/2] selftests/kvm: add set_boot_cpu_id test

2021-03-19 Thread Andrew Jones
On Fri, Mar 19, 2021 at 09:34:40AM +0100, Emanuele Giuseppe Esposito wrote:
> On 18/03/2021 17:20, Andrew Jones wrote:
> > On Thu, Mar 18, 2021 at 04:16:24PM +0100, Emanuele Giuseppe Esposito wrote:
> > > +static void run_vm_bsp(uint32_t bsp_vcpu)
> > 
> > I think the 'bsp' suffixes and prefixes make the purpose of this function
> > and its argument more confusing. Just
> > 
> >   static void run_vm(uint32_t vcpu)
> > 
> > would be more clear to me.
> 
> The idea here was "run vm with this vcpu as BSP", implicitly assuming that
> there are always 2 vcpu inside, so we are picking one as BSP.
> 
> Maybe
> 
> run_vm_2_vcpu(uint32_t bsp_vcpid)
> 
> is better?

run_vm() is probably still sufficient for the function name. I better
understand the naming of the function's argument now, so the bsp prefix
does make sense.

> 
> > 
> > > +{
> > > + struct kvm_vm *vm;
> > > + bool is_bsp_vcpu1 = bsp_vcpu == VCPU_ID1;
> > 
> > Could add another define
> > 
> >   #define BSP_VCPU VCPU_ID1
> > 
> > And then instead of creating the above bool, just do
> > 
> >   if (vcpu == BSP_VCPU)
> 
> I think it will be even more confusing to have BSP_VCPU fixed to VCPU_ID1,
> because in the tests before and after I use VCPU_ID0 as BSP.
> 
>   run_vm_bsp(VCPU_ID0);
>   run_vm_bsp(VCPU_ID1);
>   run_vm_bsp(VCPU_ID0);

Agreed. I hadn't read that far down and hadn't grasped the purpose
of run_vm[_bsp]'s argument before. But, can we get rid of the
is_bsp_vcpu1 boolean anyway? 'if (bsp_vcpu == VCPU_ID1)' is terse
enough, and it better exposes the logic.

> 
> > 
> > > +
> > > + vm = create_vm();
> > > +
> > > + if (is_bsp_vcpu1)
> > > + vm_ioctl(vm, KVM_SET_BOOT_CPU_ID, (void *) VCPU_ID1);
> > 
> > Does this ioctl need to be called before creating the vcpus? The
> > documentation in Documentation/virt/kvm/api.rst doesn't say that.
> 
> Yes, it has to be called before creating the vcpus, as also shown in the
> test function "check_set_bsp_busy". KVM checks that created_vcpus is 0
> before setting the bsp field.
> 
> arch/x86/kvm/x86.c
>   case KVM_SET_BOOT_CPU_ID:
>   ...
>   if (kvm->created_vcpus)
>   r = -EBUSY;
>   else
>   kvm->arch.bsp_vcpu_id = arg;
> 
> I will update the documentation to include also this information.

Great!

And I'll try to improve the KVM selftests API to better support these
types of tests without having to copy so much code.

> 
> > If it can be called after creating the vcpus, then
> > vm_create_default_with_vcpus() can be used and there's no need
> > for the create_vm() and add_x86_vcpu() functions. Just use the
> > same guest code for both, but pass the cpu index to the guest
> > code function allowing something like
> > 
> > if (cpu == BSP_VCPU)
> > GUEST_ASSERT(get_bsp_flag() != 0);
> > else
> > GUEST_ASSERT(get_bsp_flag() == 0);
> > 
> I might be wrong, but there seems not to be an easy way to pass arguments to
> the guest function.

There are several tests that pass the CPU index to the guest function
which you can use as examples. Take a look at e.g. steal_time.c. The
trick is to set the argument(s) with vcpu_args_set().

Thanks,
drew



Re: [PATCH] selftests/kvm: add get_msr_index_features

2021-03-18 Thread Andrew Jones
On Thu, Mar 18, 2021 at 06:33:35PM +0100, Paolo Bonzini wrote:
> On 18/03/21 18:03, Andrew Jones wrote:
> > > 
> > >  TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
> > > +TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
> > 
> > Maybe we should give up trying to keep an alphabetic order.
> 
> FWIW I had fixed that but yeah maybe we should just give up.
> 
> > > +int main(int argc, char *argv[])
> > > +{
> > > + if (kvm_check_cap(KVM_CAP_GET_MSR_FEATURES))
> > > + test_get_msr_feature();
> > > +
> > > + test_get_msr_index();
> > Missing return
> > 
> > > +}
> 
> "main" is special, it's okay not to have a return there.

Hmm, yeah. I always assumed the compiler would complain or that you'd end
up with a garbage return code. But, I just checked, and indeed not only do
you not get a warning, even with -Wall -Wextra, but the compiler actually
emits code for a zero return value on your behalf. Looks weird to me
though to end an int function without a return, so I don't think I'm
going to adopt this practice myself.

Thanks,
drew



Re: [PATCH] selftests/kvm: add get_msr_index_features

2021-03-18 Thread Andrew Jones
On Thu, Mar 18, 2021 at 03:56:29PM +0100, Emanuele Giuseppe Esposito wrote:
> Test the KVM_GET_MSR_FEATURE_INDEX_LIST
> and KVM_GET_MSR_INDEX_LIST ioctls.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  tools/testing/selftests/kvm/.gitignore|   1 +
>  tools/testing/selftests/kvm/Makefile  |   1 +
>  .../kvm/x86_64/get_msr_index_features.c   | 124 ++
>  3 files changed, 126 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/get_msr_index_features.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 32b87cc77c8e..d99f3969d371 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -5,6 +5,7 @@
>  /s390x/resets
>  /s390x/sync_regs_test
>  /x86_64/cr4_cpuid_sync_test
> +/x86_64/get_msr_index_features
>  /x86_64/debug_regs
>  /x86_64/evmcs_test
>  /x86_64/get_cpuid_test
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index a6d61f451f88..c748b9650e28 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -39,6 +39,7 @@ LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
>  LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c 
> lib/s390x/diag318_test_handler.c
>  
>  TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
> +TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features

Maybe we should give up trying to keep an alphabetic order.

>  TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
>  TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
> diff --git a/tools/testing/selftests/kvm/x86_64/get_msr_index_features.c 
> b/tools/testing/selftests/kvm/x86_64/get_msr_index_features.c
> new file mode 100644
> index ..ad9972d99dfa
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/get_msr_index_features.c
> @@ -0,0 +1,124 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Test that KVM_GET_MSR_INDEX_LIST and
> + * KVM_GET_MSR_FEATURE_INDEX_LIST work as intended
> + *
> + * Copyright (C) 2020, Red Hat, Inc.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +#include "../lib/kvm_util_internal.h"

I'm not sure why the original kvm selftests authors decided to do this
internal stuff, but we should either kill that or avoid doing stuff like
this.

> +
> +static int kvm_num_index_msrs(int kvm_fd, int nmsrs)
> +{
> + struct kvm_msr_list *list;
> + int r;
> +
> + list = malloc(sizeof(*list) + nmsrs * sizeof(list->indices[0]));
> + list->nmsrs = nmsrs;
> + r = ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list);
> + TEST_ASSERT(r == -1 && errno == E2BIG,
> + "Unexpected result from KVM_GET_MSR_INDEX_LIST 
> probe, r: %i",
> + r);

Weird indentation

> +
> + r = list->nmsrs;
> + free(list);
> + return r;
> +}
> +
> +static void test_get_msr_index(void)
> +{
> + int old_res, res, kvm_fd;
> +
> + kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
> + if (kvm_fd < 0)
> + exit(KSFT_SKIP);
> +
> + old_res = kvm_num_index_msrs(kvm_fd, 0);
> + TEST_ASSERT(old_res != 0, "Expecting nmsrs to be > 0");
> +
> + if (old_res != 1) {
> + res = kvm_num_index_msrs(kvm_fd, 1);
> + TEST_ASSERT(res > 1, "Expecting nmsrs to be > 1");
> + TEST_ASSERT(res == old_res, "Expecting nmsrs to be identical");
> + }
> +
> + close(kvm_fd);
> +}
> +
> +static int kvm_num_feature_msrs(int kvm_fd, int nmsrs)
> +{
> + struct kvm_msr_list *list;
> + int r;
> +
> + list = malloc(sizeof(*list) + nmsrs * sizeof(list->indices[0]));
> + list->nmsrs = nmsrs;
> + r = ioctl(kvm_fd, KVM_GET_MSR_FEATURE_INDEX_LIST, list);
> + TEST_ASSERT(r == -1 && errno == E2BIG,
> + "Unexpected result from KVM_GET_MSR_FEATURE_INDEX_LIST probe, 
> r: %i",
> + r);

Weird indentation. I'd just leave it up on the last line. We don't care
about long lines.

> +
> + r = list->nmsrs;
> + free(list);
> + return r;
> +}
> +
> +struct kvm_msr_list *kvm_get_msr_feature_list(int kvm_fd, int nmsrs)
> +{
> + struct kvm_msr_list *list;
> + int r;
> +
> + list = malloc(sizeof(*list) + nmsrs * sizeof(list->indices[0]));
> + list->nmsrs = nmsrs;
> + r = ioctl(kvm_fd, KVM_GET_MSR_FEATURE_INDEX_LIST, list);
> +
> + TEST_ASSERT(r == 0,
> + "Unexpected result from KVM_GET_MSR_FEATURE_INDEX_LIST, r: %i",
> + r);
> +
> + return list;
> +}
> +
> +static void test_get_msr_feature(void)
> +{
> + int res, old_res, i, kvm_fd;
> + struct kvm_msr_list *feature_list;
> +
> + kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
> + if (kvm_fd < 0)
> + exit(KSFT_SKIP);
> +
> + old_res = 

Re: [PATCH v2 1/2] kvm/kvm_util: add _vm_ioctl

2021-03-18 Thread Andrew Jones
On Thu, Mar 18, 2021 at 04:16:23PM +0100, Emanuele Giuseppe Esposito wrote:
> As in kvm_ioctl and _kvm_ioctl, add
> the respective _vm_ioctl for vm_ioctl.
> 
> _vm_ioctl invokes an ioctl using the vm fd,
> leaving the caller to test the result.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  tools/testing/selftests/kvm/include/kvm_util.h | 1 +
>  tools/testing/selftests/kvm/lib/kvm_util.c | 7 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
> b/tools/testing/selftests/kvm/include/kvm_util.h
> index 2d7eb6989e83..d53a5f7cad61 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -133,6 +133,7 @@ void vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid, 
> unsigned long ioctl,
>  int _vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid, unsigned long ioctl,
>   void *arg);
>  void vm_ioctl(struct kvm_vm *vm, unsigned long ioctl, void *arg);
> +int _vm_ioctl(struct kvm_vm *vm, unsigned long cmd, void *arg);
>  void kvm_ioctl(struct kvm_vm *vm, unsigned long ioctl, void *arg);
>  int _kvm_ioctl(struct kvm_vm *vm, unsigned long ioctl, void *arg);
>  void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t 
> flags);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index e5fbf16f725b..b8849a1aca79 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1697,11 +1697,16 @@ void vm_ioctl(struct kvm_vm *vm, unsigned long cmd, 
> void *arg)
>  {
>   int ret;
>  
> - ret = ioctl(vm->fd, cmd, arg);
> + ret = _vm_ioctl(vm, cmd, arg);
>   TEST_ASSERT(ret == 0, "vm ioctl %lu failed, rc: %i errno: %i (%s)",
>   cmd, ret, errno, strerror(errno));
>  }
>  
> +int _vm_ioctl(struct kvm_vm *vm, unsigned long cmd, void *arg)
> +{
> + return ioctl(vm->fd, cmd, arg);
> +}
> +
>  /*
>   * KVM system ioctl
>   *
> -- 
> 2.29.2
>

With the summary prefix change suggested by Paolo or, even better,
'KVM: selftests:', since that's what the majority of KVM selftests
patches use,

Reviewed-by: Andrew Jones 
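
For readers skimming the archive, the point of having both an asserting
wrapper and a raw variant can be sketched outside the selftests library.
This is only an illustration under stated assumptions: the sketch_* names
are made up, they operate on a plain file descriptor rather than a
struct kvm_vm, and assert() stands in for TEST_ASSERT().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Raw variant (hypothetical name): hands the ioctl result back untouched. */
int sketch_fd_ioctl_raw(int fd, unsigned long cmd, void *arg)
{
	return ioctl(fd, cmd, arg);
}

/* Asserting variant (hypothetical name): any failure aborts the caller. */
void sketch_fd_ioctl(int fd, unsigned long cmd, void *arg)
{
	int ret = sketch_fd_ioctl_raw(fd, cmd, arg);

	assert(ret == 0);
}
```

A test that expects a failure, such as the EBUSY check in patch 2/2, can
only be written against the raw variant, since the asserting one never
returns on error.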



Re: [PATCH v2 2/2] selftests/kvm: add set_boot_cpu_id test

2021-03-18 Thread Andrew Jones
On Thu, Mar 18, 2021 at 04:16:24PM +0100, Emanuele Giuseppe Esposito wrote:
> Test for the KVM_SET_BOOT_CPU_ID ioctl.
> Check that it correctly allows changing the BSP vcpu.
> 
> v1 -> v2:
> - remove unnecessary printf
> - move stage for loop inside run_vcpu
> - test EBUSY when calling KVM_SET_BOOT_CPU_ID after vcpu
>   creation and execution
> - introduce _vm_ioctl

This information should be in the cover letter. For a single-patch
submission, which needs no cover letter, it should go below the '---'
under your s-o-b.

> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  tools/testing/selftests/kvm/.gitignore|   1 +
>  tools/testing/selftests/kvm/Makefile  |   1 +
>  .../selftests/kvm/x86_64/set_boot_cpu_id.c| 166 ++
>  3 files changed, 168 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore 
> b/tools/testing/selftests/kvm/.gitignore
> index 32b87cc77c8e..43b8aa82aefe 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -5,6 +5,7 @@
>  /s390x/resets
>  /s390x/sync_regs_test
>  /x86_64/cr4_cpuid_sync_test
> +/x86_64/set_boot_cpu_id
>  /x86_64/debug_regs
>  /x86_64/evmcs_test
>  /x86_64/get_cpuid_test
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index a6d61f451f88..e7b62666e06e 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -39,6 +39,7 @@ LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
>  LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c 
> lib/s390x/diag318_test_handler.c
>  
>  TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
> +TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id

We like the above two test lists (Makefile and .gitignore) to be in
alphabetical order.

>  TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
>  TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
> diff --git a/tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c 
> b/tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c
> new file mode 100644
> index ..12c558fc8074
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/set_boot_cpu_id.c
> @@ -0,0 +1,166 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Test that KVM_SET_BOOT_CPU_ID works as intended
> + *
> + * Copyright (C) 2020, Red Hat, Inc.
> + */
> +#define _GNU_SOURCE /* for program_invocation_name */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +#define N_VCPU 2
> +#define VCPU_ID0 0
> +#define VCPU_ID1 1
> +
> +static uint32_t get_bsp_flag(void)
> +{
> + return rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_BSP;
> +}
> +
> +static void guest_bsp_vcpu(void *arg)
> +{
> + GUEST_SYNC(1);
> +
> + GUEST_ASSERT(get_bsp_flag() != 0);
> +
> + GUEST_DONE();
> +}
> +
> +static void guest_not_bsp_vcpu(void *arg)
> +{
> + GUEST_SYNC(1);
> +
> + GUEST_ASSERT(get_bsp_flag() == 0);
> +
> + GUEST_DONE();
> +}
> +
> +static void test_set_boot_busy(struct kvm_vm *vm)
> +{
> + int res;
> +
> + res = _vm_ioctl(vm, KVM_SET_BOOT_CPU_ID, (void *) VCPU_ID0);
> + TEST_ASSERT(res == -1 && errno == EBUSY,
> + "KVM_SET_BOOT_CPU_ID set while running vm");
> +}
> +
> +static void run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> + struct ucall uc;
> + int stage;
> +
> + for (stage = 0; stage < 2; stage++) {
> +
> + vcpu_run(vm, vcpuid);
> +
> + switch (get_ucall(vm, vcpuid, &uc)) {
> + case UCALL_SYNC:
> + TEST_ASSERT(!strcmp((const char *)uc.args[0], "hello") 
> &&
> + uc.args[1] == stage + 1,
> + "Stage %d: Unexpected register values 
> vmexit, got %lx",
> + stage + 1, (ulong)uc.args[1]);
> + test_set_boot_busy(vm);
> + break;
> + case UCALL_DONE:
> + TEST_ASSERT(stage == 1,
> + "Expected GUEST_DONE in stage 2, got 
> stage %d",
> + stage);
> + break;
> + case UCALL_ABORT:
> + TEST_ASSERT(false, "%s at %s:%ld\n\tvalues: %#lx, %#lx",
> + (const char *)uc.args[0], 
> __FILE__,
> + uc.args[1], uc.args[2], 
> uc.args[3]);
> + default:
> + TEST_ASSERT(false, "Unexpected exit: %s",
> + exit_reason_str(vcpu_state(vm, 
> vcpuid)->exit_reason));
> + }
> + }
> +}
> +
> +static struct kvm_vm *create_vm(void)
> +{
> + struct kvm_vm *vm;
> + 

Re: [PATCH] selftests/kvm: add test for KVM_GET_MSR_FEATURE_INDEX_LIST

2021-03-17 Thread Andrew Jones
On Wed, Mar 17, 2021 at 12:25:52PM +0100, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 17/03/2021 11:49, Paolo Bonzini wrote:
> > On 17/03/21 08:45, Emanuele Giuseppe Esposito wrote:
> > > +    struct kvm_msr_list features_list;
> > >   buffer.header.nmsrs = 1;
> > >   buffer.entry.index = msr_index;
> > > +    features_list.nmsrs = 1;
> > > +
> > >   kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
> > >   if (kvm_fd < 0)
> > >   exit(KSFT_SKIP);
> > > +    r = ioctl(kvm_fd, KVM_GET_MSR_FEATURE_INDEX_LIST, &features_list);
> > > +    TEST_ASSERT(r < 0 && r != -E2BIG,
> > > "KVM_GET_MSR_FEATURE_INDEX_LIST IOCTL failed,\n"
> > > +    "  rc: %i errno: %i", r, errno);
> > 
> > > Careful: because this has nmsrs == 1, you are overwriting a u32 of the
> > stack after struct kvm_msr_list.  You need to use your own struct
> > similar to what is done with "buffer.header" and "buffer.entry".
> > 
> > >   r = ioctl(kvm_fd, KVM_GET_MSRS, &buffer);
> > >   TEST_ASSERT(r == 1, "KVM_GET_MSRS IOCTL failed,\n"
> > >   "  rc: %i errno: %i", r, errno);
> > > 
> > 
> > More in general, this is not a test, but rather a library function used
> > to read a single MSR.
> > 
> > If you would like to add a test for KVM_GET_MSR_FEATURE_INDEX_LIST that
> > would be very welcome.  That would be a new executable.  Looking at the
> > logic for the ioctl, the main purpose of the test should be:
> > 
> > - check that if features_list.nmsrs is too small it will set the nmsrs
> > field and return -E2BIG.
> > 
> > - check that all MSRs returned by KVM_GET_MSR_FEATURE_INDEX_LIST can be
> > accessed with KVM_GET_MSRS
> > 
> > So something like this:
> > 
> >    set nmsrs to 0 and try the ioctl
> >    check that it returns -E2BIG and has changed nmsrs
> >    if nmsrs != 1 {
> >      set nmsrs to 1 and try the ioctl again
> >      check that it returns -E2BIG
> >    }
> >    malloc a buffer with room for struct kvm_msr_list and nmsrs indices
> >    set nmsrs in the malloc-ed buffer and try the ioctl again
> >    for each index
> >      invoke kvm_get_feature_msr to read it
> > 
> > (The test should also be skipped if KVM does not expose the
> > KVM_CAP_GET_MSR_FEATURES capability).
> 
> Thank you for the feedback, the title is indeed a little bit misleading. My
> idea in this patch was to just add an additional check to all usages of
> KVM_GET_MSRS, since KVM_GET_MSR_FEATURE_INDEX_LIST is used only to probe
> host capabilities and processor features.
> But you are right, a separate test would be better.
>

Hi Emanuele,

You might be able to get some inspiration from the aarch64/get-reg-list.c
test. The list of MSRs varies with KVM version and host processor, but
there may be a set of MSRs that does not vary with host processor and
should not be removed in later KVM versions. If that's the case, then
the !missing_regs assert concept of aarch64/get-reg-list.c may also
apply to this new test. Based on Paolo's comment, I presume at least the
!failed_get should apply. Finally, the test should do the E2BIG checks,
as Paolo states, but you may also want to create a lib function for
KVM_GET_MSR_FEATURE_INDEX_LIST, similar to vcpu_get_reg_list(), if you
think it may be of use to other tests.

Thanks,
drew
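
The probe-then-allocate sequence Paolo outlines, wrapped up as a library
function, might look roughly like the sketch below. It is written against
a stand-in rather than the real ioctl so that it can be compiled and run
anywhere: struct sketch_msr_list, sketch_get_list() and the three index
values are all assumptions made for illustration, with sketch_get_list()
mimicking the behaviour described above (set nmsrs, fail with E2BIG when
the buffer is too small).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout idea as struct kvm_msr_list: a count plus a flexible array. */
struct sketch_msr_list {
	uint32_t nmsrs;
	uint32_t indices[];
};

/* Made-up data standing in for what the kernel would report. */
static const uint32_t sketch_known[] = { 0x10, 0x1b, 0x3a };
#define SKETCH_NKNOWN (sizeof(sketch_known) / sizeof(sketch_known[0]))

/*
 * Stand-in for the ioctl: always writes back the real count, fails with
 * E2BIG when the caller's buffer is too small, fills indices otherwise.
 */
static int sketch_get_list(struct sketch_msr_list *list)
{
	uint32_t room = list->nmsrs;
	uint32_t i;

	list->nmsrs = SKETCH_NKNOWN;
	if (room < SKETCH_NKNOWN) {
		errno = E2BIG;
		return -1;
	}
	for (i = 0; i < SKETCH_NKNOWN; i++)
		list->indices[i] = sketch_known[i];
	return 0;
}

/* Probe with nmsrs == 0, then retry with a buffer of the reported size. */
struct sketch_msr_list *sketch_get_full_list(void)
{
	struct sketch_msr_list probe = { .nmsrs = 0 };
	struct sketch_msr_list *list;
	int r;

	r = sketch_get_list(&probe);
	assert(r == -1 && errno == E2BIG);

	/* Header plus room for every index, so nothing lands on the stack. */
	list = malloc(sizeof(*list) + probe.nmsrs * sizeof(list->indices[0]));
	assert(list != NULL);
	list->nmsrs = probe.nmsrs;

	r = sketch_get_list(list);
	assert(r == 0);
	return list;
}
```

Allocating the header and the indices together is also what avoids the
stack overwrite Paolo points out for a bare struct with nmsrs == 1. The
caller owns the returned buffer and frees it.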





Re: [RFC PATCH v4 9/9] KVM: selftests: Add a test for kvm page table code

2021-03-12 Thread Andrew Jones
On Tue, Mar 02, 2021 at 08:57:51PM +0800, Yanan Wang wrote:
> This test serves as a performance tester and a bug reproducer for
> kvm page table code (GPA->HPA mappings), so it gives guidance for
> people trying to make some improvement for kvm.
> 
> The function guest_code() can cover the conditions where a single vcpu or
> multiple vcpus access guest pages within the same memory region, in three
> VM stages(before dirty logging, during dirty logging, after dirty logging).
> Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
> memory region can be specified by users, which means normal page mappings
> or block mappings can be chosen by users to be created in the test.
> 
> If ANONYMOUS memory is specified, kvm will create normal page mappings
> for the tested memory region before dirty logging, and update attributes
> of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
> memory is specified, kvm will create block mappings for the tested memory
> region before dirty logging, and split the block mappings into normal page
> mappings during dirty logging, and coalesce the page mappings back into
> block mappings after dirty logging is stopped.
> 
> So in summary, as a performance tester, this test can present the
> performance of kvm creating/updating normal page mappings, or the
> performance of kvm creating/splitting/recovering block mappings,
> through execution time.
> 
> When we need to coalesce the page mappings back to block mappings after
> dirty logging is stopped, we have to firstly invalidate *all* the TLB
> entries for the page mappings right before installation of the block entry,
> because a TLB conflict abort error could occur if we can't invalidate the
> TLB entries fully. We have hit this TLB conflict twice on aarch64 software
> implementation and fixed it. As this test can simulate process from dirty
> logging enabled to dirty logging stopped of a VM with block mappings,
> so it can also reproduce this TLB conflict abort due to inadequate TLB
> invalidation when coalescing tables.
> 
> Signed-off-by: Yanan Wang 
> Reviewed-by: Ben Gardon 
> ---
>  tools/testing/selftests/kvm/Makefile  |   3 +
>  .../selftests/kvm/kvm_page_table_test.c   | 476 ++
>  2 files changed, 479 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index a6d61f451f88..bac81924166d 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -67,6 +67,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test
>  TEST_GEN_PROGS_x86_64 += demand_paging_test
>  TEST_GEN_PROGS_x86_64 += dirty_log_test
>  TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
> +TEST_GEN_PROGS_x86_64 += kvm_page_table_test
>  TEST_GEN_PROGS_x86_64 += hardware_disable_test
>  TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
>  TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> @@ -78,6 +79,7 @@ TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
>  TEST_GEN_PROGS_aarch64 += demand_paging_test
>  TEST_GEN_PROGS_aarch64 += dirty_log_test
>  TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
> +TEST_GEN_PROGS_aarch64 += kvm_page_table_test
>  TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
>  TEST_GEN_PROGS_aarch64 += set_memory_region_test
>  TEST_GEN_PROGS_aarch64 += steal_time
> @@ -87,6 +89,7 @@ TEST_GEN_PROGS_s390x += s390x/resets
>  TEST_GEN_PROGS_s390x += s390x/sync_regs_test
>  TEST_GEN_PROGS_s390x += demand_paging_test
>  TEST_GEN_PROGS_s390x += dirty_log_test
> +TEST_GEN_PROGS_s390x += kvm_page_table_test
>  TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
>  TEST_GEN_PROGS_s390x += set_memory_region_test

Please add these three lines in alphabetic order. Also we're missing
the .gitignore entry.

>  
> diff --git a/tools/testing/selftests/kvm/kvm_page_table_test.c 
> b/tools/testing/selftests/kvm/kvm_page_table_test.c
> new file mode 100644
> index ..032b49d1483b
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/kvm_page_table_test.c
> @@ -0,0 +1,476 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KVM page table test
> + *
> + * Copyright (C) 2021, Huawei, Inc.
> + *
> + * Make sure that THP has been enabled or enough HUGETLB pages with specific
> + * page size have been pre-allocated on your system, if you are planning to
> + * use hugepages to back the guest memory for testing.
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_name */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +#include "guest_modes.h"
> +
> +#define TEST_MEM_SLOT_INDEX 1
> +
> +/* Default size(1GB) of the memory for testing */
> +#define DEFAULT_TEST_MEM_SIZE(1 << 30)
> +
> +/* Default guest test virtual memory offset */
> +#define DEFAULT_GUEST_TEST_MEM   0xc000
> +
> +/* Number 

Re: [RFC PATCH v4 7/9] KVM: selftests: List all hugetlb src types specified with page sizes

2021-03-12 Thread Andrew Jones
On Tue, Mar 02, 2021 at 08:57:49PM +0800, Yanan Wang wrote:
> With VM_MEM_SRC_ANONYMOUS_HUGETLB, we currently can only use system
> default hugetlb pages to back the testing guest memory. In order to
> add flexibility, now list all the known hugetlb backing src types with
> different page sizes, so that we can specify use of hugetlb pages of the
> exact granularity that we want. And as all the known hugetlb page sizes
> are listed, it's appropriate for all architectures.
> 
> Besides, the helper get_backing_src_pagesz() is added to get the
> granularity of different backing src types(anonymous, thp, hugetlb).
> 
> Suggested-by: Ben Gardon 
> Signed-off-by: Yanan Wang 
> ---
>  .../testing/selftests/kvm/include/test_util.h | 18 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c|  2 +-
>  tools/testing/selftests/kvm/lib/test_util.c   | 59 +++
>  3 files changed, 66 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/test_util.h 
> b/tools/testing/selftests/kvm/include/test_util.h
> index e087174eefe5..fade3130eb01 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -71,16 +71,32 @@ enum vm_mem_backing_src_type {
>   VM_MEM_SRC_ANONYMOUS,
>   VM_MEM_SRC_ANONYMOUS_THP,
>   VM_MEM_SRC_ANONYMOUS_HUGETLB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_16KB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_64KB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_512KB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_1MB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_8MB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_16MB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_32MB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_256MB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_512MB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB,
> + VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
> + NUM_SRC_TYPES,
>  };
>  
>  struct vm_mem_backing_src_alias {
>   const char *name;
> - enum vm_mem_backing_src_type type;
> + uint32_t flag;
>  };
>  
>  bool thp_configured(void);
>  size_t get_trans_hugepagesz(void);
>  size_t get_def_hugetlb_pagesz(void);
> +const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i);
> +size_t get_backing_src_pagesz(uint32_t i);
>  void backing_src_help(void);
>  enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
>  
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index cc22c4ab7d67..b91c8e3a7ee1 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -757,7 +757,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>   region->mmap_start = mmap(NULL, region->mmap_size,
> PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS
> -   | (src_type == VM_MEM_SRC_ANONYMOUS_HUGETLB ? 
> MAP_HUGETLB : 0),
> +   | vm_mem_backing_src_alias(src_type)->flag,
> -1, 0);
>   TEST_ASSERT(region->mmap_start != MAP_FAILED,
>   "test_malloc failed, mmap_start: %p errno: %i",
> diff --git a/tools/testing/selftests/kvm/lib/test_util.c 
> b/tools/testing/selftests/kvm/lib/test_util.c
> index 80d68dbd72d2..df8a42eff1f8 100644
> --- a/tools/testing/selftests/kvm/lib/test_util.c
> +++ b/tools/testing/selftests/kvm/lib/test_util.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "linux/kernel.h"
>  
>  #include "test_util.h"
> @@ -112,12 +113,6 @@ void print_skip(const char *fmt, ...)
>   puts(", skipping test");
>  }
>  
> -const struct vm_mem_backing_src_alias backing_src_aliases[] = {
> - {"anonymous", VM_MEM_SRC_ANONYMOUS,},
> - {"anonymous_thp", VM_MEM_SRC_ANONYMOUS_THP,},
> - {"anonymous_hugetlb", VM_MEM_SRC_ANONYMOUS_HUGETLB,},
> -};
> -
>  bool thp_configured(void)
>  {
>   int ret;
> @@ -180,22 +175,64 @@ size_t get_def_hugetlb_pagesz(void)
>   return 0;
>  }
>  
> +const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
> +{
> + static const struct vm_mem_backing_src_alias aliases[] = {
> + { "anonymous",   0},
> + { "anonymous_thp",   0},
> + { "anonymous_hugetlb",   MAP_HUGETLB  },
> + { "anonymous_hugetlb_16kb",  MAP_HUGETLB | MAP_HUGE_16KB  },
> + { "anonymous_hugetlb_64kb",  MAP_HUGETLB | MAP_HUGE_64KB  },
> + { "anonymous_hugetlb_512kb", MAP_HUGETLB | MAP_HUGE_512KB },
> + { "anonymous_hugetlb_1mb",   MAP_HUGETLB | MAP_HUGE_1MB   },
> + { "anonymous_hugetlb_2mb",   MAP_HUGETLB | MAP_HUGE_2MB   },
> + { "anonymous_hugetlb_8mb",   MAP_HUGETLB | MAP_HUGE_8MB   },
> +  

Re: [RFC PATCH v4 8/9] KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers

2021-03-12 Thread Andrew Jones
On Tue, Mar 02, 2021 at 08:57:50PM +0800, Yanan Wang wrote:
> With VM_MEM_SRC_ANONYMOUS_THP specified in vm_userspace_mem_region_add(),
> we have to get the transparent hugepage size for HVA alignment. With the
> new helpers, we can use get_backing_src_pagesz() to check whether THP is
> configured and then get the exact configured hugepage size.
> 
> As different architectures may have different THP page sizes configured,
> this can get the accurate THP page sizes on any platform.
> 
> Signed-off-by: Yanan Wang 
> Reviewed-by: Ben Gardon 
> ---
>  tools/testing/selftests/kvm/lib/kvm_util.c | 28 +++---
>  1 file changed, 9 insertions(+), 19 deletions(-)
>

Reviewed-by: Andrew Jones 



Re: [RFC PATCH v4 7/9] KVM: selftests: List all hugetlb src types specified with page sizes

2021-03-12 Thread Andrew Jones
 "anonymous_hugetlb_16kb",  MAP_HUGETLB | MAP_HUGE_16KB  },
> + { "anonymous_hugetlb_64kb",  MAP_HUGETLB | MAP_HUGE_64KB  },
> + { "anonymous_hugetlb_512kb", MAP_HUGETLB | MAP_HUGE_512KB },
> + { "anonymous_hugetlb_1mb",   MAP_HUGETLB | MAP_HUGE_1MB   },
> + { "anonymous_hugetlb_2mb",   MAP_HUGETLB | MAP_HUGE_2MB   },
> + { "anonymous_hugetlb_8mb",   MAP_HUGETLB | MAP_HUGE_8MB   },
> + { "anonymous_hugetlb_16mb",  MAP_HUGETLB | MAP_HUGE_16MB  },
> + { "anonymous_hugetlb_32mb",  MAP_HUGETLB | MAP_HUGE_32MB  },
> + { "anonymous_hugetlb_256mb", MAP_HUGETLB | MAP_HUGE_256MB },
> + { "anonymous_hugetlb_512mb", MAP_HUGETLB | MAP_HUGE_512MB },
> + { "anonymous_hugetlb_1gb",   MAP_HUGETLB | MAP_HUGE_1GB   },
> + { "anonymous_hugetlb_2gb",   MAP_HUGETLB | MAP_HUGE_2GB   },
> + { "anonymous_hugetlb_16gb",  MAP_HUGETLB | MAP_HUGE_16GB  },
> + };
> + _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
> +"Missing new backing src types?");
> +
> + TEST_ASSERT(i < NUM_SRC_TYPES, "Backing src type ID %d too big", i);
> +
> + return &aliases[i];
> +}
> +
> +size_t get_backing_src_pagesz(uint32_t i)
> +{
> + uint32_t flag = vm_mem_backing_src_alias(i)->flag;
> +
> + if (i == VM_MEM_SRC_ANONYMOUS)
> + return getpagesize();
> + if (i == VM_MEM_SRC_ANONYMOUS_THP)
> + return get_trans_hugepagesz();
> + if (i == VM_MEM_SRC_ANONYMOUS_HUGETLB)
> + return get_def_hugetlb_pagesz();

nit: a switch would look nicer (IMHO)

> +
> + return MAP_HUGE_PAGE_SIZE(flag);
> +}
> +
>  void backing_src_help(void)
>  {
>   int i;
>  
>   printf("Available backing src types:\n");
> - for (i = 0; i < ARRAY_SIZE(backing_src_aliases); i++)
> - printf("\t%s\n", backing_src_aliases[i].name);
> + for (i = 0; i < NUM_SRC_TYPES; i++)
> + printf("\t%s\n", vm_mem_backing_src_alias(i)->name);

What happened with the indentation here?

>  }
>  
>  enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name)
>  {
>   int i;
>  
> - for (i = 0; i < ARRAY_SIZE(backing_src_aliases); i++)
> - if (!strcmp(type_name, backing_src_aliases[i].name))
> - return backing_src_aliases[i].type;
> + for (i = 0; i < NUM_SRC_TYPES; i++)
> + if (!strcmp(type_name, vm_mem_backing_src_alias(i)->name))
> + return i;
>  
>   backing_src_help();
>   TEST_FAIL("Unknown backing src type: %s", type_name);
> -- 
> 2.23.0
>

Otherwise

Reviewed-by: Andrew Jones 
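
For illustration, the switch form suggested above could look something
like the following. It is a sketch, not the patch's code: the enum, the
SKETCH_* constants and the two placeholder helpers are local stand-ins so
the block builds without the selftests headers, and the flag is passed in
directly instead of being looked up from the alias table. The shift and
mask values follow the "bits [26:31]" encoding described in patch 2/9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Local stand-ins for the mmap hugetlb size encoding (assumed values). */
#define SKETCH_HUGE_SHIFT	26
#define SKETCH_HUGE_MASK	0x3f
#define SKETCH_HUGE_2MB		(21U << SKETCH_HUGE_SHIFT)

enum sketch_src_type {
	SKETCH_SRC_ANONYMOUS,
	SKETCH_SRC_ANONYMOUS_THP,
	SKETCH_SRC_ANONYMOUS_HUGETLB,
	SKETCH_SRC_ANONYMOUS_HUGETLB_2MB,
};

/* Placeholder values; the real helpers read sysfs and /proc/meminfo. */
static size_t sketch_trans_hugepagesz(void)
{
	return (size_t)2 << 20;
}

static size_t sketch_def_hugetlb_pagesz(void)
{
	return (size_t)2 << 20;
}

size_t sketch_backing_src_pagesz(enum sketch_src_type type, uint32_t flag)
{
	switch (type) {
	case SKETCH_SRC_ANONYMOUS:
		return (size_t)sysconf(_SC_PAGESIZE);
	case SKETCH_SRC_ANONYMOUS_THP:
		return sketch_trans_hugepagesz();
	case SKETCH_SRC_ANONYMOUS_HUGETLB:
		return sketch_def_hugetlb_pagesz();
	default: {
		/* Explicit sizes: log2 of the page size sits in flag bits [26:31]. */
		unsigned int shift = (flag >> SKETCH_HUGE_SHIFT) & SKETCH_HUGE_MASK;

		assert(shift < sizeof(size_t) * 8);
		return (size_t)1 << shift;
	}
	}
}
```

The default case covers every explicitly sized hugetlb type, so new sizes
need no extra case labels.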



Re: [RFC PATCH v4 6/9] KVM: selftests: Add a helper to get system default hugetlb page size

2021-03-12 Thread Andrew Jones
On Tue, Mar 02, 2021 at 08:57:48PM +0800, Yanan Wang wrote:
> If HUGETLB is configured in the host kernel, then we can know the system
> default hugetlb page size through *cat /proc/meminfo*. Otherwise, we will
> not see the information of hugetlb pages in file /proc/meminfo if it's not
> configured. So add a helper to determine whether HUGETLB is configured and
> then get the default page size by reading /proc/meminfo.
> 
> This helper can be useful when a program wants to use the default hugetlb
> pages of the system and doesn't know the default page size.
> 
> Signed-off-by: Yanan Wang 
> ---
>  .../testing/selftests/kvm/include/test_util.h |  1 +
>  tools/testing/selftests/kvm/lib/test_util.c   | 27 +++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/test_util.h 
> b/tools/testing/selftests/kvm/include/test_util.h
> index ef24c76ba89a..e087174eefe5 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -80,6 +80,7 @@ struct vm_mem_backing_src_alias {
>  
>  bool thp_configured(void);
>  size_t get_trans_hugepagesz(void);
> +size_t get_def_hugetlb_pagesz(void);
>  void backing_src_help(void);
>  enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
>  
> diff --git a/tools/testing/selftests/kvm/lib/test_util.c 
> b/tools/testing/selftests/kvm/lib/test_util.c
> index f2d133f76c67..80d68dbd72d2 100644
> --- a/tools/testing/selftests/kvm/lib/test_util.c
> +++ b/tools/testing/selftests/kvm/lib/test_util.c
> @@ -153,6 +153,33 @@ size_t get_trans_hugepagesz(void)
>   return size;
>  }
>  
> +size_t get_def_hugetlb_pagesz(void)
> +{
> + char buf[64];
> + const char *tag = "Hugepagesize:";
> + FILE *f;
> +
> + f = fopen("/proc/meminfo", "r");
> + TEST_ASSERT(f != NULL, "Error in opening /proc/meminfo: %d", errno);
> +
> + while (fgets(buf, sizeof(buf), f) != NULL) {
> + if (strstr(buf, tag) == buf) {
> + fclose(f);
> + return strtoull(buf + strlen(tag), NULL, 10) << 10;
> + }
> + }
> +
> + if (feof(f)) {
> + fclose(f);
> + TEST_FAIL("HUGETLB is not configured in host kernel");
> + } else {
> + fclose(f);
> + TEST_FAIL("Error in reading /proc/meminfo: %d", errno);
> + }

fclose() can be factored out.

> +
> + return 0;
> +}
> +
>  void backing_src_help(void)
>  {
>   int i;
> -- 
> 2.23.0
>

Besides the fclose comment and the same errno comment as the previous
patch

Reviewed-by: Andrew Jones 
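
One possible shape for the helper with a single fclose() is sketched
below. The assumptions are mine: the function takes an already opened
stream so that it can be exercised without /proc/meminfo, it returns 0
instead of calling TEST_FAIL(), and it does not distinguish EOF from a
read error. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the "Hugepagesize:" value in bytes from a meminfo-style stream,
 * or 0 if the tag is absent. The stream is closed exactly once.
 */
size_t sketch_hugetlb_pagesz(FILE *f)
{
	const char *tag = "Hugepagesize:";
	size_t size = 0;
	char buf[64];

	while (fgets(buf, sizeof(buf), f) != NULL) {
		if (strncmp(buf, tag, strlen(tag)) == 0) {
			/* The value is reported in kB. */
			size = strtoull(buf + strlen(tag), NULL, 10) << 10;
			break;
		}
	}

	fclose(f);
	return size;
}

/* Convenience for trying the parser on a string, via tmpfile(). */
size_t sketch_hugetlb_pagesz_from_text(const char *text)
{
	FILE *f = tmpfile();

	assert(f != NULL);
	fputs(text, f);
	rewind(f);
	return sketch_hugetlb_pagesz(f);
}
```

Breaking out of the loop rather than returning from inside it is what
lets the close happen in one place.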



Re: [RFC PATCH v4 5/9] KVM: selftests: Add a helper to get system configured THP page size

2021-03-12 Thread Andrew Jones
On Tue, Mar 02, 2021 at 08:57:47PM +0800, Yanan Wang wrote:
> If we want to have some tests about transparent hugepages, the system
> configured THP hugepage size should better be known by the tests, which
> can be used for kinds of alignment or guest memory accessing of vcpus...
> So it makes sense to add a helper to get the transparent hugepage size.
> 
> With VM_MEM_SRC_ANONYMOUS_THP specified in vm_userspace_mem_region_add(),
> we now stat /sys/kernel/mm/transparent_hugepage to check whether THP is
> configured in the host kernel before madvise(). Based on this, we can also
> read file /sys/kernel/mm/transparent_hugepage/hpage_pmd_size to get THP
> hugepage size.
> 
> Signed-off-by: Yanan Wang 
> Reviewed-by: Ben Gardon 
> ---
>  .../testing/selftests/kvm/include/test_util.h |  2 ++
>  tools/testing/selftests/kvm/lib/test_util.c   | 36 +++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/test_util.h 
> b/tools/testing/selftests/kvm/include/test_util.h
> index b7f41399f22c..ef24c76ba89a 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -78,6 +78,8 @@ struct vm_mem_backing_src_alias {
>   enum vm_mem_backing_src_type type;
>  };
>  
> +bool thp_configured(void);
> +size_t get_trans_hugepagesz(void);
>  void backing_src_help(void);
>  enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
>  
> diff --git a/tools/testing/selftests/kvm/lib/test_util.c 
> b/tools/testing/selftests/kvm/lib/test_util.c
> index c7c0627c6842..f2d133f76c67 100644
> --- a/tools/testing/selftests/kvm/lib/test_util.c
> +++ b/tools/testing/selftests/kvm/lib/test_util.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "linux/kernel.h"
>  
>  #include "test_util.h"
> @@ -117,6 +118,41 @@ const struct vm_mem_backing_src_alias 
> backing_src_aliases[] = {
>   {"anonymous_hugetlb", VM_MEM_SRC_ANONYMOUS_HUGETLB,},
>  };
>  
> +bool thp_configured(void)
> +{
> + int ret;
> + struct stat statbuf;
> +
> + ret = stat("/sys/kernel/mm/transparent_hugepage", &statbuf);
> + TEST_ASSERT(ret == 0 || (ret == -1 && errno == ENOENT),
> + "Error in stating /sys/kernel/mm/transparent_hugepage: %d",
> + errno);

TEST_ASSERT will already output errno's string. Is that not sufficient? If
not, I think extending TEST_ASSERT to output errno too would be fine.

> +
> + return ret == 0;
> +}
> +
> +size_t get_trans_hugepagesz(void)
> +{
> + size_t size;
> + char buf[16];
> + FILE *f;
> +
> + TEST_ASSERT(thp_configured(), "THP is not configured in host kernel");
> +
> + f = fopen("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", "r");
> + TEST_ASSERT(f != NULL,
> + "Error in opening transparent_hugepage/hpage_pmd_size: %d",
> + errno);

Same comment as above.

> +
> + if (fread(buf, sizeof(char), sizeof(buf), f) == 0) {
> + fclose(f);
> + TEST_FAIL("Unable to read transparent_hugepage/hpage_pmd_size");
> + }
> +
> + size = strtoull(buf, NULL, 10);

fscanf with %lld?
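
To make the suggestion concrete, here is a minimal sketch of an fscanf()
based read. The helper name read_size_from_file() and its path parameter
are hypothetical and only exist so the snippet stands alone; the real
helper would keep its TEST_ASSERT/TEST_FAIL error handling. I used %zu
here because the destination is a size_t. It also avoids passing
strtoull() a buffer that fread() never NUL-terminated.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone helper: parse one decimal size from 'path'.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_size_from_file(const char *path, size_t *size)
{
	FILE *f = fopen(path, "r");
	int matched;

	if (!f)
		return -1;

	/* fscanf() skips leading whitespace and stops at the newline. */
	matched = fscanf(f, "%zu", size);
	fclose(f);

	return matched == 1 ? 0 : -1;
}
```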

> + return size;
> +}
> +
>  void backing_src_help(void)
>  {
>   int i;
> -- 
> 2.23.0
> 

Thanks,
drew



Re: [RFC PATCH v4 3/9] KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing

2021-03-12 Thread Andrew Jones
On Tue, Mar 02, 2021 at 08:57:45PM +0800, Yanan Wang wrote:
> In addition to function of CLOCK_MONOTONIC, flag CLOCK_MONOTONIC_RAW can
> also shield possible impact of NTP, which can provide more robustness.
> 
> Suggested-by: Vitaly Kuznetsov 
> Signed-off-by: Yanan Wang 
> Reviewed-by: Ben Gardon 
> ---
>  tools/testing/selftests/kvm/demand_paging_test.c  |  8 
>  tools/testing/selftests/kvm/dirty_log_perf_test.c | 14 +++---
>  tools/testing/selftests/kvm/lib/test_util.c   |  2 +-
>  tools/testing/selftests/kvm/steal_time.c  |  4 ++--
>  4 files changed, 14 insertions(+), 14 deletions(-)
>

Reviewed-by: Andrew Jones 
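
For reference, a minimal sketch of the timing pattern the patch switches
to. The helper names raw_now() and timespec_delta_ns() are hypothetical
and not the selftests' own helpers; the point is only that the same
clock ID is used for both samples and that the raw clock is not slewed
by NTP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* make sure glibc exposes the POSIX clock API */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: sample the raw monotonic clock. */
static struct timespec raw_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
	return ts;
}

/* Hypothetical helper: nanoseconds elapsed from 'start' to 'end'. */
static int64_t timespec_delta_ns(struct timespec start, struct timespec end)
{
	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}
```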



Re: [RFC PATCH v4 2/9] tools headers: Add a macro to get HUGETLB page sizes for mmap

2021-03-12 Thread Andrew Jones
On Tue, Mar 02, 2021 at 08:57:44PM +0800, Yanan Wang wrote:
> We know that if a system supports multiple hugetlb page sizes,
> the desired hugetlb page size can be specified in bits [26:31]
> of the flag arguments. The value in these 6 bits will be the
> shift of each hugetlb page size.
> 
> So add a macro to get the page size shift and then calculate the
> corresponding hugetlb page size, using flag x.
> 
> Cc: Ben Gardon 
> Cc: Ingo Molnar 
> Cc: Adrian Hunter 
> Cc: Jiri Olsa 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Arnd Bergmann 
> Cc: Michael Kerrisk 
> Cc: Thomas Gleixner 
> Suggested-by: Ben Gardon 
> Signed-off-by: Yanan Wang 
> Reviewed-by: Ben Gardon 
> ---
>  include/uapi/linux/mman.h   | 2 ++
>  tools/include/uapi/linux/mman.h | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> index f55bc680b5b0..8bd41128a0ee 100644
> --- a/include/uapi/linux/mman.h
> +++ b/include/uapi/linux/mman.h
> @@ -41,4 +41,6 @@
>  #define MAP_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB
>  #define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB
>  
> +#define MAP_HUGE_PAGE_SIZE(x) (1 << ((x >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK))

Needs to be '1ULL' to avoid shift overflow when given MAP_HUGE_16GB.
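
A minimal sketch of what I mean, using local EX_* stand-ins so it
compiles on its own (the EX_* names are hypothetical; the shift of 26
and the 6-bit mask match the commit message above). With a plain '1'
the shift is done in int, so a shift count of 34 is undefined; with
'1ULL' the result is a 64-bit value. I also parenthesized 'x'.

```c
/* Local stand-ins for the uapi encoding: 6 bits starting at bit 26. */
#define EX_MAP_HUGE_SHIFT	26
#define EX_MAP_HUGE_MASK	0x3f
#define EX_MAP_HUGE_2MB		(21U << EX_MAP_HUGE_SHIFT)
#define EX_MAP_HUGE_16GB	(34U << EX_MAP_HUGE_SHIFT)

/* 1ULL keeps the shift in 64 bits, so a shift count of 34 is fine. */
#define EX_MAP_HUGE_PAGE_SIZE(x) \
	(1ULL << (((x) >> EX_MAP_HUGE_SHIFT) & EX_MAP_HUGE_MASK))
```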

Thanks,
drew

> +
>  #endif /* _UAPI_LINUX_MMAN_H */
> diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h
> index f55bc680b5b0..8bd41128a0ee 100644
> --- a/tools/include/uapi/linux/mman.h
> +++ b/tools/include/uapi/linux/mman.h
> @@ -41,4 +41,6 @@
>  #define MAP_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB
>  #define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB
>  
> +#define MAP_HUGE_PAGE_SIZE(x) (1 << ((x >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK))
> +
>  #endif /* _UAPI_LINUX_MMAN_H */
> -- 
> 2.23.0
> 



Re: [RFC PATCH v2 3/7] KVM: selftests: Make a generic helper to get vm guest mode strings

2021-02-25 Thread Andrew Jones
On Thu, Feb 25, 2021 at 01:59:36PM +0800, Yanan Wang wrote:
> For generality and conciseness, make an API which can be used in all
> kvm libs and selftests to get vm guest mode strings. And the index i
> is checked in the API in case of possible faults.
> 
> Signed-off-by: Yanan Wang 
> ---
>  .../testing/selftests/kvm/include/kvm_util.h  |  4 +--
>  tools/testing/selftests/kvm/lib/kvm_util.c| 29 ---
>  2 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 2d7eb6989e83..f52a7492f47f 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -68,9 +68,6 @@ enum vm_guest_mode {
>  #define MIN_PAGE_SIZE (1U << MIN_PAGE_SHIFT)
>  #define PTES_PER_MIN_PAGE ptes_per_page(MIN_PAGE_SIZE)
>  
> -#define vm_guest_mode_string(m) vm_guest_mode_string[m]
> -extern const char * const vm_guest_mode_string[];
> -
>  struct vm_guest_mode_params {
>   unsigned int pa_bits;
>   unsigned int va_bits;
> @@ -84,6 +81,7 @@ int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap);
>  int vcpu_enable_cap(struct kvm_vm *vm, uint32_t vcpu_id,
>   struct kvm_enable_cap *cap);
>  void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size);
> +const char *vm_guest_mode_string(uint32_t i);
>  
>  struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm);
>  void kvm_vm_free(struct kvm_vm *vmp);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index d787cb802b4a..cc22c4ab7d67 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -141,17 +141,24 @@ static void vm_open(struct kvm_vm *vm, int perm)
>   "rc: %i errno: %i", vm->fd, errno);
>  }
>  
> -const char * const vm_guest_mode_string[] = {
> - "PA-bits:52,  VA-bits:48,  4K pages",
> - "PA-bits:52,  VA-bits:48, 64K pages",
> - "PA-bits:48,  VA-bits:48,  4K pages",
> - "PA-bits:48,  VA-bits:48, 64K pages",
> - "PA-bits:40,  VA-bits:48,  4K pages",
> - "PA-bits:40,  VA-bits:48, 64K pages",
> - "PA-bits:ANY, VA-bits:48,  4K pages",
> -};
> -_Static_assert(sizeof(vm_guest_mode_string)/sizeof(char *) == NUM_VM_MODES,
> -"Missing new mode strings?");
> +const char *vm_guest_mode_string(uint32_t i)
> +{
> + static const char * const strings[] = {
> + [VM_MODE_P52V48_4K] = "PA-bits:52,  VA-bits:48,  4K pages",
> + [VM_MODE_P52V48_64K]= "PA-bits:52,  VA-bits:48, 64K pages",
> + [VM_MODE_P48V48_4K] = "PA-bits:48,  VA-bits:48,  4K pages",
> + [VM_MODE_P48V48_64K]= "PA-bits:48,  VA-bits:48, 64K pages",
> + [VM_MODE_P40V48_4K] = "PA-bits:40,  VA-bits:48,  4K pages",
> + [VM_MODE_P40V48_64K]= "PA-bits:40,  VA-bits:48, 64K pages",
> + [VM_MODE_PXXV48_4K] = "PA-bits:ANY, VA-bits:48,  4K pages",
> + };
> + _Static_assert(sizeof(strings)/sizeof(char *) == NUM_VM_MODES,
> +"Missing new mode strings?");
> +
> + TEST_ASSERT(i < NUM_VM_MODES, "Guest mode ID %d too big", i);
> +
> + return strings[i];
> +}
>  
>  const struct vm_guest_mode_params vm_guest_mode_params[] = {
>   { 52, 48,  0x1000, 12 },
> -- 
> 2.19.1
>

Reviewed-by: Andrew Jones 
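
For readers skimming the thread, the pattern being reviewed boils down
to the sketch below: a function-local table with designated
initializers, a compile-time size check, and a runtime bounds check.
The ex_* names and the two-entry enum are hypothetical, and returning
NULL stands in for the TEST_ASSERT() used in the patch.

```c
#include <stddef.h>

/* Hypothetical two-entry mode enum, just for the example. */
enum ex_mode { EX_MODE_4K, EX_MODE_64K, EX_NUM_MODES };

static const char *ex_mode_string(unsigned int i)
{
	static const char * const strings[] = {
		[EX_MODE_4K]  = "4K pages",
		[EX_MODE_64K] = "64K pages",
	};
	_Static_assert(sizeof(strings) / sizeof(strings[0]) == EX_NUM_MODES,
		       "Missing new mode strings?");

	/* Stand-in for TEST_ASSERT(): refuse out-of-range indices. */
	if (i >= EX_NUM_MODES)
		return NULL;

	return strings[i];
}
```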



Re: [RFC PATCH v2 2/7] KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing

2021-02-25 Thread Andrew Jones
On Thu, Feb 25, 2021 at 01:59:35PM +0800, Yanan Wang wrote:
> In addition to function of CLOCK_MONOTONIC, flag CLOCK_MONOTONIC_RAW can
> also shield possible impact of NTP, which can provide more robustness.

IIRC, this should include

Suggested-by: Vitaly Kuznetsov 

> 
> Signed-off-by: Yanan Wang 
> ---
>  tools/testing/selftests/kvm/demand_paging_test.c  |  8 
>  tools/testing/selftests/kvm/dirty_log_perf_test.c | 14 +++---
>  tools/testing/selftests/kvm/lib/test_util.c   |  2 +-
>  tools/testing/selftests/kvm/steal_time.c  |  4 ++--
>  4 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 5f7a229c3af1..efbf0c1e9130 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -53,7 +53,7 @@ static void *vcpu_worker(void *data)
>   vcpu_args_set(vm, vcpu_id, 1, vcpu_id);
>   run = vcpu_state(vm, vcpu_id);
>  
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>  
>   /* Let the guest access its memory */
>   ret = _vcpu_run(vm, vcpu_id);
> @@ -86,7 +86,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
>   copy.len = perf_test_args.host_page_size;
>   copy.mode = 0;
>  
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>  
>   r = ioctl(uffd, UFFDIO_COPY, &copy);
>   if (r == -1) {
> @@ -123,7 +123,7 @@ static void *uffd_handler_thread_fn(void *arg)
>   struct timespec start;
>   struct timespec ts_diff;
>  
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>   while (!quit_uffd_thread) {
>   struct uffd_msg msg;
>   struct pollfd pollfd[2];
> @@ -336,7 +336,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>  
>   pr_info("Finished creating vCPUs and starting uffd threads\n");
>  
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>  
>   for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
>   pthread_create(_threads[vcpu_id], NULL, vcpu_worker,
> diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> index 04a2641261be..6cff4ccf9525 100644
> --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> @@ -50,7 +50,7 @@ static void *vcpu_worker(void *data)
>   while (!READ_ONCE(host_quit)) {
>   int current_iteration = READ_ONCE(iteration);
>  
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>   ret = _vcpu_run(vm, vcpu_id);
>   ts_diff = timespec_elapsed(start);
>  
> @@ -141,7 +141,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   iteration = 0;
>   host_quit = false;
>  
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>   for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
>   vcpu_last_completed_iteration[vcpu_id] = -1;
>  
> @@ -162,7 +162,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   ts_diff.tv_sec, ts_diff.tv_nsec);
>  
>   /* Enable dirty logging */
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>   vm_mem_region_set_flags(vm, PERF_TEST_MEM_SLOT_INDEX,
>   KVM_MEM_LOG_DIRTY_PAGES);
>   ts_diff = timespec_elapsed(start);
> @@ -174,7 +174,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>* Incrementing the iteration number will start the vCPUs
>* dirtying memory again.
>*/
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>   iteration++;
>  
>   pr_debug("Starting iteration %d\n", iteration);
> @@ -189,7 +189,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   pr_info("Iteration %d dirty memory time: %ld.%.9lds\n",
>   iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
>  
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>   kvm_vm_get_dirty_log(vm, PERF_TEST_MEM_SLOT_INDEX, bmap);
>  
>   ts_diff = timespec_elapsed(start);
> @@ -199,7 +199,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
>  
>   if (dirty_log_manual_caps) {
> - clock_gettime(CLOCK_MONOTONIC, &start);
> + clock_gettime(CLOCK_MONOTONIC_RAW, &start);
>   kvm_vm_clear_dirty_log(vm, PERF_TEST_MEM_SLOT_INDEX, bmap, 0,
>  

Re: [PATCH 00/15] VM: selftests: Hugepage fixes and cleanups

2021-02-11 Thread Andrew Jones
On Wed, Feb 10, 2021 at 03:06:10PM -0800, Sean Christopherson wrote:
> Fix hugepage bugs in the KVM selftests that specifically affect dirty
> logging and demand paging tests.  Found while attempting to verify KVM
> changes/fixes related to hugepages and dirty logging (patches incoming in
> a separate series).
> 
> Clean up the perf_test_args util on top of the hugepage fixes to clarify
> what "page size" means, and to improve confidence in the code doing what
> it thinks it's doing.  In a few cases, users of perf_test_args were
> duplicating (approximating?) calculations made by perf_test_args, and it
> wasn't obvious that both pieces of code were guaranteed to end up with the
> same result.
> 
> Sean Christopherson (15):
>   KVM: selftests: Explicitly state indicies for vm_guest_mode_params
> array
>   KVM: selftests: Expose align() helpers to tests
>   KVM: selftests: Align HVA for HugeTLB-backed memslots
>   KVM: selftests: Force stronger HVA alignment (1gb) for hugepages
>   KVM: selftests: Require GPA to be aligned when backed by hugepages
>   KVM: selftests: Use shorthand local var to access struct
> perf_tests_args
>   KVM: selftests: Capture per-vCPU GPA in perf_test_vcpu_args
>   KVM: selftests: Use perf util's per-vCPU GPA/pages in demand paging
> test
>   KVM: selftests: Move per-VM GPA into perf_test_args
>   KVM: selftests: Remove perf_test_args.host_page_size
>   KVM: selftests: Create VM with adjusted number of guest pages for perf
> tests
>   KVM: selftests: Fill per-vCPU struct during "perf_test" VM creation
>   KVM: selftests: Sync perf_test_args to guest during VM creation
>   KVM: selftests: Track size of per-VM memslot in perf_test_args
>   KVM: selftests: Get rid of gorilla math in memslots modification test
> 
>  .../selftests/kvm/demand_paging_test.c|  39 ++---
>  .../selftests/kvm/dirty_log_perf_test.c   |  10 +-
>  .../testing/selftests/kvm/include/kvm_util.h  |  28 
>  .../selftests/kvm/include/perf_test_util.h|  18 +--
>  tools/testing/selftests/kvm/lib/kvm_util.c|  36 ++---
>  .../selftests/kvm/lib/perf_test_util.c| 139 ++
>  .../kvm/memslot_modification_stress_test.c|  16 +-
>  7 files changed, 145 insertions(+), 141 deletions(-)
> 
> -- 
> 2.30.0.478.g8a0d178c01-goog
>

For the series

Reviewed-by: Andrew Jones 

Thanks,
drew



Re: [PATCH 2/2] KVM: selftests: add a memslot-related performance benchmark

2021-02-08 Thread Andrew Jones
On Mon, Feb 01, 2021 at 09:10:57AM +0100, Maciej S. Szmigiero wrote:
[...]
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index ce8f4ad39684..059a655053ca 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -31,3 +31,4 @@
>  /kvm_create_max_vcpus
>  /set_memory_region_test
>  /steal_time
> +/memslot_perf_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index e7c6237d7383..2abc9e182c30 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -65,6 +65,7 @@ TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
>  TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
>  TEST_GEN_PROGS_x86_64 += set_memory_region_test
>  TEST_GEN_PROGS_x86_64 += steal_time
> +TEST_GEN_PROGS_x86_64 += memslot_perf_test
> 
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve

We've been trying to keep the lists in .gitignore and Makefile in alphabetic
order. It's not really important, but seems like we should keep it now
that we've got it. Well, except I see dirty_log_perf_test and
dirty_log_test are out of order already...

[...]
> +static bool prepare_vm(struct vm_data *data, int nslots, uint64_t *maxslots,
> +void *guest_code, uint64_t mempages,
> +struct timespec *slot_runtime)
> +{
> + uint32_t max_mem_slots;
> + uint64_t rempages;
> + uint64_t guest_addr;
> + uint32_t slot;
> + struct timespec tstart;
> + struct sync_area *sync;
> +
> + max_mem_slots = kvm_check_cap(KVM_CAP_NR_MEMSLOTS);
> + TEST_ASSERT(max_mem_slots > 1,
> + "KVM_CAP_NR_MEMSLOTS should be greater than 1");
> + TEST_ASSERT(nslots > 1 || nslots == -1,
> + "Slot count cap should be greater than 1");
> + if (nslots != -1)
> + max_mem_slots = min(max_mem_slots, (uint32_t)nslots);
> + pr_info_v("Allowed number of memory slots: %"PRIu32"\n", max_mem_slots);
> +
> + TEST_ASSERT(mempages > 1,
> + "Can't test without any memory");
> +
> + data->npages = mempages;
> + data->nslots = max_mem_slots - 1;
> + data->pages_per_slot = mempages / data->nslots;
> + if (!data->pages_per_slot) {
> + *maxslots = mempages + 1;
> + return false;
> + }
> +
> + rempages = mempages % data->nslots;
> + data->hva_slots = malloc(sizeof(*data->hva_slots) * data->nslots);
> + TEST_ASSERT(data->hva_slots, "malloc() fail");
> +
> + data->vm = vm_create_default(VCPU_ID, mempages, guest_code);
> +
> + vcpu_set_cpuid(data->vm, VCPU_ID, kvm_get_supported_cpuid());

This vcpu_set_cpuid() call, which causes problems for non-x86 builds,
is now embedded in vm_create_default() and therefore redundant here.


Otherwise this looks good to me. I'll try to find some time to test
it on an AArch64 machine configured to use 4k pages on the host.

Reviewed-by: Andrew Jones 

Thanks,
drew



Re: [PATCH 1/2] KVM: selftests: Keep track of memslots more efficiently

2021-02-08 Thread Andrew Jones
On Mon, Feb 08, 2021 at 11:16:41AM +0100, Andrew Jones wrote:
> > diff --git a/tools/testing/selftests/kvm/lib/rbtree.c b/tools/testing/selftests/kvm/lib/rbtree.c
> > new file mode 100644
> > index ..a703f0194ea3
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/lib/rbtree.c
> > @@ -0,0 +1 @@
> > +#include "../../../../lib/rbtree.c"
> >
> 
> We shouldn't dip into kernel code like this. We can use tools/lib/rbtree.c
> though.
> 
> Besides the rbtree.c thing,

Oops, sorry, just realized the first '..' applies to kvm's lib subdir.
So this is already using tools/lib/rbtree.c

Thanks,
drew

> 
> Reviewed-by: Andrew Jones 



Re: [PATCH 1/2] KVM: selftests: Keep track of memslots more efficiently

2021-02-08 Thread Andrew Jones
-8,6 +8,9 @@
>  #ifndef SELFTEST_KVM_UTIL_INTERNAL_H
>  #define SELFTEST_KVM_UTIL_INTERNAL_H
>  
> +#include "linux/hashtable.h"
> +#include "linux/rbtree.h"
> +
>  #include "sparsebit.h"
>  
>  #define KVM_DEV_PATH "/dev/kvm"
> @@ -20,7 +23,9 @@ struct userspace_mem_region {
>   void *host_mem;
>   void *mmap_start;
>   size_t mmap_size;
> - struct list_head list;
> + struct rb_node gpa_node;
> + struct rb_node hva_node;
> + struct hlist_node slot_node;
>  };
>  
>  struct vcpu {
> @@ -33,6 +38,12 @@ struct vcpu {
>   uint32_t dirty_gfns_count;
>  };
>  
> +struct userspace_mem_regions {
> + struct rb_root gpa_tree;
> + struct rb_root hva_tree;
> + DECLARE_HASHTABLE(slot_hash, 9);
> +};
> +
>  struct kvm_vm {
>   int mode;
>   unsigned long type;
> @@ -45,7 +56,7 @@ struct kvm_vm {
>   unsigned int va_bits;
>   uint64_t max_gfn;
>   struct list_head vcpus;
> - struct list_head userspace_mem_regions;
> + struct userspace_mem_regions regions;
>   struct sparsebit *vpages_valid;
>   struct sparsebit *vpages_mapped;
>   bool has_irqchip;
> diff --git a/tools/testing/selftests/kvm/lib/rbtree.c b/tools/testing/selftests/kvm/lib/rbtree.c
> new file mode 100644
> index ..a703f0194ea3
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/lib/rbtree.c
> @@ -0,0 +1 @@
> +#include "../../../../lib/rbtree.c"
>

We shouldn't dip into kernel code like this. We can use tools/lib/rbtree.c
though.

Besides the rbtree.c thing,

Reviewed-by: Andrew Jones 



Re: [PATCH v5 0/2] MTE support for KVM guest

2020-11-19 Thread Andrew Jones
On Thu, Nov 19, 2020 at 03:45:40PM +, Peter Maydell wrote:
> On Thu, 19 Nov 2020 at 15:39, Steven Price  wrote:
> > This series adds support for Arm's Memory Tagging Extension (MTE) to
> > KVM, allowing KVM guests to make use of it. This builds on the existing
> > user space support already in v5.10-rc1, see [1] for an overview.
> 
> > The change to require the VMM to map all guest memory PROT_MTE is
> > significant as it means that the VMM has to deal with the MTE tags even
> > if it doesn't care about them (e.g. for virtual devices or if the VMM
> > doesn't support migration). Also unfortunately because the VMM can
> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
> > to be done very late (at the point of faulting pages into stage 2).
> 
> I'm a bit dubious about requiring the VMM to map the guest memory
> PROT_MTE unless somebody's done at least a sketch of the design
> for how this would work on the QEMU side. Currently QEMU just
> assumes the guest memory is guest memory and it can access it
> without special precautions...
>

There are two statements being made here:

1) Requiring the use of PROT_MTE when mapping guest memory may not fit
   QEMU well.

2) New KVM features should be accompanied by supporting QEMU code in
   order to prove that the APIs make sense.

I strongly agree with (2). While kvmtool supports some quick testing, it
doesn't support migration. We must test all new features with a
migration-supporting VMM.

I'm not sure about (1). I don't feel like it should be a major problem,
but see (2).

I'd be happy to help with the QEMU prototype, but preferably when there's
hardware available. Has all the current MTE testing just been done on
simulators? And, if so, are there regression tests regularly running on
the simulators too? And can they test migration? If hardware doesn't
show up quickly and simulators aren't used for regression tests, then
all this code will start rotting from day one.

Thanks,
drew



Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-11-18 Thread Andrew Jones
On Wed, Nov 18, 2020 at 04:50:01PM +, Catalin Marinas wrote:
> On Wed, Nov 18, 2020 at 04:01:20PM +, Steven Price wrote:
> > On 17/11/2020 16:07, Catalin Marinas wrote:
> > > On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index 19aacc7d64de..38fe25310ca1 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > > if (vma_pagesize == PAGE_SIZE && !force_pte)
> > > > vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> > > >    &pfn, &fault_ipa);
> > > > +
> > > > +   /*
> > > > +* The otherwise redundant test for system_supports_mte() allows the
> > > > +* code to be compiled out when CONFIG_ARM64_MTE is not present.
> > > > +*/
> > > > +   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > > > +   /*
> > > > +* VM will be able to see the page's tags, so we must ensure
> > > > +* they have been initialised.
> > > > +*/
> > > > +   struct page *page = pfn_to_page(pfn);
> > > > +   long i, nr_pages = compound_nr(page);
> > > > +
> > > > +   /* if PG_mte_tagged is set, tags have already been initialised */
> > > > +   for (i = 0; i < nr_pages; i++, page++) {
> > > > +   if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > > +   mte_clear_page_tags(page_address(page));
> > > > +   }
> > > > +   }
> > > 
> > > If this page was swapped out and mapped back in, where does the
> > > restoring from swap happen?
> > 
> > Restoring from swap happens above this in the call to gfn_to_pfn_prot()
> 
> Looking at the call chain, gfn_to_pfn_prot() ends up with
> get_user_pages() using the current->mm (the VMM) and that does a
> set_pte_at(), presumably restoring the tags. Does this mean that all
> memory mapped by the VMM in user space should have PROT_MTE set?
> Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
> tags restored from swap (we do save them since when they were mapped,
> PG_mte_tagged was set).
> 
> So I think the code above should be similar to mte_sync_tags(), even
> calling a common function, but I'm not sure where to get the swap pte
> from.
> 
> An alternative is to only enable HCR_EL2.ATA and MTE in guest if the vmm
> mapped the memory with PROT_MTE.

This is a very reasonable alternative. The VMM must be aware of whether
the guest may use MTE anyway. Asking it to map the memory with PROT_MTE
when it wants to offer the guest that option is a reasonable requirement.
If the memory is not mapped as such, then the host kernel shouldn't assume
MTE may be used by the guest, and it should even enforce that it is not
(by not enabling the feature).

Thanks,
drew

> 
> Yet another option is to always call mte_sync_tags() from set_pte_at()
> and defer the pte_tagged() or is_swap_pte() checks to the MTE code.
> 
> -- 
> Catalin
> 



Re: [PATCH v2 0/5] Add a dirty logging performance test

2020-11-11 Thread Andrew Jones
On Tue, Nov 03, 2020 at 03:49:47PM -0800, Ben Gardon wrote:
> Currently KVM lacks a simple, userspace agnostic, performance benchmark for
> dirty logging. Such a benchmark will be beneficial for ensuring that dirty
> logging performance does not regress, and to give a common baseline for
> validating performance improvements. The dirty log perf test introduced in
> this series builds on aspects of the existing demand paging perf test and
> provides time-based performance metrics for enabling and disabling dirty
> logging, getting the dirty log, and dirtying memory.
> 
> While the test currently only has a build target for x86, I expect it will
> work on, or be easily modified to support other architectures.
> 
> This series was tested by running the following invocations on an Intel
> Skylake machine after applying all commits in the series:
> dirty_log_perf_test -b 20m -i 100 -v 64
> dirty_log_perf_test -b 20g -i 5 -v 4
> dirty_log_perf_test -b 4g -i 5 -v 32
> demand_paging_test -b 20m -v 64
> demand_paging_test -b 20g -v 4
> demand_paging_test -b 4g -v 32
> All behaved as expected.
> 
> v1 -> v2 changes:

Considering v1 got applied,

> (in response to comments from Peter Xu)
> - Removed pr_debugs from main test thread while waiting on vCPUs to reduce
>   log spam

I didn't look at this. Maybe you and Peter can decide if a pr_debug
cleanup patch needs to be sent.

> - Fixed a bug in iteration counting that caused the population stage to be
>   counted as part of the first dirty logging pass
> - Fixed a bug in which the test failed to wait for the population stage for all
>   but the first vCPU.

I didn't try to confirm that these fixes were in. I do see that Paolo
added 6d6a18fdde8b ("KVM: selftests: allow two iterations of
dirty_log_perf_test"), which sounds like it might be fixing the same thing
as your first "Fixed" bullet above. Anyway, you may want to check the 
current code to see if any additional fixes should be sent.

> - Refactored the common code in perf_test_util.c/h

I did this part in my "Cleanups, take 2" series

> - Moved testing description to cover letter
> - Renamed timespec_diff_now to timespec_elapsed

These two would have been nice, but oh well...

Thanks,
drew



Re: [PATCH 0/5] Add a dirty logging performance test

2020-11-09 Thread Andrew Jones
On Fri, Nov 06, 2020 at 01:48:29PM +0100, Paolo Bonzini wrote:
> On 28/10/20 00:37, Ben Gardon wrote:
> > Currently KVM lacks a simple, userspace agnostic, performance benchmark for
> > dirty logging. Such a benchmark will be beneficial for ensuring that dirty
> > logging performance does not regress, and to give a common baseline for
> > validating performance improvements. The dirty log perf test introduced in
> > this series builds on aspects of the existing demand paging perf test and
> > provides time-based performance metrics for enabling and disabling dirty
> > logging, getting the dirty log, and dirtying memory.
> > 
> > While the test currently only has a build target for x86, I expect it will
> > work on, or be easily modified to support other architectures.
> > 
> > Ben Gardon (5):
> >KVM: selftests: Factor code out of demand_paging_test
> >KVM: selftests: Remove address rounding in guest code
> >KVM: selftests: Simplify demand_paging_test with timespec_diff_now
> >KVM: selftests: Add wrfract to common guest code
> >KVM: selftests: Introduce the dirty log perf test
> > 
> >   tools/testing/selftests/kvm/.gitignore|   1 +
> >   tools/testing/selftests/kvm/Makefile  |   1 +
> >   .../selftests/kvm/demand_paging_test.c| 230 ++-
> >   .../selftests/kvm/dirty_log_perf_test.c   | 382 ++
> >   .../selftests/kvm/include/perf_test_util.h| 192 +
> >   .../testing/selftests/kvm/include/test_util.h |   2 +
> >   tools/testing/selftests/kvm/lib/test_util.c   |  22 +-
> >   7 files changed, 635 insertions(+), 195 deletions(-)
> >   create mode 100644 tools/testing/selftests/kvm/dirty_log_perf_test.c
> >   create mode 100644 tools/testing/selftests/kvm/include/perf_test_util.h
> > 
> 
> Queued, thanks.

Why would you do that? Peter reviewed this, making several comments,
such as not to put non-inline functions in header files. Ben took the
time to respin the series, posting a v2. It makes no sense to pick up
v1 after they put in that additional effort.

drew



Re: [PATCH v2 2/5] KVM: selftests: Factor code out of demand_paging_test

2020-11-04 Thread Andrew Jones
On Wed, Nov 04, 2020 at 10:00:17AM -0500, Peter Xu wrote:
> On Wed, Nov 04, 2020 at 01:16:31PM +0100, Andrew Jones wrote:
> > If you don't mind I'd like to try and cleanup / generalize / refactor
> > demand_paging_test.c and dirty_log_test.c with a few patches first for
> > you to base this work on. I can probably get something posted today
> > or tomorrow.
> 
> Drew,
> 
> Would you consider picking up the two patches below in the dirty ring series if
> you plan to rework the dirty log tests?  I got your r-b so I am making bold to
> think I'm ok to ask this; I just want to avoid another potential conflict
> within the series.

Sure, no problem.

I'll go ahead and get that cleanup / refactor series out.

Thanks,
drew

> 
> Thanks!
> 
> [1] https://lore.kernel.org/kvm/20201023183358.50607-11-pet...@redhat.com/
> [2] https://lore.kernel.org/kvm/20201023183358.50607-12-pet...@redhat.com/
> 
> -- 
> Peter Xu
> 



Re: [PATCH v2 2/5] KVM: selftests: Factor code out of demand_paging_test

2020-11-04 Thread Andrew Jones
On Tue, Nov 03, 2020 at 03:49:49PM -0800, Ben Gardon wrote:
> Much of the code in demand_paging_test can be reused by other, similar
> multi-vCPU-memory-touching-perfromance-tests. Factor that common code

performance

> out for reuse.
> 
> No functional change expected.
> 
> Signed-off-by: Ben Gardon 
> ---
>  tools/testing/selftests/kvm/Makefile  |   6 +-
>  .../selftests/kvm/demand_paging_test.c| 202 ++
>  .../selftests/kvm/include/perf_test_util.h|  50 +
>  .../selftests/kvm/lib/perf_test_util.c| 161 ++
>  4 files changed, 235 insertions(+), 184 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/include/perf_test_util.h
>  create mode 100644 tools/testing/selftests/kvm/lib/perf_test_util.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 30afbad36cd55..9b2bebb64175b 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -33,8 +33,10 @@ ifeq ($(ARCH),s390)
>   UNAME_M := s390x
>  endif
>  
> -LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c lib/test_util.c
> -LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c
> +LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c \
> +  lib/test_util.c lib/perf_test_util.c
> +LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c \
> + lib/x86_64/ucall.c
>  LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
>  LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c
>  
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 32a42eafc6b5c..682805dd8c2ac 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -21,18 +21,12 @@
>  #include 
>  #include 
>  
> -#include "test_util.h"
> -#include "kvm_util.h"
> +#include "perf_test_util.h"
>  #include "processor.h"
> +#include "test_util.h"

I don't think we moved out everything used from kvm_util.h, so we
shouldn't remove that include.

Is there a reason to include test_util.h last?

>  
>  #ifdef __NR_userfaultfd
>  
> -/* The memory slot index demand page */
> -#define TEST_MEM_SLOT_INDEX  1
> -
> -/* Default guest test virtual memory offset */
> -#define DEFAULT_GUEST_TEST_MEM   0xc000
> -
>  #define DEFAULT_GUEST_TEST_MEM_SIZE (1 << 30) /* 1G */
>  
>  #ifdef PRINT_PER_PAGE_UPDATES
> @@ -47,74 +41,14 @@
>  #define PER_VCPU_DEBUG(...) _no_printf(__VA_ARGS__)
>  #endif
>  
> -#define MAX_VCPUS 512
> -
> -/*
> - * Guest/Host shared variables. Ensure addr_gva2hva() and/or
> - * sync_global_to/from_guest() are used when accessing from
> - * the host. READ/WRITE_ONCE() should also be used with anything
> - * that may change.
> - */
> -static uint64_t host_page_size;
> -static uint64_t guest_page_size;
> -
>  static char *guest_data_prototype;
>  
> -/*
> - * Guest physical memory offset of the testing memory slot.
> - * This will be set to the topmost valid physical address minus
> - * the test memory size.
> - */
> -static uint64_t guest_test_phys_mem;
> -
> -/*
> - * Guest virtual memory offset of the testing memory slot.
> - * Must not conflict with identity mapped test code.
> - */
> -static uint64_t guest_test_virt_mem = DEFAULT_GUEST_TEST_MEM;
> -
> -struct vcpu_args {
> - uint64_t gva;
> - uint64_t pages;
> -
> - /* Only used by the host userspace part of the vCPU thread */
> - int vcpu_id;
> - struct kvm_vm *vm;
> -};
> -
> -static struct vcpu_args vcpu_args[MAX_VCPUS];
> -
> -/*
> - * Continuously write to the first 8 bytes of each page in the demand paging
> - * memory region.
> - */
> -static void guest_code(uint32_t vcpu_id)
> -{
> - uint64_t gva;
> - uint64_t pages;
> - int i;
> -
> - /* Make sure vCPU args data structure is not corrupt. */
> - GUEST_ASSERT(vcpu_args[vcpu_id].vcpu_id == vcpu_id);
> -
> - gva = vcpu_args[vcpu_id].gva;
> - pages = vcpu_args[vcpu_id].pages;
> -
> - for (i = 0; i < pages; i++) {
> - uint64_t addr = gva + (i * guest_page_size);
> -
> - *(uint64_t *)addr = 0x0123456789ABCDEF;
> - }
> -
> - GUEST_SYNC(1);
> -}
> -
>  static void *vcpu_worker(void *data)
>  {
>   int ret;
> - struct vcpu_args *args = (struct vcpu_args *)data;
> - struct kvm_vm *vm = args->vm;
> - int vcpu_id = args->vcpu_id;
> + struct vcpu_args *vcpu_args = (struct vcpu_args *)data;
> + int vcpu_id = vcpu_args->vcpu_id;
> + struct kvm_vm *vm = perf_test_args.vm;
>   struct kvm_run *run;
>   struct timespec start, end, ts_diff;
>  
> @@ -140,39 +74,6 @@ static void *vcpu_worker(void *data)
>   return NULL;
>  }
>  
> -#define PAGE_SHIFT_4K  12
> -#define PTES_PER_4K_PT 512
> -
> -static struct kvm_vm *create_vm(enum vm_guest_mode 

Re: [PATCH v2 1/5] KVM: selftests: Remove address rounding in guest code

2020-11-04 Thread Andrew Jones
On Tue, Nov 03, 2020 at 03:49:48PM -0800, Ben Gardon wrote:
> Rounding the address the guest writes to a host page boundary
> will only have an effect if the host page size is larger than the guest
> page size, but in that case the guest write would still go to the same
> host page. There's no reason to round the address down, so remove the
> rounding to simplify the demand paging test.
> 
> Signed-off-by: Ben Gardon 
> ---
>  tools/testing/selftests/kvm/demand_paging_test.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 360cd3ea4cd67..32a42eafc6b5c 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -103,7 +103,6 @@ static void guest_code(uint32_t vcpu_id)
>   for (i = 0; i < pages; i++) {
>   uint64_t addr = gva + (i * guest_page_size);
>  
> - addr &= ~(host_page_size - 1);
>   *(uint64_t *)addr = 0x0123456789ABCDEF;
>   }
>  
> -- 
> 2.29.1.341.ge80a0c044ae-goog
>

Reviewed-by: Andrew Jones 



Re: [PATCH 1/2] KVM: selftests: Add get featured msrs test case

2020-10-26 Thread Andrew Jones
On Mon, Oct 26, 2020 at 09:58:54AM +0100, Vitaly Kuznetsov wrote:
> Peter Xu  writes:
> > +int kvm_vm_get_feature_msrs(struct kvm_vm *vm, struct kvm_msrs *msrs)
> > +{
> > +   return ioctl(vm->kvm_fd, KVM_GET_MSRS, msrs);
> > +}
> 
> I *think* that the non-written rule for kvm selftests is that functions
> without '_' prefix check ioctl return value with TEST_ASSERT() and
> functions with it don't (e.g. _vcpu_run()/vcpu_run()) but maybe it's
> just me.
>

Yes, that's the pattern I've been trying to implement. If we want to be
strict about it, then we should do a quick scan of the code to ensure
it's currently consistent. I have a feeling it isn't.
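
For reference, the convention could look roughly like the sketch below.
It is only an illustration: fake_ioctl() and both fake_get_value()
variants are made-up names, and abort() stands in for what TEST_ASSERT()
would do in the real selftests.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ioctl(): fails with EINVAL on a negative argument. */
static int fake_ioctl(int arg)
{
	if (arg < 0) {
		errno = EINVAL;
		return -1;
	}
	return arg;
}

/* '_' prefix: hand the raw return value back and let the caller decide. */
static int _fake_get_value(int arg)
{
	return fake_ioctl(arg);
}

/* No prefix: any failure is treated as a test failure. */
static int fake_get_value(int arg)
{
	int ret = _fake_get_value(arg);

	if (ret < 0) {
		fprintf(stderr, "fake_ioctl failed, rc: %i errno: %i\n", ret, errno);
		abort();
	}
	return ret;
}
```

A test that expects a failure would call _fake_get_value() and check
errno itself; everything else would call fake_get_value().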

Thanks,
drew



Re: [PATCH v3 0/2] MTE support for KVM guest

2020-10-02 Thread Andrew Jones
On Fri, Oct 02, 2020 at 04:38:11PM +0100, Steven Price wrote:
> On 02/10/2020 15:36, Andrew Jones wrote:
> > On Fri, Sep 25, 2020 at 10:36:05AM +0100, Steven Price wrote:
> > > Version 3 of adding MTE support for KVM guests. See the previous (v2)
> > > posting for background:
> > > 
> > >   https://lore.kernel.org/r/20200904160018.29481-1-steven.price%40arm.com
> > > 
> > > These patches add support to KVM to enable MTE within a guest. They are
> > > based on Catalin's v9 MTE user-space support series[1] (currently in
> > > next).
> > > 
> > > Changes since v2:
> > > 
> > >   * MTE is no longer a VCPU feature, instead it is a VM cap.
> > > 
> > >   * Being a VM cap means easier probing (check for KVM_CAP_ARM_MTE).
> > > 
> > >   * The cap must be set before any VCPUs are created, preventing any
> > > shenanigans where MTE is enabled for the guest after memory accesses
> > > have been performed.
> > > 
> > > [1] 
> > > https://lore.kernel.org/r/20200904103029.32083-1-catalin.mari...@arm.com
> > > 
> > > Steven Price (2):
> > >arm64: kvm: Save/restore MTE registers
> > >arm64: kvm: Introduce MTE VCPU feature
> > > 
> > >   arch/arm64/include/asm/kvm_emulate.h   |  3 +++
> > >   arch/arm64/include/asm/kvm_host.h  |  7 +++
> > >   arch/arm64/include/asm/sysreg.h|  3 ++-
> > >   arch/arm64/kvm/arm.c   |  9 +
> > >   arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++
> > >   arch/arm64/kvm/mmu.c   | 15 +++
> > >   arch/arm64/kvm/sys_regs.c  | 20 +++-
> > >   include/uapi/linux/kvm.h   |  1 +
> > >   8 files changed, 66 insertions(+), 6 deletions(-)
> > > 
> > > -- 
> > > 2.20.1
> > > 
> > > 
> > 
> > Hi Steven,
> > 
> > These patches look fine to me, but I'd prefer we have a working
> > implementation in QEMU before we get too excited about the KVM
> > bits. kvmtool isn't sufficient since it doesn't support migration
> > (at least afaik). In the past we've implemented features in KVM
> > that look fine, but then issues have been discovered when trying
> > to enable them from QEMU, where we also support migration. This
> > feature looks like there's risk of issues with the userspace side.
> > Although these two patches would probably stay the same, even if
> > userspace requires more support.
> 
> I agree kvmtool isn't a great test because it doesn't support migration. The
> support in this series is just the basic support for MTE in a guest and we'd
> need to wait for the QEMU implementation before deciding whether we need any
> extra support (e.g. kernel interfaces for reading/writing tags as discussed
> before).
> 
> However, I don't think there's much danger of the support in this series
> changing - so extra support can be added when/if it's needed, but I don't
> think we need to block these series on that - QEMU can just probe for
> whatever additional support it needs before enabling MTE in a guest. I plan
> to rebase/repost after -rc1 when the user space support has been merged.
> 

Fair enough, but it feels like we'll be merging half a feature, leaving
the other half for somebody else to pick up later.

Thanks,
drew



Re: [PATCH v3 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-10-02 Thread Andrew Jones
On Fri, Oct 02, 2020 at 04:30:47PM +0100, Steven Price wrote:
> On 02/10/2020 15:30, Andrew Jones wrote:
> > On Fri, Sep 25, 2020 at 10:36:07AM +0100, Steven Price wrote:
> > > + if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > 
> > 'system_supports_mte() && kvm->arch.mte_enabled' is redundant, but I
> > assume system_supports_mte() is there to improve the efficiency of the
> > branch, as it's using cpus_have_const_cap().
> 
> system_supports_mte() compiles to 0 when MTE support isn't built in, so this
> code can be removed by the compiler,

I know. That's what I meant by "improve the efficiency of the branch".


> whereas with kvm->arch.mte_enabled I
> doubt the compiler can deduce that it is never set.
> 
> > Maybe a helper like
> > 
> >   static inline bool kvm_arm_mte_enabled(struct kvm *kvm)
> >   {
> > return system_supports_mte() && kvm->arch.mte_enabled;
> >   }
> > 
> > would allow both the more efficient branch and look less confusing
> > where it gets used.
> 
> I wasn't sure it was worth having a helper since this was the only place
> checking this condition. It's also a bit tricky putting this in a logical
> header file, kvm_host.h doesn't work because struct kvm hasn't been defined
> by then.

OK, but I feel like we're setting ourselves up to revisit these types of
conditions again when our memories fade or when new developers see them
for the first time and ask.
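
To make the suggestion concrete, here is a standalone sketch of the
helper. The struct definitions and the HAVE_MTE switch are stand-ins I
made up so it compiles outside the kernel; in the kernel,
system_supports_mte() is the real capability check and HAVE_MTE does not
exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel structures, reduced to what the sketch needs. */
struct kvm_arch {
	bool mte_enabled;
};

struct kvm {
	struct kvm_arch arch;
};

/* Hypothetical build-time switch playing the role of MTE being built in. */
#ifndef HAVE_MTE
#define HAVE_MTE 1
#endif

static inline bool system_supports_mte(void)
{
	/* A constant when MTE is not built in, so callers can be compiled out. */
	return HAVE_MTE;
}

/* One place that documents why both conditions are checked. */
static inline bool kvm_arm_mte_enabled(struct kvm *kvm)
{
	return system_supports_mte() && kvm->arch.mte_enabled;
}
```

With HAVE_MTE defined to 0 the helper folds to false at compile time,
which is the property the open-coded check was after.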

Thanks,
drew

> 
> Steve
> 
> > > + /*
> > > +  * VM will be able to see the page's tags, so we must ensure
> > > +  * they have been initialised.
> > > +  */
> > > + struct page *page = pfn_to_page(pfn);
> > > + long i, nr_pages = compound_nr(page);
> > > +
> > > > + /* if PG_mte_tagged is set, tags have already been initialised */
> > > + for (i = 0; i < nr_pages; i++, page++) {
> > > > + if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > + mte_clear_page_tags(page_address(page));
> > > + }
> > > + }
> > > +
> > >   if (writable)
> > >   kvm_set_pfn_dirty(pfn);
> > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > index a655f172b5ad..5010a47152b4 100644
> > > --- a/arch/arm64/kvm/sys_regs.c
> > > +++ b/arch/arm64/kvm/sys_regs.c
> > > @@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
> > >   val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
> > >   val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
> > >   } else if (id == SYS_ID_AA64PFR1_EL1) {
> > > - val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
> > > + if (!vcpu->kvm->arch.mte_enabled)
> > > + val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
> > > >   } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
> > >   val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
> > >(0xfUL << ID_AA64ISAR1_API_SHIFT) |
> > > @@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, 
> > > struct sys_reg_params *p,
> > >   static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
> > >  const struct sys_reg_desc *rd)
> > >   {
> > > + if (vcpu->kvm->arch.mte_enabled)
> > > + return 0;
> > > +
> > >   return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
> > >   }
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index f6d86033c4fa..87678ed82ab4 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -1035,6 +1035,7 @@ struct kvm_ppc_resize_hpt {
> > >   #define KVM_CAP_LAST_CPU 184
> > >   #define KVM_CAP_SMALLER_MAXPHYADDR 185
> > >   #define KVM_CAP_S390_DIAG318 186
> > > +#define KVM_CAP_ARM_MTE 188
> > >   #ifdef KVM_CAP_IRQ_ROUTING
> > > -- 
> > > 2.20.1
> > > 
> > > 
> > 
> > Besides the helper suggestion nit
> > 
> > Reviewed-by: Andrew Jones 
> > 
> 
> 



Re: [PATCH v3 0/2] MTE support for KVM guest

2020-10-02 Thread Andrew Jones
On Fri, Sep 25, 2020 at 10:36:05AM +0100, Steven Price wrote:
> Version 3 of adding MTE support for KVM guests. See the previous (v2)
> posting for background:
> 
>  https://lore.kernel.org/r/20200904160018.29481-1-steven.price%40arm.com
> 
> These patches add support to KVM to enable MTE within a guest. They are
> based on Catalin's v9 MTE user-space support series[1] (currently in
> next).
> 
> Changes since v2:
> 
>  * MTE is no longer a VCPU feature, instead it is a VM cap.
> 
>  * Being a VM cap means easier probing (check for KVM_CAP_ARM_MTE).
> 
>  * The cap must be set before any VCPUs are created, preventing any
>shenanigans where MTE is enabled for the guest after memory accesses
>have been performed.
> 
> [1] https://lore.kernel.org/r/20200904103029.32083-1-catalin.mari...@arm.com
> 
> Steven Price (2):
>   arm64: kvm: Save/restore MTE registers
>   arm64: kvm: Introduce MTE VCPU feature
> 
>  arch/arm64/include/asm/kvm_emulate.h   |  3 +++
>  arch/arm64/include/asm/kvm_host.h  |  7 +++
>  arch/arm64/include/asm/sysreg.h|  3 ++-
>  arch/arm64/kvm/arm.c   |  9 +
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++
>  arch/arm64/kvm/mmu.c   | 15 +++
>  arch/arm64/kvm/sys_regs.c  | 20 +++-
>  include/uapi/linux/kvm.h   |  1 +
>  8 files changed, 66 insertions(+), 6 deletions(-)
> 
> -- 
> 2.20.1
> 
>

Hi Steven,

These patches look fine to me, but I'd prefer we have a working
implementation in QEMU before we get too excited about the KVM
bits. kvmtool isn't sufficient since it doesn't support migration
(at least afaik). In the past we've implemented features in KVM
that look fine, but then issues have been discovered when trying
to enable them from QEMU, where we also support migration. This
feature looks like it carries a risk of issues on the userspace side,
although these two patches would probably stay the same even if
userspace requires more support.

Thanks,
drew



Re: [PATCH v3 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-10-02 Thread Andrew Jones
arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
>   val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
>   val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
>   } else if (id == SYS_ID_AA64PFR1_EL1) {
> - val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
> + if (!vcpu->kvm->arch.mte_enabled)
> + val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
>   } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
>   val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
>(0xfUL << ID_AA64ISAR1_API_SHIFT) |
> @@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
>  const struct sys_reg_desc *rd)
>  {
> + if (vcpu->kvm->arch.mte_enabled)
> + return 0;
> +
>   return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
>  }
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f6d86033c4fa..87678ed82ab4 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1035,6 +1035,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_LAST_CPU 184
>  #define KVM_CAP_SMALLER_MAXPHYADDR 185
>  #define KVM_CAP_S390_DIAG318 186
> +#define KVM_CAP_ARM_MTE 188
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.20.1
> 
>

Besides the helper suggestion nit

Reviewed-by: Andrew Jones 



Re: [PATCH v3 1/2] arm64: kvm: Save/restore MTE registers

2020-10-02 Thread Andrew Jones
_reg(ctxt, TPIDR_EL1), tpidr_el1);
> + if (system_supports_mte())
> + write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
>  
>   if (!has_vhe() &&
>   cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 379f4969d0bd..a655f172b5ad 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1391,6 +1391,12 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>   return false;
>  }
>  
> +static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
> +const struct sys_reg_desc *rd)
> +{
> + return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
> +}
> +
>  /* sys_reg_desc initialiser for known cpufeature ID registers */
>  #define ID_SANITISED(name) { \
>   SYS_DESC(SYS_##name),   \
> @@ -1557,8 +1563,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>   { SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
>   { SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
>  
> - { SYS_DESC(SYS_RGSR_EL1), access_mte_regs },
> - { SYS_DESC(SYS_GCR_EL1), access_mte_regs },
> + { SYS_DESC(SYS_RGSR_EL1), access_mte_regs, reset_unknown, RGSR_EL1, .visibility = mte_visibility },
> + { SYS_DESC(SYS_GCR_EL1), access_mte_regs, reset_unknown, GCR_EL1, .visibility = mte_visibility },
>  
>   { SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
>   { SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
> @@ -1584,8 +1590,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>   { SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
>   { SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
>  
> - { SYS_DESC(SYS_TFSR_EL1), access_mte_regs },
> - { SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs },
> + { SYS_DESC(SYS_TFSR_EL1), access_mte_regs, reset_unknown, TFSR_EL1, .visibility = mte_visibility },
> + { SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs, reset_unknown, TFSRE0_EL1, .visibility = mte_visibility },
>  
>   { SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
>   { SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
> -- 
> 2.20.1
> 
>

Reviewed-by: Andrew Jones 



Re: [PATCH v2 0/2] MTE support for KVM guest

2020-09-10 Thread Andrew Jones
On Thu, Sep 10, 2020 at 02:27:48PM +0100, Dr. David Alan Gilbert wrote:
> * Andrew Jones (drjo...@redhat.com) wrote:
> > On Wed, Sep 09, 2020 at 06:45:33PM -0700, Richard Henderson wrote:
> > > On 9/9/20 8:25 AM, Andrew Jones wrote:
> > > >>  * Provide a KVM-specific method to extract the tags from guest memory.
> > > >>This might also have benefits in terms of providing an easy way to
> > > >>read bulk tag data from guest memory (since the LDGM instruction
> > > >>isn't available at EL0).
> > > > 
> > > > Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> > > > the tags for all addresses of each dirty page.
> > > 
> > > > KVM_GET_DIRTY_LOG just provides one bit per dirty page, no?  Then VMM copies
> > > > the data out from its local address to guest memory.
> > > > 
> > > > There'd be no difference with or without tags, afaik.  It's just about how VMM
> > > > copies the data, with or without tags.
> > 
> > Right, as long as it's fast enough to do
> > 
> > >   for_each_dirty_page(page, dirty_log)
> > >     for (i = 0; i < host-page-size; i += 16)
> > >       append_tag(LDG(page + i))
> > 
> > to get all the tags for each dirty page. I understood it would be faster
> > to use LDGM, but we'd need a new ioctl for that. So I was proposing we
> > just piggyback on a new dirty-log ioctl instead.
> 
> That feels like a bad idea to me; there are a couple of different ways dirty
> page checking works; let's keep extracting the tags separate.
>

It's sounding like it was a premature optimization anyway. We don't yet
know if an ioctl for LDGM is worth it. Looping over LDG may work fine.

Thanks,
drew 



Re: [PATCH v2 0/2] MTE support for KVM guest

2020-09-10 Thread Andrew Jones
On Thu, Sep 10, 2020 at 10:21:04AM +0100, Steven Price wrote:
> On 10/09/2020 07:29, Andrew Jones wrote:
> > But if userspace created the memslots with memory already set with
> > PROT_MTE, then this wouldn't be necessary, right? And, as long as
> > there's still a way to access the memory with tag checking disabled,
> > then it shouldn't be a problem.
> 
> Yes, so one option would be to attempt to validate that the VMM has provided
> memory pages with the PG_mte_tagged bit set (e.g. by mapping with PROT_MTE).
> The tricky part here is that we support KVM_CAP_SYNC_MMU which means that
> the VMM can change the memory backing at any time - so we could end up in
> user_mem_abort() discovering that a page doesn't have PG_mte_tagged set - at
> that point there's no nice way of handling it (other than silently upgrading
> the page) so the VM is dead.
> 
> So since enforcing that PG_mte_tagged is set isn't easy and provides a
> hard-to-debug foot gun to the VMM I decided the better option was to let the
> kernel set the bit automatically.
>

The foot gun still exists when migration is considered, no? If userspace
is telling a guest it can use MTE on its normal memory, but then doesn't
prepare that memory correctly, or remember to migrate the tags correctly
(which requires knowing the memory has tags and knowing how to get them),
then I guess the VM is in trouble one way or another.

I feel like we should trust the VMM to ensure MTE will work on any memory
the guest could use it on, and change the action in user_mem_abort() to
abort the guest with a big error message if it sees the flag is missing.
 
> > > > 
> > > > If userspace needs to write to guest memory then it should be due to
> > > > a device DMA or other specific hardware emulation. Those accesses can
> > > > be done with tag checking disabled.
> > > 
> > > Yes, the question is can the VMM (sensibly) wrap the accesses with a
> > > disable/re-enable tag checking for the process sequence. The alternative at
> > > the moment is to maintain a separate (untagged) mapping for the purpose
> > > which might present its own problems.
> > 
> > Hmm, so there's no easy way to disable tag checking when necessary? If we
> > don't map the guest ram with PROT_MTE and continue setting the attribute
> > in KVM, as this series does, then we don't need to worry about tag
> > checking when accessing the memory, but then we can't access the tags for
> > migration.
> 
> There's a "TCO" (Tag Check Override) bit in PSTATE which allows disabling
> tag checking, so if it's reasonable to wrap accesses to the memory you can
> simply set the TCO bit, perform the memory access and then unset TCO. That
> would mean a single mapping with MTE enabled would work fine. What I don't
> have a clue about is whether it's practical in the VMM to wrap guest
> accesses like this.
> 

At least QEMU goes through many abstractions to get to memory already.
There may already be a hook we could use, if not, it probably wouldn't
be too hard to add one (famous last words).

Thanks,
drew



Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-09-10 Thread Andrew Jones
On Thu, Sep 10, 2020 at 10:21:07AM +0100, Steven Price wrote:
> > We either need a KVM cap or a new CPU feature probing interface to avoid
> > making userspace try features one at a time. It's too bad that VCPU_INIT
> > doesn't clear all offending features from the feature set when returning
> > EINVAL, because then userspace could create a scratch VCPU with everything
> > it supports in order to see what KVM also supports in one go.
> 
> If Peter's TELL_ME_WHAT_YOU_HAVE idea works out then perhaps we don't need
> the cap? Or would it still be useful?
>

We wouldn't need it, but we don't _need_ it now either. It's not very
convenient to probe vcpu features with scratch vcpus, especially if we
must probe one at a time, but it works. The TELL_ME_WHAT_YOU_HAVE idea
will only fix the one at a time issue, but still require a vcpu fd. If
this feature becomes a VM feature then a cap or VM level API would help
reduce the userspace probing work.

Thanks,
drew



Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-09-10 Thread Andrew Jones
On Thu, Sep 10, 2020 at 08:38:54AM +0200, Andrew Jones wrote:
> On Wed, Sep 09, 2020 at 04:53:02PM +0100, Peter Maydell wrote:
> > On Wed, 9 Sep 2020 at 16:48, Andrew Jones  wrote:
> > > We either need a KVM cap or a new CPU feature probing interface to avoid
> > > making userspace try features one at a time. It's too bad that VCPU_INIT
> > > doesn't clear all offending features from the feature set when returning
> > > EINVAL, because then userspace could create a scratch VCPU with everything
> > > it supports in order to see what KVM also supports in one go.
> > 
> > You could add one if you wanted -- add a new feature bit
> > TELL_ME_WHAT_YOU_HAVE. If the kernel sees that then on failure
> > it clears out feature bits it doesn't support and also clears
> > TELL_ME_WHAT_YOU_HAVE. If QEMU sees EINVAL and TELL_ME_WHAT_YOU_HAVE
> > is still set, then it knows it's dealing with an old kernel
> > and has to do one-at-a-time probing. If it sees EINVAL but not
> > TELL_ME_WHAT_YOU_HAVE then it knows it has a new kernel and
> > has just got all the info.
> >
> 
> That's a great proposal. I'll try to find time to send the patches.
>

We also have KVM_ARM_PREFERRED_TARGET, which is documented as

"""
...
The ioctl returns struct kvm_vcpu_init instance containing information
about preferred CPU target type and recommended features for it.  The
kvm_vcpu_init->features bitmap returned will have feature bits set if
the preferred target recommends setting these features, but this is
not mandatory.
...
"""

But, it says "recommended" features, not "all supported" features,
and the current implementation of KVM_ARM_PREFERRED_TARGET only
zeros out features. So, I think we should just leave
KVM_ARM_PREFERRED_TARGET as is and stick to the plan of extending
VCPU_INIT.

Thanks,
drew



Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-09-10 Thread Andrew Jones
On Wed, Sep 09, 2020 at 04:53:02PM +0100, Peter Maydell wrote:
> On Wed, 9 Sep 2020 at 16:48, Andrew Jones  wrote:
> > We either need a KVM cap or a new CPU feature probing interface to avoid
> > making userspace try features one at a time. It's too bad that VCPU_INIT
> > doesn't clear all offending features from the feature set when returning
> > EINVAL, because then userspace could create a scratch VCPU with everything
> > it supports in order to see what KVM also supports in one go.
> 
> You could add one if you wanted -- add a new feature bit
> TELL_ME_WHAT_YOU_HAVE. If the kernel sees that then on failure
> it clears out feature bits it doesn't support and also clears
> TELL_ME_WHAT_YOU_HAVE. If QEMU sees EINVAL and TELL_ME_WHAT_YOU_HAVE
> is still set, then it knows it's dealing with an old kernel
> and has to do one-at-a-time probing. If it sees EINVAL but not
> TELL_ME_WHAT_YOU_HAVE then it knows it has a new kernel and
> has just got all the info.
>

That's a great proposal. I'll try to find time to send the patches.
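
Roughly, the userspace side of that handshake could look like the sketch
below. Everything in it is hypothetical: the FEAT_* bits, the bit chosen
for TELL_ME_WHAT_YOU_HAVE, and the two mock_vcpu_init_*() functions that
stand in for an old and a new kernel's VCPU_INIT. It also assumes an old
kernel rejects the unknown bit with EINVAL and leaves the bitmap alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits; none of these are real KVM ABI. */
#define FEAT_A			(1u << 0)
#define FEAT_B			(1u << 1)
#define FEAT_C			(1u << 2)
#define TELL_ME_WHAT_YOU_HAVE	(1u << 31)

/* Mock "new" kernel: supports FEAT_A and FEAT_C and understands the probe bit. */
static int mock_vcpu_init_new(uint32_t *features)
{
	const uint32_t supported = FEAT_A | FEAT_C;
	uint32_t requested = *features & ~TELL_ME_WHAT_YOU_HAVE;

	if (!(requested & ~supported))
		return 0;
	/* On failure, report what we have and clear the probe bit. */
	if (*features & TELL_ME_WHAT_YOU_HAVE)
		*features = requested & supported;
	errno = EINVAL;
	return -1;
}

/* Mock "old" kernel: supports FEAT_A only and never touches the bitmap. */
static int mock_vcpu_init_old(uint32_t *features)
{
	if (*features & ~FEAT_A) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

typedef int (*vcpu_init_fn)(uint32_t *features);

/* Return the subset of 'wanted' that the kernel behind 'vcpu_init' supports. */
static uint32_t probe_features(vcpu_init_fn vcpu_init, uint32_t wanted)
{
	uint32_t features = wanted | TELL_ME_WHAT_YOU_HAVE;
	uint32_t supported = 0;
	int i;

	if (vcpu_init(&features) == 0)
		return wanted;

	/* Probe bit cleared: new kernel, the bitmap now holds the answer. */
	if (!(features & TELL_ME_WHAT_YOU_HAVE))
		return features;

	/* Probe bit still set: old kernel, fall back to one bit at a time. */
	for (i = 0; i < 31; i++) {
		uint32_t bit = 1u << i;

		if (!(wanted & bit))
			continue;
		features = bit;
		if (vcpu_init(&features) == 0)
			supported |= bit;
	}
	return supported;
}
```

The new-kernel path costs a single call; only the old-kernel path pays
for one call per requested feature.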

Thanks,
drew



Re: [PATCH v2 0/2] MTE support for KVM guest

2020-09-10 Thread Andrew Jones
On Wed, Sep 09, 2020 at 05:04:15PM +0100, Steven Price wrote:
> On 09/09/2020 16:25, Andrew Jones wrote:
> > On Fri, Sep 04, 2020 at 05:00:16PM +0100, Steven Price wrote:
> > >   2. Automatically promotes (normal host) memory given to the guest to be
> > >  tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
> > >  tags are cleared if the memory wasn't previously MTE enabled.
> > 
> > Shouldn't this be up to the guest? Or, is this required in order for the
> > > guest to use tagging at all? Something like making the guest IPAs memtag
> > capable, but if the guest doesn't enable tagging then there is no guest
> > impact? In any case, shouldn't userspace be the one that adds PROT_MTE
> > to the memory regions it wants the guest to be able to use tagging with,
> > rather than KVM adding the attribute page by page?
> 
> I think I've probably explained this badly.
> 
> The guest can choose how to populate the stage 1 mapping - so can choose
> which parts of memory are accessed tagged or not. However, the hypervisor
> cannot restrict this in stage 2 (except by e.g. making the memory uncached
> but that's obviously not great - however devices forwarded to the guest can be
> handled like this).
> 
> Because the hypervisor cannot restrict the guest's access to the tags, the
> hypervisor must assume that all memory given to the guest could have the
> tags accessed. So it must (a) clear any stale data from the tags, and (b)
> ensure that the tags are preserved (e.g. when swapping pages out).
> 

Yes, this is how I understood it.

> Because of the above the current series automatically sets PG_mte_tagged on
> the pages. Note that this doesn't change the mappings that the VMM has (a
> non-PROT_MTE mapping will still not have access to the tags).

But if userspace created the memslots with memory already set with
PROT_MTE, then this wouldn't be necessary, right? And, as long as
there's still a way to access the memory with tag checking disabled,
then it shouldn't be a problem.

> > 
> > If userspace needs to write to guest memory then it should be due to
> > a device DMA or other specific hardware emulation. Those accesses can
> > be done with tag checking disabled.
> 
> Yes, the question is can the VMM (sensibly) wrap the accesses with a
> disable/re-enable tag checking for the process sequence. The alternative at
> the moment is to maintain a separate (untagged) mapping for the purpose
> which might present its own problems.

Hmm, so there's no easy way to disable tag checking when necessary? If we
don't map the guest ram with PROT_MTE and continue setting the attribute
in KVM, as this series does, then we don't need to worry about tag
checking when accessing the memory, but then we can't access the tags for
migration.

> 
> > > 
> > > If it's not practical to either disable tag checking in the VMM or
> > > maintain multiple mappings then the alternatives I'm aware of are:
> > > 
> > >   * Provide a KVM-specific method to extract the tags from guest memory.
> > > This might also have benefits in terms of providing an easy way to
> > > read bulk tag data from guest memory (since the LDGM instruction
> > > isn't available at EL0).
> > 
> > Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> > the tags for all addresses of each dirty page.
> 
> Certainly possible, although it seems to conflate two operations: "get list
> of dirty pages", "get tags from page". It would also require a lot of return
> space (size of slot/32).
>

It would require num-set-bits * host-page-size / 16 / 2, right?
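
To spell out the arithmetic: one 4-bit tag per 16 bytes means a page
carries host-page-size / 16 tags, and two tags fit in a byte. A small
sketch of the calculation follows; the macro and function names are
mine, not from any existing header.

```c
#include <assert.h>
#include <stddef.h>

#define GRANULE_SIZE	16	/* bytes of memory covered by one tag */
#define TAG_BITS	4	/* bits per tag */

/* Bytes needed to hold the packed tags of one page. */
static size_t tag_bytes_per_page(size_t page_size)
{
	return page_size / GRANULE_SIZE * TAG_BITS / 8;
}

/* Count the set bits (dirty pages) in a bitmap of 'nlongs' words. */
static size_t count_dirty_pages(const unsigned long *bitmap, size_t nlongs)
{
	size_t i, n = 0;

	for (i = 0; i < nlongs; i++) {
		unsigned long w = bitmap[i];

		/* Clear the lowest set bit until none are left. */
		while (w) {
			w &= w - 1;
			n++;
		}
	}
	return n;
}

/* num-set-bits * host-page-size / 16 / 2 */
static size_t tag_bytes_for_dirty_log(const unsigned long *bitmap,
				      size_t nlongs, size_t page_size)
{
	return count_dirty_pages(bitmap, nlongs) * tag_bytes_per_page(page_size);
}
```

For 4K pages that is 128 bytes of tags per dirty page, so the return
space scales with the number of dirty pages rather than the slot size.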
 
> > >   * Provide support for user space setting the TCMA0 or TCMA1 bits in
> > > TCR_EL1. These would allow the VMM to generate pointers which are not
> > > tag checked.
> > 
> > So this is necessary to allow the VMM to keep tag checking enabled for
> > itself, plus map guest memory as PROT_MTE, and write to that memory when
> > needed?
> 
> This is certainly one option. The architecture provides two "magic" values
> (all-0s and all-1s) which can be configured using TCMAx to be treated
> differently. The VMM could therefore construct pointers to otherwise tagged
> memory which would be treated as untagged.
> 
> However, Catalin's user space series doesn't at the moment expose this
> functionality.
>

So if I understand correctly this would allow us to map the guest memory
with PROT_MTE and still access the memory when needed. If so, then this
sounds interesting.

Thanks,
drew 



Re: [PATCH v2 0/2] MTE support for KVM guest

2020-09-09 Thread Andrew Jones
On Wed, Sep 09, 2020 at 06:45:33PM -0700, Richard Henderson wrote:
> On 9/9/20 8:25 AM, Andrew Jones wrote:
> >>  * Provide a KVM-specific method to extract the tags from guest memory.
> >>This might also have benefits in terms of providing an easy way to
> >>read bulk tag data from guest memory (since the LDGM instruction
> >>isn't available at EL0).
> > 
> > Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> > the tags for all addresses of each dirty page.
> 
> KVM_GET_DIRTY_LOG just provides one bit per dirty page, no?  Then VMM copies
> the data out from its local address to guest memory.
> 
> There'd be no difference with or without tags, afaik.  It's just about how VMM
> copies the data, with or without tags.

Right, as long as it's fast enough to do

  for_each_dirty_page(page, dirty_log)
    for (i = 0; i < host-page-size; i += 16)
      append_tag(LDG(page + i))

to get all the tags for each dirty page. I understood it would be faster
to use LDGM, but we'd need a new ioctl for that. So I was proposing we
just piggyback on a new dirty-log ioctl instead.
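
Written out in plain C, the loop would look something like the sketch
below. fake_ldg() is a made-up stand-in for the LDG instruction so the
code builds anywhere (it just derives a 4-bit value from the granule
index), and collect_dirty_tags() and its parameters are hypothetical
too.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE	16	/* one tag per 16 bytes */

/* Hypothetical stand-in for LDG: fakes a 4-bit tag from the granule index. */
static uint8_t fake_ldg(const uint8_t *page_base, size_t offset)
{
	(void)page_base;
	return (offset / GRANULE) & 0xf;
}

/*
 * Walk the dirty bitmap and append one tag per granule of every dirty
 * page to 'tags'. Returns the number of tags written, at most 'max_tags'.
 */
static size_t collect_dirty_tags(const unsigned long *dirty_log, size_t npages,
				 const uint8_t *mem, size_t page_size,
				 uint8_t *tags, size_t max_tags)
{
	const size_t bits = sizeof(unsigned long) * CHAR_BIT;
	size_t page, off, n = 0;

	for (page = 0; page < npages; page++) {
		if (!(dirty_log[page / bits] & (1UL << (page % bits))))
			continue;
		for (off = 0; off < page_size; off += GRANULE) {
			if (n == max_tags)
				return n;
			tags[n++] = fake_ldg(mem + page * page_size, off);
		}
	}
	return n;
}
```

That is host-page-size / 16 loads per dirty page, which is the cost a
bulk LDGM-based interface would be competing against.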

Thanks,
drew 



Re: [PATCH v2 0/2] MTE support for KVM guest

2020-09-09 Thread Andrew Jones
On Fri, Sep 04, 2020 at 05:00:16PM +0100, Steven Price wrote:
> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
> bytes of memory in the system. This along with stashing a tag within the
> high bit of virtual addresses allows runtime checking of memory
> accesses.
> 
> These patches add support to KVM to enable MTE within a guest. They are
> based on Catalin's v9 MTE user-space support series[1].
> 
> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
> series currently:
>
   0. Feature probing

Probably a KVM cap, rather than requiring userspace to attempt VCPU
features one at a time with a scratch VCPU.
 
>  1. Requires the VMM to enable MTE per-VCPU.

I suppose. We're collecting many features that are enabling CPU features,
so they map nicely to VCPU features, yet they're effectively VM features
due to a shared resource such as an irq or memory.

>  2. Automatically promotes (normal host) memory given to the guest to be
> tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
> tags are cleared if the memory wasn't previously MTE enabled.

Shouldn't this be up to the guest? Or is this required in order for the
guest to use tagging at all? Something like making the guest IPAs memtag
capable, but if the guest doesn't enable tagging then there is no guest
impact? In any case, shouldn't userspace be the one that adds PROT_MTE
to the memory regions it wants the guest to be able to use tagging with,
rather than KVM adding the attribute page by page?

>  3. Doesn't provide any new methods for the VMM to access the tags on
> memory.
> 
> (2) and (3) are particularly interesting from the aspect of VM migration.
> The guest is able to store/retrieve data in the tags (presumably for the
> purpose of tag checking, but architecturally it could be used as just
> storage). This means that when migrating a guest the data needs to be
> transferred (or saved/restored).
> 
> MTE tags are controlled by the same permission model as normal pages
> (i.e. a read-only page has read-only tags), so the normal methods of
> detecting guest changes to pages can be used. But this would also
> require the tags within a page to be migrated at the same time as the
> data (since the access control for tags is the same as the normal data
> within a page).
> 
> (3) may be problematic and I'd welcome input from those familiar with
> VMMs. User space cannot access tags unless the memory is mapped with the
> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
> for the user space process (assuming the VMM enables tag checking for
> the process) and since the tags in memory are controlled by the guest
> it's unlikely the VMM would have an appropriately tagged pointer for its
> access. This means the VMM would either need to maintain two mappings of
> memory (one to access tags, the other to access data) or disable tag
> checking during the accesses to data.

If userspace needs to write to guest memory then it should be due to
a device DMA or other specific hardware emulation. Those accesses can
be done with tag checking disabled.

> 
> If it's not practical to either disable tag checking in the VMM or
> maintain multiple mappings then the alternatives I'm aware of are:
> 
>  * Provide a KVM-specific method to extract the tags from guest memory.
>This might also have benefits in terms of providing an easy way to
>read bulk tag data from guest memory (since the LDGM instruction
>isn't available at EL0).

Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
the tags for all addresses of each dirty page.

>  * Provide support for user space setting the TCMA0 or TCMA1 bits in
>TCR_EL1. These would allow the VMM to generate pointers which are not
>tag checked.

So this is necessary to allow the VMM to keep tag checking enabled for
itself, plus map guest memory as PROT_MTE, and write to that memory when
needed? 
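
For reference, a small sketch of the pointer arithmetic involved, assuming the architectural layout where the logical tag sits in address bits [59:56] and TCMA0 makes accesses unchecked when bits [59:55] are all zero. The helper names are hypothetical and nothing here touches TCR_EL1; it only shows which pointers would qualify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTE_TAG_SHIFT 56
#define MTE_TAG_MASK  (0xfull << MTE_TAG_SHIFT)

/* Extract the logical tag held in address bits [59:56]. */
static inline unsigned int ptr_tag(uint64_t addr)
{
    return (unsigned int)((addr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
}

/* Return 'addr' with its logical tag replaced by 'tag' (0..15). */
static inline uint64_t ptr_set_tag(uint64_t addr, unsigned int tag)
{
    assert(tag <= 0xf);
    return (addr & ~MTE_TAG_MASK) | ((uint64_t)tag << MTE_TAG_SHIFT);
}

/*
 * With TCMA0 set, an access is unchecked when address bits [59:55] are
 * all zero, i.e. a lower-half address carrying tag 0b0000.
 */
static inline bool tcma0_unchecked(uint64_t addr)
{
    return ((addr >> 55) & 0x1f) == 0;
}
```

Under that assumption the VMM would strip the tag with ptr_set_tag(addr, 0) before touching guest memory, while its own allocations keep non-zero tags and stay checked.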

Thanks,
drew

> 
> Feedback is welcome, and feel free to ask questions if anything in the
> above doesn't make sense.
> 
> Changes since the previous v1 posting[2]:
> 
>  * Rebasing clean-ups
>  * sysreg visibility is now controlled based on whether the VCPU has MTE
>enabled or not
> 
> [1] https://lore.kernel.org/r/20200904103029.32083-1-catalin.mari...@arm.com
> [2] https://lore.kernel.org/r/20200713100102.53664-1-steven.price%40arm.com
> 
> Steven Price (2):
>   arm64: kvm: Save/restore MTE registers
>   arm64: kvm: Introduce MTE VCPU feature
> 
>  arch/arm64/include/asm/kvm_emulate.h   |  3 +++
>  arch/arm64/include/asm/kvm_host.h  |  9 -
>  arch/arm64/include/asm/sysreg.h|  3 ++-
>  arch/arm64/include/uapi/asm/kvm.h  |  1 +
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++
>  arch/arm64/kvm/mmu.c   | 15 +++
>  arch/arm64/kvm/reset.c |  8 
>  arch/arm64/kvm/sys_regs.c  | 20 

Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-09-09 Thread Andrew Jones
On Fri, Sep 04, 2020 at 05:00:18PM +0100, Steven Price wrote:
> Add a new VCPU feature 'KVM_ARM_VCPU_MTE' which enables memory tagging
> on a VCPU. When enabled on any VCPU in the virtual machine this causes
> all pages that are faulted into the VM to have the PG_mte_tagged flag
> set (and the tag storage cleared if this is the first use).
> 
> Signed-off-by: Steven Price 
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  3 +++
>  arch/arm64/include/asm/kvm_host.h|  5 -
>  arch/arm64/include/uapi/asm/kvm.h|  1 +
>  arch/arm64/kvm/mmu.c | 15 +++
>  arch/arm64/kvm/reset.c   |  8 
>  arch/arm64/kvm/sys_regs.c|  6 +-
>  6 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 49a55be2b9a2..0042323a4b7f 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>   if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
>   vcpu_el1_is_32bit(vcpu))
>   vcpu->arch.hcr_el2 |= HCR_TID2;
> +
> + if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
> + vcpu->arch.hcr_el2 |= HCR_ATA;
>  }
>  
>  static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 4f4360dd149e..b1190366242b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -37,7 +37,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> -#define KVM_VCPU_MAX_FEATURES 7
> +#define KVM_VCPU_MAX_FEATURES 8
>  
>  #define KVM_REQ_SLEEP \
>   KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> @@ -110,6 +110,9 @@ struct kvm_arch {
>* supported.
>*/
>   bool return_nisv_io_abort_to_user;
> +
> + /* If any VCPU has MTE enabled then all memory must be MTE enabled */
> + bool vcpu_has_mte;

It looks like this is unnecessary as it's only used once, where a feature
check could be used.
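
A sketch of what such a feature check could look like, written as standalone C rather than kernel code. The bit number 7 comes from the patch; struct vcpu_sketch, sketch_test_bit and vcpu_has_mte are hypothetical stand-ins for the kernel's vcpu->arch.features bitmap and test_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define KVM_ARM_VCPU_MTE       7   /* bit number taken from the patch */
#define SKETCH_MAX_FEATURES    8
#define SKETCH_BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, reduced stand-in for the per-VCPU feature bitmap. */
struct vcpu_sketch {
    unsigned long features[1];
};

/* Userspace equivalent of the kernel's test_bit() for this sketch. */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
    assert(nr < SKETCH_MAX_FEATURES);
    return (addr[nr / SKETCH_BITS_PER_LONG] >>
            (nr % SKETCH_BITS_PER_LONG)) & 1ul;
}

/* The check that could replace the extra 'vcpu_has_mte' bool. */
static bool vcpu_has_mte(const struct vcpu_sketch *vcpu)
{
    return sketch_test_bit(KVM_ARM_VCPU_MTE, vcpu->features);
}
```

This only tests the faulting VCPU's own feature bit; whether that matches the "any VCPU" semantics of the patch depends on all VCPUs being required to agree on the feature.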

>  };
>  
>  struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index ba85bb23f060..2677e1ab8c16 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -106,6 +106,7 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_SVE 4 /* enable SVE for this CPU */
>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS 5 /* VCPU uses address authentication */
>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC 6 /* VCPU uses generic authentication */
> +#define KVM_ARM_VCPU_MTE 7 /* VCPU supports Memory Tagging */
>  
>  struct kvm_vcpu_init {
>   __u32 target;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ba00bcc0c884..e8891bacd76f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1949,6 +1949,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   if (vma_pagesize == PAGE_SIZE && !force_pte)
>   vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>  &pfn, &fault_ipa);
> + if (system_supports_mte() && kvm->arch.vcpu_has_mte && pfn_valid(pfn)) {
> + /*
> +  * VM will be able to see the page's tags, so we must ensure
> +  * they have been initialised.
> +  */
> + struct page *page = pfn_to_page(pfn);
> + long i, nr_pages = compound_nr(page);
> +
> + /* if PG_mte_tagged is set, tags have already been initialised 
> */
> + for (i = 0; i < nr_pages; i++, page++) {
> + if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> + mte_clear_page_tags(page_address(page));
> + }
> + }
> +
>   if (writable)
>   kvm_set_pfn_dirty(pfn);
>  
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index ee33875c5c2a..82f3883d717f 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -274,6 +274,14 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   }
>   }
>  
> + if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features)) {
> + if (!system_supports_mte()) {
> + ret = -EINVAL;
> + goto out;
> + }
> + vcpu->kvm->arch.vcpu_has_mte = true;
> + }

We either need a KVM cap or a new CPU feature probing interface to avoid
making userspace try features one at a time. It's too bad that VCPU_INIT
doesn't clear all offending features from the feature set when returning
EINVAL, because then userspace could create a scratch VCPU with everything
it supports in order to see what KVM also supports in one go.
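
To make the wished-for behaviour concrete, here is a sketch of an init call that clears the offending bits before failing. This is NOT how KVM_ARM_VCPU_INIT behaves today; sketch_vcpu_init and the 32-bit feature masks are hypothetical and only illustrate the probing pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical init: if any requested feature is unsupported, clear it
 * from '*features' and fail with -EINVAL, so the caller can read back
 * the supported subset after a single attempt.
 */
static int sketch_vcpu_init(uint32_t *features, uint32_t host_supported)
{
    uint32_t offending;

    assert(features != NULL);

    offending = *features & ~host_supported;
    if (!offending)
        return 0;

    *features &= ~offending;
    return -EINVAL;
}
```

Userspace would pass every feature it knows about to a scratch VCPU, and on -EINVAL simply retry with the mask it got back.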

> +
>   switch (vcpu->arch.target) {
>   default:
>   if 

Re: [PATCH 3/3] KVM: arm64: Use kvm_write_guest_lock when init stolen time

2020-08-17 Thread Andrew Jones
On Mon, Aug 17, 2020 at 11:37:29AM +0800, Keqian Zhu wrote:
> There is a locking version of kvm_write_guest, kvm_write_guest_lock.
> Use it to simplify the code.
> 
> Signed-off-by: Keqian Zhu 
> ---
>  arch/arm64/kvm/pvtime.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
> index f7b52ce..2b24e7f 100644
> --- a/arch/arm64/kvm/pvtime.c
> +++ b/arch/arm64/kvm/pvtime.c
> @@ -55,7 +55,6 @@ gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu)
>   struct pvclock_vcpu_stolen_time init_values = {};
>   struct kvm *kvm = vcpu->kvm;
>   u64 base = vcpu->arch.steal.base;
> - int idx;
>  
>   if (base == GPA_INVALID)
>   return base;
> @@ -66,10 +65,7 @@ gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu)
>*/
>   vcpu->arch.steal.steal = 0;
>   vcpu->arch.steal.last_steal = current->sched_info.run_delay;
> -
> - idx = srcu_read_lock(&kvm->srcu);
> - kvm_write_guest(kvm, base, &init_values, sizeof(init_values));
> - srcu_read_unlock(&kvm->srcu, idx);
> + kvm_write_guest_lock(kvm, base, &init_values, sizeof(init_values));
>  
>   return base;
>  }
> -- 
> 1.8.3.1
>

Reviewed-by: Andrew Jones 



Re: [PATCH 1/3] KVM: arm64: Some fixes of PV-time interface document

2020-08-17 Thread Andrew Jones
On Mon, Aug 17, 2020 at 11:37:27AM +0800, Keqian Zhu wrote:
> Rename PV_FEATURES to PV_TIME_FEATURES
> 
> Signed-off-by: Keqian Zhu 
> ---
>  Documentation/virt/kvm/arm/pvtime.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/arm/pvtime.rst 
> b/Documentation/virt/kvm/arm/pvtime.rst
> index 687b60d..94bffe2 100644
> --- a/Documentation/virt/kvm/arm/pvtime.rst
> +++ b/Documentation/virt/kvm/arm/pvtime.rst
> @@ -3,7 +3,7 @@
>  Paravirtualized time support for arm64
>  ==
>  
> -Arm specification DEN0057/A defines a standard for paravirtualised time
> +Arm specification DEN0057/A defines a standard for paravirtualized time
>  support for AArch64 guests:
>  
>  https://developer.arm.com/docs/den0057/a
> @@ -19,8 +19,8 @@ Two new SMCCC compatible hypercalls are defined:
>  
>  These are only available in the SMC64/HVC64 calling convention as
>  paravirtualized time is not available to 32 bit Arm guests. The existence of
> -the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> -mechanism before calling it.
> +the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1
> +ARCH_FEATURES mechanism before calling it.
>  
>  PV_TIME_FEATURES
>  = ==
> -- 
> 1.8.3.1
>

Reviewed-by: Andrew Jones 



Re: [PATCH 2/3] KVM: x86: introduce KVM_MEM_PCI_HOLE memory

2020-08-05 Thread Andrew Jones
On Tue, Jul 28, 2020 at 04:37:40PM +0200, Vitaly Kuznetsov wrote:
> PCIe config space can (depending on the configuration) be quite big but
> usually is sparsely populated. Guest may scan it by accessing individual
> device's page which, when device is missing, is supposed to have 'pci
> hole' semantics: reads return '0xff' and writes get discarded. Compared
> to the already existing KVM_MEM_READONLY, VMM doesn't need to allocate
> real memory and stuff it with '0xff'.
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  Documentation/virt/kvm/api.rst  | 19 +++-
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/mmu/mmu.c  |  5 -
>  arch/x86/kvm/mmu/paging_tmpl.h  |  3 +++
>  arch/x86/kvm/x86.c  | 10 ++---
>  include/linux/kvm_host.h|  7 +-
>  include/uapi/linux/kvm.h|  3 ++-
>  virt/kvm/kvm_main.c | 39 +++--
>  8 files changed, 68 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 644e5326aa50..fbbf533a331b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1241,6 +1241,7 @@ yet and must be cleared on entry.
>/* for kvm_memory_region::flags */
>#define KVM_MEM_LOG_DIRTY_PAGES(1UL << 0)
>#define KVM_MEM_READONLY   (1UL << 1)
> +  #define KVM_MEM_PCI_HOLE   (1UL << 2)
>  
>  This ioctl allows the user to create, modify or delete a guest physical
>  memory slot.  Bits 0-15 of "slot" specify the slot id and this value
> @@ -1268,12 +1269,18 @@ It is recommended that the lower 21 bits of 
> guest_phys_addr and userspace_addr
>  be identical.  This allows large pages in the guest to be backed by large
>  pages in the host.
>  
> -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> -KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
> -writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> -to make a new slot read-only.  In this case, writes to this memory will be
> -posted to userspace as KVM_EXIT_MMIO exits.
> +The flags field supports the following flags: KVM_MEM_LOG_DIRTY_PAGES,
> +KVM_MEM_READONLY, KVM_MEM_READONLY:

The second KVM_MEM_READONLY should be KVM_MEM_PCI_HOLE. Or just drop the
list here, as they're listed below anyway

> +- KVM_MEM_LOG_DIRTY_PAGES can be set to instruct KVM to keep track of writes 
> to
> +  memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to use it.
> +- KVM_MEM_READONLY can be set, if KVM_CAP_READONLY_MEM capability allows it,
> +  to make a new slot read-only.  In this case, writes to this memory will be
> +  posted to userspace as KVM_EXIT_MMIO exits.
> +- KVM_MEM_PCI_HOLE can be set, if KVM_CAP_PCI_HOLE_MEM capability allows it,
> +  to create a new virtual read-only slot which will always return '0xff' when
> +  guest reads from it. 'userspace_addr' has to be set to NULL. This flag is
> +  mutually exclusive with KVM_MEM_LOG_DIRTY_PAGES/KVM_MEM_READONLY. All 
> writes
> +  to this memory will be posted to userspace as KVM_EXIT_MMIO exits.

I see 2/3 of this text is copy+pasted from above, but how about this

 - KVM_MEM_LOG_DIRTY_PAGES: log writes.  Use KVM_GET_DIRTY_LOG to retrieve
   the log.
 - KVM_MEM_READONLY: exit to userspace with KVM_EXIT_MMIO on writes.  Only
   available when KVM_CAP_READONLY_MEM is present.
 - KVM_MEM_PCI_HOLE: always return 0xff on reads, exit to userspace with
   KVM_EXIT_MMIO on writes.  Only available when KVM_CAP_PCI_HOLE_MEM is
   present.  When setting the memory region 'userspace_addr' must be NULL.
   This flag is mutually exclusive with KVM_MEM_LOG_DIRTY_PAGES and with
   KVM_MEM_READONLY.
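
The rules in that list can be sketched as a flag check. The flag values are the ones from the patch; struct memslot_sketch and check_memslot_flags are hypothetical, and the capability checks (KVM_CAP_READONLY_MEM, KVM_CAP_PCI_HOLE_MEM) are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY        (1UL << 1)
#define KVM_MEM_PCI_HOLE        (1UL << 2)   /* proposed by this patch */

/* Hypothetical, reduced view of a memory region for this sketch. */
struct memslot_sketch {
    uint32_t flags;
    uint64_t userspace_addr;
};

/* Returns 0 if the flag combination is allowed, -EINVAL otherwise. */
static int check_memslot_flags(const struct memslot_sketch *slot)
{
    const uint32_t valid = (uint32_t)(KVM_MEM_LOG_DIRTY_PAGES |
                                      KVM_MEM_READONLY |
                                      KVM_MEM_PCI_HOLE);

    assert(slot != NULL);

    if (slot->flags & ~valid)
        return -EINVAL;

    if (slot->flags & KVM_MEM_PCI_HOLE) {
        /* Mutually exclusive with the other two flags. */
        if (slot->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY))
            return -EINVAL;
        /* A PCI hole has no backing memory. */
        if (slot->userspace_addr != 0)
            return -EINVAL;
    }

    return 0;
}
```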

>  
>  When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
>  the memory region are automatically reflected into the guest.  For example, 
> an
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 17c5a038f42d..cf80a26d74f5 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -48,6 +48,7 @@
>  #define __KVM_HAVE_XSAVE
>  #define __KVM_HAVE_XCRS
>  #define __KVM_HAVE_READONLY_MEM
> +#define __KVM_HAVE_PCI_HOLE_MEM
>  
>  /* Architectural interrupt line count. */
>  #define KVM_NR_INTERRUPTS 256
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8597e8102636..c2e3a1deafdd 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3253,7 +3253,7 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu 
> *vcpu, gfn_t gfn,
>   return PG_LEVEL_4K;
>  
>   slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true);
> - if (!slot)
> + if (!slot || (slot->flags & KVM_MEM_PCI_HOLE))
>   return PG_LEVEL_4K;
>  
>   max_level = min(max_level, max_page_level);
> @@ -4104,6 

Re: [PATCH v6 10/21] RISC-V: KVM: Handle MMIO exits for VCPU

2019-09-03 Thread Andrew Jones
On Thu, Aug 29, 2019 at 01:56:18PM +, Anup Patel wrote:
>  int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> - /* TODO: */
> + u8 data8;
> + u16 data16;
> + u32 data32;
> + u64 data64;
> + ulong insn;
> + int len, shift;
> +
> + insn = vcpu->arch.mmio_decode.insn;
> +
> + if (run->mmio.is_write)
> + goto done;
> +
> + len = vcpu->arch.mmio_decode.len;
> + shift = vcpu->arch.mmio_decode.shift;
> +
> + switch (len) {
> + case 1:
> + data8 = *((u8 *)run->mmio.data);
> + SET_RD(insn, &vcpu->arch.guest_context,
> + (ulong)data8 << shift >> shift);
> + break;
> + case 2:
> + data16 = *((u16 *)run->mmio.data);
> + SET_RD(insn, &vcpu->arch.guest_context,
> + (ulong)data16 << shift >> shift);
> + break;
> + case 4:
> + data32 = *((u32 *)run->mmio.data);
> + SET_RD(insn, &vcpu->arch.guest_context,
> + (ulong)data32 << shift >> shift);
> + break;
> + case 8:
> + data64 = *((u64 *)run->mmio.data);
> + SET_RD(insn, &vcpu->arch.guest_context,
> + (ulong)data64 << shift >> shift);
> + break;
> + default:
> + return -ENOTSUPP;
> + };
> +
> +done:
> + /* Move to next instruction */
> + vcpu->arch.guest_context.sepc += INSN_LEN(insn);
> +

As I pointed out in the last review, just moving this instruction skip
here is not enough. Doing so introduces the same problem that 2113c5f62b74
("KVM: arm/arm64: Only skip MMIO insn once") fixes for arm.
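
A minimal sketch of the skip-once idea, assuming the guard is a per-VCPU flag that is set when the MMIO exit is prepared and cleared the first time the return path runs. The struct and function names are hypothetical; only the shape of the guard is meant to mirror the arm fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced VCPU state for this sketch. */
struct mmio_vcpu_sketch {
    bool mmio_needed;   /* an MMIO exit is pending completion */
    uint64_t sepc;      /* guest program counter */
};

/* Called when an MMIO exit to userspace is being set up. */
static void sketch_mmio_exit(struct mmio_vcpu_sketch *vcpu)
{
    assert(vcpu != NULL);
    vcpu->mmio_needed = true;
}

/*
 * Called on every re-entry. The instruction is skipped only if an MMIO
 * exit is still pending, so a second KVM_RUN without a new exit does
 * not advance sepc again.
 */
static int sketch_mmio_return(struct mmio_vcpu_sketch *vcpu,
                              unsigned int insn_len)
{
    assert(vcpu != NULL);

    if (!vcpu->mmio_needed)
        return 0;

    vcpu->mmio_needed = false;
    /* A load would write its result to the destination register here. */
    vcpu->sepc += insn_len;

    return 0;
}
```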

Thanks,
drew


Re: [PATCH v4 10/10] arm64: Retrieve stolen time as paravirtualized guest

2019-09-03 Thread Andrew Jones
On Fri, Aug 30, 2019 at 09:42:55AM +0100, Steven Price wrote:
> Enable paravirtualization features when running under a hypervisor
> supporting the PV_TIME_ST hypercall.
> 
> For each (v)CPU, we ask the hypervisor for the location of a shared
> page which the hypervisor will use to report stolen time to us. We set
> pv_time_ops to the stolen time function which simply reads the stolen
> value from the shared page for a VCPU. We guarantee single-copy
> atomicity using READ_ONCE which means we can also read the stolen
> time for another VCPU than the currently running one while it is
> potentially being updated by the hypervisor.
> 
> Signed-off-by: Steven Price 
> ---
>  arch/arm64/include/asm/paravirt.h |   9 +-
>  arch/arm64/kernel/paravirt.c  | 148 ++
>  arch/arm64/kernel/time.c  |   3 +
>  include/linux/cpuhotplug.h|   1 +
>  4 files changed, 160 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/paravirt.h 
> b/arch/arm64/include/asm/paravirt.h
> index 799d9dd6f7cc..125c26c42902 100644
> --- a/arch/arm64/include/asm/paravirt.h
> +++ b/arch/arm64/include/asm/paravirt.h
> @@ -21,6 +21,13 @@ static inline u64 paravirt_steal_clock(int cpu)
>  {
>   return pv_ops.time.steal_clock(cpu);
>  }
> -#endif
> +
> +int __init kvm_guest_init(void);
> +
> +#else
> +
> +#define kvm_guest_init()
> +
> +#endif // CONFIG_PARAVIRT
>  
>  #endif
> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> index 4cfed91fe256..5bf3be7ccf7e 100644
> --- a/arch/arm64/kernel/paravirt.c
> +++ b/arch/arm64/kernel/paravirt.c
> @@ -6,13 +6,161 @@
>   * Author: Stefano Stabellini 
>   */
>  
> +#define pr_fmt(fmt) "kvmarm-pv: " fmt
> +
> +#include 
> +#include 
>  #include 
> +#include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  #include 
> +
>  #include 
> +#include 
> +#include 
>  
>  struct static_key paravirt_steal_enabled;
>  struct static_key paravirt_steal_rq_enabled;
>  
>  struct paravirt_patch_template pv_ops;
>  EXPORT_SYMBOL_GPL(pv_ops);
> +
> +struct kvmarm_stolen_time_region {
> + struct pvclock_vcpu_stolen_time *kaddr;
> +};
> +
> +static DEFINE_PER_CPU(struct kvmarm_stolen_time_region, stolen_time_region);
> +
> +static bool steal_acc = true;
> +static int __init parse_no_stealacc(char *arg)
> +{
> + steal_acc = false;
> + return 0;
> +}
> +
> +early_param("no-steal-acc", parse_no_stealacc);

Need to also add an 'ARM64' to the
Documentation/admin-guide/kernel-parameters.txt entry for this.

Thanks,
drew


Re: [PATCH v4 00/10] arm64: Stolen time support

2019-09-03 Thread Andrew Jones
.c   | 124 
>  virt/kvm/kvm_main.c |   6 +-
>  29 files changed, 699 insertions(+), 154 deletions(-)
>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>  create mode 100644 arch/arm64/include/asm/pvclock-abi.h
>  create mode 100644 include/kvm/arm_hypercalls.h
>  create mode 100644 virt/kvm/arm/hypercalls.c
>  create mode 100644 virt/kvm/arm/pvtime.c
> 
> -- 
> 2.20.1
>

Hi Steven,

I had some fun testing this series with the KVM selftests framework. It
looks like it works to me, so you may add

Tested-by: Andrew Jones 

if you like. And below is the test I came up with.

Thanks,
drew


From: Andrew Jones 
Date: Tue, 3 Sep 2019 03:45:08 -0400
Subject: [PATCH] selftests: kvm: aarch64 stolen-time test

Signed-off-by: Andrew Jones 
---
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../selftests/kvm/aarch64/stolen-time.c   | 208 ++
 2 files changed, 209 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/stolen-time.c

diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index ba7849751989..3151264039ad 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -28,6 +28,7 @@ TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 
+TEST_GEN_PROGS_aarch64 += aarch64/stolen-time
 TEST_GEN_PROGS_aarch64 += clear_dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/aarch64/stolen-time.c 
b/tools/testing/selftests/kvm/aarch64/stolen-time.c
new file mode 100644
index ..36df2f6baa17
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/stolen-time.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AArch64 PV stolen time test
+ *
+ * Copyright (C) 2019, Red Hat, Inc.
+ */
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "kvm_util.h"
+
+#define ST_IPA_BASE (1 << 30)
+#define MIN_STOLEN_TIME20
+
+struct st_time {
+   uint32_t rev;
+   uint32_t attr;
+   uint64_t st_time;
+};
+
+static uint64_t st_ipa_offset[4];
+static uint64_t guest_stolen_time[4];
+
+static void guest_code(void)
+{
+   struct st_time *st_time;
+   uint64_t cpu;
+   int64_t ipa;
+   int32_t ret;
+
+   asm volatile("mrs %0, mpidr_el1" : "=r" (cpu));
+   cpu &= 0x3;
+
+   asm volatile(
+   "mov x0, %1\n"
+   "mov x1, %2\n"
+   "hvc #0\n"
+   "mov %0, x0\n"
+   : "=r" (ret) : "r" (0x8001), "r" (0xc520) :
+ "x0", "x1", "x2", "x3");
+
+   GUEST_ASSERT(ret == 0);
+
+   asm volatile(
+   "mov x0, %1\n"
+   "mov x1, %2\n"
+   "hvc #0\n"
+   "mov %0, x0\n"
+   : "=r" (ret) : "r" (0xc520), "r" (0xc522) :
+ "x0", "x1", "x2", "x3");
+
+   GUEST_ASSERT(ret == 0);
+
+   asm volatile(
+   "mov x0, %1\n"
+   "hvc #0\n"
+   "mov %0, x0\n"
+   : "=r" (ipa) : "r" (0xc522) :
+ "x0", "x1", "x2", "x3");
+
+   GUEST_ASSERT(ipa == ST_IPA_BASE + st_ipa_offset[cpu]);
+
+   st_time = (struct st_time *)ipa;
+   GUEST_ASSERT(st_time->rev == 0);
+   GUEST_ASSERT(st_time->attr == 0);
+
+   guest_stolen_time[cpu] = st_time->st_time;
+   GUEST_SYNC(0);
+
+   guest_stolen_time[cpu] = st_time->st_time;
+   GUEST_DONE();
+}
+
+static long get_run_delay(void)
+{
+   char path[64];
+   long val[2];
+   FILE *fp;
+
+   sprintf(path, "/proc/%ld/schedstat", syscall(SYS_gettid));
+   fp = fopen(path, "r");
+   fscanf(fp, "%ld %ld ", &val[0], &val[1]);
+   fclose(fp);
+
+   return val[1];
+}
+
+static void *steal_time(void *arg)
+{
+   uint64_t nsecs_per_sec = 1000000000ul;
+   uint64_t sec, nsec;
+   struct timespec ts;
+
+   clock_gettime(CLOCK_MONOTONIC, &ts);
+   sec = ts.tv_sec;
+   nsec = ts.tv_nsec + MIN_STOLEN_TIME;
+   if (nsec > nsecs_per_sec) {
+   sec += 1;
+   nsec -= nsecs_per_sec;
+   }
+
+   while (1) {
+   clock_gettime(CLOCK_MONOTONIC, &ts);
+   if (ts.tv_sec > sec || (ts.tv_sec == sec && ts.tv_nsec >= nsec))
+   break;
+   }
+
+   return NULL;
+}
+
+static void r

Re: [PATCH v4 01/10] KVM: arm64: Document PV-time interface

2019-09-02 Thread Andrew Jones
On Fri, Aug 30, 2019 at 04:25:08PM +0100, Steven Price wrote:
> On 30/08/2019 15:47, Andrew Jones wrote:
> > On Fri, Aug 30, 2019 at 09:42:46AM +0100, Steven Price wrote:
> >> Introduce a paravirtualization interface for KVM/arm64 based on the
> >> "Arm Paravirtualized Time for Arm-Based Systems" specification DEN 0057A.
> >>
> >> This only adds the details about "Stolen Time" as the details of "Live
> >> Physical Time" have not been fully agreed.
> >>
> >> User space can specify a reserved area of memory for the guest and
> >> inform KVM to populate the memory with information on time that the host
> >> kernel has stolen from the guest.
> >>
> >> A hypercall interface is provided for the guest to interrogate the
> >> hypervisor's support for this interface and the location of the shared
> >> memory structures.
> >>
> >> Signed-off-by: Steven Price 
> >> ---
> >>  Documentation/virt/kvm/arm/pvtime.txt   | 64 +
> >>  Documentation/virt/kvm/devices/vcpu.txt | 14 ++
> >>  2 files changed, 78 insertions(+)
> >>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> >>
> >> diff --git a/Documentation/virt/kvm/arm/pvtime.txt 
> >> b/Documentation/virt/kvm/arm/pvtime.txt
> >> new file mode 100644
> >> index ..dda3f0f855b9
> >> --- /dev/null
> >> +++ b/Documentation/virt/kvm/arm/pvtime.txt
> >> @@ -0,0 +1,64 @@
> >> +Paravirtualized time support for arm64
> >> +==
> >> +
> >> +Arm specification DEN0057/A defined a standard for paravirtualised time
> >> +support for AArch64 guests:
> >> +
> >> +https://developer.arm.com/docs/den0057/a
> >> +
> >> +KVM/arm64 implements the stolen time part of this specification by 
> >> providing
> >> +some hypervisor service calls to support a paravirtualized guest 
> >> obtaining a
> >> +view of the amount of time stolen from its execution.
> >> +
> >> +Two new SMCCC compatible hypercalls are defined:
> >> +
> >> +PV_FEATURES 0xC520
> >> +PV_TIME_ST  0xC522
> >> +
> >> +These are only available in the SMC64/HVC64 calling convention as
> >> +paravirtualized time is not available to 32 bit Arm guests. The existence 
> >> of
> >> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 
> >> ARCH_FEATURES
> >> +mechanism before calling it.
> >> +
> >> +PV_FEATURES
> >> +Function ID:  (uint32)  : 0xC520
> >> +PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
> > 
> > PV_TIME_LPT doesn't exist
> 
> Thanks, will remove.
> 
> >> +Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the 
> >> relevant
> >> +  PV-time feature is supported by the 
> >> hypervisor.
> >> +
> >> +PV_TIME_ST
> >> +Function ID:  (uint32)  : 0xC522
> >> +Return value: (int64)   : IPA of the stolen time data structure for 
> >> this
> >> +  VCPU. On failure:
> >> +  NOT_SUPPORTED (-1)
> >> +
> >> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal 
> >> memory
> >> +with inner and outer write back caching attributes, in the inner shareable
> >> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> >> +meaningfully filled by the hypervisor (see structure below).
> >> +
> >> +PV_TIME_ST returns the structure for the calling VCPU.
> >> +
> >> +Stolen Time
> >> +---
> >> +
> >> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> >> +
> >> +  Field   | Byte Length | Byte Offset | Description
> >> +  --- | --- | --- | --
> >> +  Revision|  4  |  0  | Must be 0 for version 0.1
> >> +  Attributes  |  4  |  4  | Must be 0
> > 
> > The above fields don't appear to be exposed to userspace in any way. How
> > will we handle migration from one KVM with one version of the structure
> > to another?
> 
> Interesting question. User space does have access to them now it is
> providing the memory, but it's not exactly an easy method. In particular
> user space has no (simple) way of probing the kernel's

Re: [PATCH v2 0/4] KVM: selftests: Introduce VM_MODE_PXXV48_4K

2019-08-29 Thread Andrew Jones
On Thu, Aug 29, 2019 at 10:21:13AM +0800, Peter Xu wrote:
> v2:
> - pick r-bs
> - rebased to master
> - fix pa width detect, check cpuid(1):edx.PAE(bit 6)
> - fix arm compilation issue [Drew]
> - fix indents issues and ways to define macros [Drew]
> - provide functions for fetching cpu pa/va bits [Drew]
> 
> This series originates from "[PATCH] KVM: selftests: Detect max PA
> width from cpuid" [1] and one of Drew's comments - instead of keeping
> the hackish line to overwrite guest_pa_bits all the time, this series
> introduced the new mode VM_MODE_PXXV48_4K for x86_64 platform.
> 
> The major issue is that even though all the x86_64 kvm selftests
> currently use the guest mode VM_MODE_P52V48_4K, many x86_64 hosts
> do not use 52 bits of PA (in most cases, far fewer).  With luck
> we get 48-bit hosts, but it's more ad hoc (I've observed 3
> x86_64 systems with different PA widths of 36, 39 and 48).  I
> am not sure whether this happens on the other archs as well, but
> it probably makes sense to bring the x86_64 tests to the real world by
> always using the correct PA bits.
> 
> A side effect of this series is that it will also fix the crash we've
> encountered on Xeon E3-1220 as mentioned [1] due to the
> difference in PA width.
> 
> With [1], we've observed AMD host issues when with NPT=off.  However a
> funny fact is that after I reworked into this series, the tests can
> instead pass on both NPT=on/off.  It could be that the series changes
> vm->pa_bits or other fields so something was affected.  I didn't dig
> more on that though, considering we should not lose anything.
> 
> [1] https://lkml.org/lkml/2019/8/26/141
> 
> Peter Xu (4):
>   KVM: selftests: Move vm type into _vm_create() internally
>   KVM: selftests: Create VM earlier for dirty log test
>   KVM: selftests: Introduce VM_MODE_PXXV48_4K
>   KVM: selftests: Remove duplicate guest mode handling
> 
>  tools/testing/selftests/kvm/dirty_log_test.c  | 79 +--
>  .../testing/selftests/kvm/include/kvm_util.h  | 16 +++-
>  .../selftests/kvm/include/x86_64/processor.h  |  3 +
>  .../selftests/kvm/lib/aarch64/processor.c |  3 +
>  tools/testing/selftests/kvm/lib/kvm_util.c| 67 
>  .../selftests/kvm/lib/x86_64/processor.c  | 30 ++-
>  6 files changed, 119 insertions(+), 79 deletions(-)
> 
> -- 
> 2.21.0
>

Tested on AArch64. Looks good.

Thanks,
drew


Re: [PATCH v2 1/4] KVM: selftests: Move vm type into _vm_create() internally

2019-08-29 Thread Andrew Jones
On Thu, Aug 29, 2019 at 10:21:14AM +0800, Peter Xu wrote:
> Rather than passing the vm type from the top level to the end of vm
> creation, let's simply keep that as an internal of kvm_vm struct and
> decide the type in _vm_create().  Several reasons for doing this:
> 
> - The vm type is only decided by physical address width and currently
>   only used in aarch64, so we've got enough information as long as
>   we're passing vm_guest_mode into _vm_create(),
> 
> - This removes a circular dependency between vm->type and the creation
>   of vms.  That dependency is why we sometimes need to parse
>   vm_guest_mode twice, once in run_test() and then again in
>   _vm_create().  The follow-up patches will clean that up as well so
>   we can have a single place to decide guest machine types and so on.
> 
> Note that this patch slightly changes the behavior of the aarch64
> tests: previously most vm_create() callers passed type==0 directly
> into _vm_create(), but now the type depends on vm_guest_mode.
> However, it shouldn't affect any user because all vm_create() users
> on aarch64 use the VM_MODE_DEFAULT guest mode (which is
> VM_MODE_P40V48_4K), so in the end the type is still zero.
> 
> Signed-off-by: Peter Xu 
> ---
>  tools/testing/selftests/kvm/dirty_log_test.c  | 13 +++-
>  .../testing/selftests/kvm/include/kvm_util.h  |  3 +--
>  tools/testing/selftests/kvm/lib/kvm_util.c| 21 ++++-------
>  3 files changed, 17 insertions(+), 20 deletions(-)
>

Reviewed-by: Andrew Jones 
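
The decision that this patch moves into _vm_create() can be sketched
roughly as below.  This is only an illustration under stated
assumptions, not the selftest code: the sketch_* names and the reduced
mode enum are hypothetical, and SKETCH_ARM_IPA_SIZE() is a local
stand-in for KVM_VM_TYPE_ARM_IPA_SIZE(), assumed here to keep the low
8 bits of the IPA size.

```c
/* Hypothetical stand-in for KVM_VM_TYPE_ARM_IPA_SIZE(); assumed to mask the low 8 bits. */
#define SKETCH_ARM_IPA_SIZE(bits) ((unsigned long)(bits) & 0xfful)

/* Reduced, hypothetical version of enum vm_guest_mode. */
enum sketch_mode {
	SKETCH_MODE_P52V48_64K,
	SKETCH_MODE_P48V48_4K,
	SKETCH_MODE_P40V48_4K,
};

/* Physical address width implied by the mode name. */
static unsigned int sketch_pa_bits(enum sketch_mode mode)
{
	switch (mode) {
	case SKETCH_MODE_P52V48_64K:
		return 52;
	case SKETCH_MODE_P48V48_4K:
		return 48;
	case SKETCH_MODE_P40V48_4K:
	default:
		return 40;
	}
}

/*
 * VM type derived from the mode alone: 40 bits maps to type 0, any
 * other width is encoded as an explicit IPA size (aarch64 only).
 */
static unsigned long sketch_vm_type(enum sketch_mode mode)
{
	unsigned int pa_bits = sketch_pa_bits(mode);

	return pa_bits != 40 ? SKETCH_ARM_IPA_SIZE(pa_bits) : 0;
}
```

With the default 40-bit mode the result is 0, which is consistent with
the commit message's claim that existing vm_create() users see no
change.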


Re: [PATCH v2 3/4] KVM: selftests: Introduce VM_MODE_PXXV48_4K

2019-08-29 Thread Andrew Jones
On Thu, Aug 29, 2019 at 06:03:09PM +0800, Peter Xu wrote:
> On Thu, Aug 29, 2019 at 11:45:16AM +0200, Andrew Jones wrote:
> > On Thu, Aug 29, 2019 at 10:21:16AM +0800, Peter Xu wrote:
> > > > The naming VM_MODE_P52V48_4K is explicit but unclear when used on
> > > > x86_64 machines, because x86_64 machines have varying physical
> > > > address widths rather than a single static value.  Here are some
> > > > examples:
> > > > 
> > > >   - Intel Xeon E3-1220:  36 bits
> > > >   - Intel Core i7-8650:  39 bits
> > > >   - AMD   EPYC 7251:     48 bits
> > > > 
> > > > All of them use a 48-bit linear address width but have totally
> > > > different physical address widths (and most of the old machines should
> > > > be less than 52 bits).
> > > > 
> > > > Let's create a new guest mode called VM_MODE_PXXV48_4K for the current
> > > > x86_64 tests and make it the default, replacing the old naming of
> > > > VM_MODE_P52V48_4K, because it shows more clearly that the PA width is
> > > > not really a constant.  Meanwhile we also stop assuming that all x86
> > > > machines have a 52-bit PA width and instead fetch the real
> > > > vm->pa_bits from CPUID 0x80000008 at runtime.
> > > > 
> > > > This mode is currently used exclusively by x86_64 and no other arch.
> > > > 
> > > > As a slight touch-up, move the DEBUG macro from dirty_log_test.c to
> > > > kvm_util.h so the lib can use it too.
> > > 
> > > Signed-off-by: Peter Xu 
> > > ---
> > >  tools/testing/selftests/kvm/dirty_log_test.c  |  5 ++--
> > >  .../testing/selftests/kvm/include/kvm_util.h  |  9 +-
> > >  .../selftests/kvm/include/x86_64/processor.h  |  3 ++
> > >  .../selftests/kvm/lib/aarch64/processor.c |  3 ++
> > >  tools/testing/selftests/kvm/lib/kvm_util.c| 29 ++
> > >  .../selftests/kvm/lib/x86_64/processor.c  | 30 ---
> > >  6 files changed, 65 insertions(+), 14 deletions(-)
> > > 
> > > > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> > > index efb7746a7e99..c86f83cb33e5 100644
> > > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > > @@ -19,8 +19,6 @@
> > >  #include "kvm_util.h"
> > >  #include "processor.h"
> > >  
> > > -#define DEBUG printf
> > > -
> > >  #define VCPU_ID  1
> > >  
> > >  /* The memory slot index to track dirty pages */
> > > @@ -256,6 +254,7 @@ static void run_test(enum vm_guest_mode mode, 
> > > unsigned long iterations,
> > >  
> > >   switch (mode) {
> > >   case VM_MODE_P52V48_4K:
> > > + case VM_MODE_PXXV48_4K:
> > >   guest_pa_bits = 52;
> > >   guest_page_shift = 12;
> > >   break;
> > > @@ -446,7 +445,7 @@ int main(int argc, char *argv[])
> > >  #endif
> > >  
> > >  #ifdef __x86_64__
> > > - vm_guest_mode_params_init(VM_MODE_P52V48_4K, true, true);
> > > + vm_guest_mode_params_init(VM_MODE_PXXV48_4K, true, true);
> > >  #endif
> > >  #ifdef __aarch64__
> > >   vm_guest_mode_params_init(VM_MODE_P40V48_4K, true, true);
> > > > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> > > > index c78faa2ff7f3..430edbacb9b2 100644
> > > > --- a/tools/testing/selftests/kvm/include/kvm_util.h
> > > > +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> > > > @@ -24,6 +24,10 @@ struct kvm_vm;
> > > >  typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
> > > >  typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */
> > >  
> > > +#ifndef DEBUG
> > > +#define DEBUG printf
> > > +#endif
> > 
> > There's no way to turn this off without modifying code. I suggested
> > 
> > #ifndef NDEBUG
> > #define dprintf printf
> > #endif
> > 
> > which allows the dprintf(...) statements to be removed by compiling with
> > -DNDEBUG added to CFLAGS. And that would also disable all the asserts().
> > That's probably not all that useful, but then again, defining printf() as
> > DEBUG() isn't useful either if the intention is to always print.
> 
> Sorry I misread that...
> 
> Though, I'm afraid that even with the above it won't compile with
> -DNDEBUG, because the compiler could start to complain about an
> undefined "dprintf", or even resolve dprintf to the libc call,
> dprintf(3).
> 
> So instead, does the below look ok?
> 
> #ifdef NDEBUG
> #define DEBUG(...)
> #else
> #define DEBUG(...) printf(__VA_ARGS__);
> #endif

yeah, that's what I was looking for, but I wasn't thinking clearly when
I suggested just the name redefinition.

drew
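
A minimal sketch of the macro shape agreed on above, for reference.
It is not the code that was merged: the do { } while (0) wrapper is an
addition made here (the proposed definition ends in a semicolon, which
would not compile when used before an unbraced else), and
sketch_log_value() is a hypothetical caller.

```c
#include <stdio.h>

/*
 * DEBUG() prints by default and compiles to an empty statement when
 * the build defines NDEBUG (e.g. -DNDEBUG in CFLAGS).
 */
#ifdef NDEBUG
#define DEBUG(...) do { } while (0)
#else
#define DEBUG(...) do { printf(__VA_ARGS__); } while (0)
#endif

/* Hypothetical caller: logs the value and hands it back unchanged. */
static int sketch_log_value(int value)
{
	/* The unbraced if/else is what the do/while wrapper makes safe. */
	if (value < 0)
		DEBUG("negative value: %d\n", value);
	else
		DEBUG("value: %d\n", value);

	return value;
}
```

Note that -DNDEBUG also disables assert(), as pointed out earlier in
the thread, so a dedicated switch might be preferable if only the
debug output should be silenced.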


Re: [PATCH v2 3/4] KVM: selftests: Introduce VM_MODE_PXXV48_4K

2019-08-29 Thread Andrew Jones
On Thu, Aug 29, 2019 at 10:21:16AM +0800, Peter Xu wrote:
> The naming VM_MODE_P52V48_4K is explicit but unclear when used on
> x86_64 machines, because x86_64 machines have varying physical
> address widths rather than a single static value.  Here are some
> examples:
> 
>   - Intel Xeon E3-1220:  36 bits
>   - Intel Core i7-8650:  39 bits
>   - AMD   EPYC 7251:     48 bits
> 
> All of them use a 48-bit linear address width but have totally
> different physical address widths (and most of the old machines should
> be less than 52 bits).
> 
> Let's create a new guest mode called VM_MODE_PXXV48_4K for the current
> x86_64 tests and make it the default, replacing the old naming of
> VM_MODE_P52V48_4K, because it shows more clearly that the PA width is
> not really a constant.  Meanwhile we also stop assuming that all x86
> machines have a 52-bit PA width and instead fetch the real
> vm->pa_bits from CPUID 0x80000008 at runtime.
> 
> This mode is currently used exclusively by x86_64 and no other arch.
> 
> As a slight touch-up, move the DEBUG macro from dirty_log_test.c to
> kvm_util.h so the lib can use it too.
> 
> Signed-off-by: Peter Xu 
> ---
>  tools/testing/selftests/kvm/dirty_log_test.c  |  5 ++--
>  .../testing/selftests/kvm/include/kvm_util.h  |  9 +-
>  .../selftests/kvm/include/x86_64/processor.h  |  3 ++
>  .../selftests/kvm/lib/aarch64/processor.c |  3 ++
>  tools/testing/selftests/kvm/lib/kvm_util.c| 29 ++
>  .../selftests/kvm/lib/x86_64/processor.c  | 30 ---
>  6 files changed, 65 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index efb7746a7e99..c86f83cb33e5 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -19,8 +19,6 @@
>  #include "kvm_util.h"
>  #include "processor.h"
>  
> -#define DEBUG printf
> -
>  #define VCPU_ID  1
>  
>  /* The memory slot index to track dirty pages */
> @@ -256,6 +254,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>  
>   switch (mode) {
>   case VM_MODE_P52V48_4K:
> + case VM_MODE_PXXV48_4K:
>   guest_pa_bits = 52;
>   guest_page_shift = 12;
>   break;
> @@ -446,7 +445,7 @@ int main(int argc, char *argv[])
>  #endif
>  
>  #ifdef __x86_64__
> - vm_guest_mode_params_init(VM_MODE_P52V48_4K, true, true);
> + vm_guest_mode_params_init(VM_MODE_PXXV48_4K, true, true);
>  #endif
>  #ifdef __aarch64__
>   vm_guest_mode_params_init(VM_MODE_P40V48_4K, true, true);
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index c78faa2ff7f3..430edbacb9b2 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -24,6 +24,10 @@ struct kvm_vm;
>  typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
>  typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */
>  
> +#ifndef DEBUG
> +#define DEBUG printf
> +#endif

There's no way to turn this off without modifying code. I suggested

#ifndef NDEBUG
#define dprintf printf
#endif

which allows the dprintf(...) statements to be removed by compiling with
-DNDEBUG added to CFLAGS. And that would also disable all the asserts().
That's probably not all that useful, but then again, defining printf() as
DEBUG() isn't useful either if the intention is to always print.

> +
>  /* Minimum allocated guest virtual and physical addresses */
>  #define KVM_UTIL_MIN_VADDR   0x2000
>  
> @@ -38,11 +42,14 @@ enum vm_guest_mode {
>   VM_MODE_P48V48_64K,
>   VM_MODE_P40V48_4K,
>   VM_MODE_P40V48_64K,
> + VM_MODE_PXXV48_4K,  /* For 48bits VA but ANY bits PA */
>   NUM_VM_MODES,
>  };
>  
> -#ifdef __aarch64__
> +#if defined(__aarch64__)
>  #define VM_MODE_DEFAULT VM_MODE_P40V48_4K
> +#elif defined(__x86_64__)
> +#define VM_MODE_DEFAULT VM_MODE_PXXV48_4K
>  #else
>  #define VM_MODE_DEFAULT VM_MODE_P52V48_4K
>  #endif
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 80d19740d2dc..0c17f2ee685e 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -325,6 +325,9 @@ uint64_t vcpu_get_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index);
>  void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
> uint64_t msr_value);
>  
> +uint32_t kvm_get_cpuid_max(void);
> +void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
> +
>  /*
>   * Basic CPU control in CR0
>   */
> diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
> index 

Re: [PATCH 4/4] KVM: selftests: Remove duplicate guest mode handling

2019-08-29 Thread Andrew Jones
On Thu, Aug 29, 2019 at 10:09:35AM +0800, Peter Xu wrote:
> On Wed, Aug 28, 2019 at 01:46:13PM +0200, Andrew Jones wrote:
> 
> [...]
> 
> > > +unsigned int vm_get_page_size(struct kvm_vm *vm)
> > > +{
> > > + return vm->page_size;
> > > +}
> > > +
> > > +unsigned int vm_get_page_shift(struct kvm_vm *vm)
> > > +{
> > > + return vm->page_shift;
> > > +}
> > 
> > We could get by with just one of the above two, but whatever
> 
> Right... and imho if we export the kvm_vm struct we don't even need any
> helpers. :) But I didn't touch that.

yeah, I'm starting to wonder if there's much value in keeping the vm and
vcpu structures private. I've already had a couple cases where I wanted
to write a quick+dirty test that needed the vcpu_fd, so I cheated and
included the internal header to get to it.

Thanks,
drew

> 
> > > +
> > > +unsigned int vm_get_max_gfn(struct kvm_vm *vm)
> > > +{
> > > + return vm->max_gfn;
> > > +}
> > > -- 
> > > 2.21.0
> > >
> > 
> > Reviewed-by: Andrew Jones 
> 
> Thanks!
> 
> -- 
> Peter Xu


Re: [PATCH 0/4] KVM: selftests: Introduce VM_MODE_PXXV48_4K

2019-08-28 Thread Andrew Jones
On Wed, Aug 28, 2019 at 01:51:06PM +0200, Andrew Jones wrote:
> On Tue, Aug 27, 2019 at 09:10:11PM +0800, Peter Xu wrote:
> > The work is based on Thomas's s390 port for dirty_log_test.
> > 
> > This series originates from "[PATCH] KVM: selftests: Detect max PA
> > width from cpuid" [1] and one of Drew's comments - instead of keeping
> > the hackish line to overwrite guest_pa_bits all the time, this series
> > introduced the new mode VM_MODE_PXXV48_4K for x86_64 platform.
> > 
> > > The major issue is that even though all the x86_64 kvm selftests
> > > currently use the guest mode VM_MODE_P52V48_4K, many x86_64 hosts
> > > do not have 52 bits of PA (and in most cases, far fewer).  With luck
> > > we could have a 48-bit host, but it's more ad hoc than that (I've
> > > observed 3 x86_64 systems with different PA widths of 36, 39 and 48).
> > > I am not sure whether this happens on the other archs as well, but
> > > it probably makes sense to bring the x86_64 tests to the real world by
> > > always using the correct PA bits.
> > > 
> > > A side effect of this series is that it will also fix the crash we've
> > > encountered on Xeon E3-1220, as mentioned in [1], due to the
> > > difference in PA width.
> > > 
> > > With [1], we've observed AMD host issues with NPT=off.  However, a
> > > funny fact is that after I reworked it into this series, the tests
> > > instead pass with both NPT=on and NPT=off.  It could be that the
> > > series changes vm->pa_bits or other fields so something was affected.
> > > I didn't dig more into that though, considering we should not lose
> > > anything.
> > > 
> > > Any kind of smoke test would be greatly welcomed (especially on s390
> > > or ARM).  Same for comments.  Thanks,
> > 
> 
> The patches didn't apply cleanly for me on 9e8312f5e160, but once I got
> them applied I was able to run the aarch64 tests.

Oh, and after fixing 2/4 (vm->pa_bits) to fix compilation on aarch64 as
pointed out on that patch.

> 
> > [1] https://lkml.org/lkml/2019/8/26/141
> > 
> > Peter Xu (4):
> >   KVM: selftests: Move vm type into _vm_create() internally
> >   KVM: selftests: Create VM earlier for dirty log test
> >   KVM: selftests: Introduce VM_MODE_PXXV48_4K
> >   KVM: selftests: Remove duplicate guest mode handling
> > 
> >  tools/testing/selftests/kvm/dirty_log_test.c  | 78 +--
> >  .../testing/selftests/kvm/include/kvm_util.h  | 17 +++-
> >  .../selftests/kvm/lib/aarch64/processor.c |  3 +
> >  tools/testing/selftests/kvm/lib/kvm_util.c| 77 ++
> >  .../selftests/kvm/lib/x86_64/processor.c  |  8 +-
> >  5 files changed, 107 insertions(+), 76 deletions(-)
> > 
> > -- 
> > 2.21.0
> >
> 
> Thanks,
> drew


Re: [PATCH 0/4] KVM: selftests: Introduce VM_MODE_PXXV48_4K

2019-08-28 Thread Andrew Jones
On Tue, Aug 27, 2019 at 09:10:11PM +0800, Peter Xu wrote:
> The work is based on Thomas's s390 port for dirty_log_test.
> 
> This series originates from "[PATCH] KVM: selftests: Detect max PA
> width from cpuid" [1] and one of Drew's comments - instead of keeping
> the hackish line to overwrite guest_pa_bits all the time, this series
> introduced the new mode VM_MODE_PXXV48_4K for x86_64 platform.
> 
> The major issue is that even though all the x86_64 kvm selftests
> currently use the guest mode VM_MODE_P52V48_4K, many x86_64 hosts
> do not have 52 bits of PA (and in most cases, far fewer).  With luck
> we could have a 48-bit host, but it's more ad hoc than that (I've
> observed 3 x86_64 systems with different PA widths of 36, 39 and 48).
> I am not sure whether this happens on the other archs as well, but
> it probably makes sense to bring the x86_64 tests to the real world by
> always using the correct PA bits.
> 
> A side effect of this series is that it will also fix the crash we've
> encountered on Xeon E3-1220, as mentioned in [1], due to the
> difference in PA width.
> 
> With [1], we've observed AMD host issues with NPT=off.  However, a
> funny fact is that after I reworked it into this series, the tests
> instead pass with both NPT=on and NPT=off.  It could be that the
> series changes vm->pa_bits or other fields so something was affected.
> I didn't dig more into that though, considering we should not lose
> anything.
> 
> Any kind of smoke test would be greatly welcomed (especially on s390
> or ARM).  Same for comments.  Thanks,
> 

The patches didn't apply cleanly for me on 9e8312f5e160, but once I got
them applied I was able to run the aarch64 tests.

> [1] https://lkml.org/lkml/2019/8/26/141
> 
> Peter Xu (4):
>   KVM: selftests: Move vm type into _vm_create() internally
>   KVM: selftests: Create VM earlier for dirty log test
>   KVM: selftests: Introduce VM_MODE_PXXV48_4K
>   KVM: selftests: Remove duplicate guest mode handling
> 
>  tools/testing/selftests/kvm/dirty_log_test.c  | 78 +--
>  .../testing/selftests/kvm/include/kvm_util.h  | 17 +++-
>  .../selftests/kvm/lib/aarch64/processor.c |  3 +
>  tools/testing/selftests/kvm/lib/kvm_util.c| 77 ++
>  .../selftests/kvm/lib/x86_64/processor.c  |  8 +-
>  5 files changed, 107 insertions(+), 76 deletions(-)
> 
> -- 
> 2.21.0
>

Thanks,
drew
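
The cover letter above turns on the host's physical address width,
which the related patches read from EAX of CPUID leaf 0x80000008.  A
minimal sketch of that decoding follows; the function names are
hypothetical and the EAX value is passed in by the caller, so nothing
here queries KVM or the CPU.

```c
#include <stdint.h>

/*
 * Decode EAX of CPUID leaf 0x80000008:
 *   bits  7:0 - physical address width
 *   bits 15:8 - linear (virtual) address width
 */
static void sketch_decode_address_width(uint32_t eax, unsigned int *pa_bits,
					unsigned int *va_bits)
{
	*pa_bits = eax & 0xff;
	*va_bits = (eax >> 8) & 0xff;
}

/* Highest guest frame number for a given PA width and page shift. */
static uint64_t sketch_max_gfn(unsigned int pa_bits, unsigned int page_shift)
{
	return (UINT64_C(1) << (pa_bits - page_shift)) - 1;
}
```

For example, an EAX value of 0x3024 decodes to 36 physical and 48
linear address bits, the combination reported above for the Xeon
E3-1220.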


Re: [PATCH 2/4] KVM: selftests: Create VM earlier for dirty log test

2019-08-28 Thread Andrew Jones
On Tue, Aug 27, 2019 at 09:10:13PM +0800, Peter Xu wrote:
> Since the previous patch removed the dependency on the vm type, we
> can now create the vm much earlier.  Note that to move it earlier we
> use an approximation of the number of extra pages, but it should be
> fine.
> 
> This prepares for the follow-up patches, which finally remove the
> duplicated guest mode parsing.
> 
> Signed-off-by: Peter Xu 
> ---
>  tools/testing/selftests/kvm/dirty_log_test.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index 424efcf8c734..040952f3d4ad 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -264,6 +264,9 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid,
>   return vm;
>  }
>  
> +#define DIRTY_MEM_BITS 30 /* 1G */
> +#define PAGE_SHIFT_4K  12
> +
>  static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>unsigned long interval, uint64_t phys_offset)
>  {
> @@ -273,6 +276,18 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>   uint64_t max_gfn;
>   unsigned long *bmap;
>  
> + /*
> +  * We reserve page table for 2 times of extra dirty mem which
> +  * will definitely cover the original (1G+) test range.  Here
> +  * we do the calculation with 4K page size which is the
> +  * smallest so the page number will be enough for all archs
> +  * (e.g., 64K page size guest will need even less memory for
> +  * page tables).
> +  */
> + vm = create_vm(mode, VCPU_ID,
> +2ul << (DIRTY_MEM_BITS - PAGE_SHIFT_4K),
> +guest_code);
> +
>   switch (mode) {
>   case VM_MODE_P52V48_4K:
>   guest_pa_bits = 52;
> @@ -319,7 +334,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>* A little more than 1G of guest page sized pages.  Cover the
>* case where the size is not aligned to 64 pages.
>*/
> - guest_num_pages = (1ul << (30 - guest_page_shift)) + 16;
> + guest_num_pages = (1ul << (DIRTY_MEM_BITS - guest_page_shift)) + 16;
>  #ifdef __s390x__
>   /* Round up to multiple of 1M (segment size) */
>   guest_num_pages = (guest_num_pages + 0xff) & ~0xffUL;
> @@ -345,8 +360,6 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>   bmap = bitmap_alloc(host_num_pages);
>   host_bmap_track = bitmap_alloc(host_num_pages);
>  
> - vm = create_vm(mode, VCPU_ID, guest_num_pages, guest_code);
> -
>  #ifdef USE_CLEAR_DIRTY_LOG
>   struct kvm_enable_cap cap = {};
>  
> -- 
> 2.21.0
>

Reviewed-by: Andrew Jones 
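
The page-count approximation in this patch can be checked with a bit
of arithmetic.  The sketch below reuses the two constants from the
diff; the function names are hypothetical and the s390x rounding step
is left out.

```c
#include <stdint.h>

#define DIRTY_MEM_BITS 30 /* 1G of test memory, as in the patch */
#define PAGE_SHIFT_4K  12

/* Pages reserved at VM creation: twice the 1G range, counted in 4K pages. */
static uint64_t sketch_reserved_pages(void)
{
	return UINT64_C(2) << (DIRTY_MEM_BITS - PAGE_SHIFT_4K);
}

/* Pages the test tracks for a given guest page shift (1G plus 16 pages). */
static uint64_t sketch_guest_num_pages(unsigned int guest_page_shift)
{
	return (UINT64_C(1) << (DIRTY_MEM_BITS - guest_page_shift)) + 16;
}
```

With 4K guest pages the test tracks 262160 pages against 524288
reserved, and with 64K guest pages only 16400, so the early estimate
covers both cases.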



Re: [PATCH 4/4] KVM: selftests: Remove duplicate guest mode handling

2019-08-28 Thread Andrew Jones
ft(struct kvm_vm *vm);
> +unsigned int vm_get_max_gfn(struct kvm_vm *vm);
> +
>  struct kvm_userspace_memory_region *
>  kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
>uint64_t end);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 8c6f872a8793..cf39643ff2c7 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -137,6 +137,8 @@ struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages,
>  {
>   struct kvm_vm *vm;
>  
> + DEBUG("Testing guest mode: %s\n", vm_guest_mode_string(mode));
> +
>   vm = calloc(1, sizeof(*vm));
>   TEST_ASSERT(vm != NULL, "Insufficient Memory");
>  
> @@ -1662,3 +1664,18 @@ bool vm_is_unrestricted_guest(struct kvm_vm *vm)
>  
>   return val == 'Y';
>  }
> +
> +unsigned int vm_get_page_size(struct kvm_vm *vm)
> +{
> + return vm->page_size;
> +}
> +
> +unsigned int vm_get_page_shift(struct kvm_vm *vm)
> +{
> + return vm->page_shift;
> +}

We could get by with just one of the above two, but whatever
> +
> +unsigned int vm_get_max_gfn(struct kvm_vm *vm)
> +{
> + return vm->max_gfn;
> +}
> -- 
> 2.21.0
>

Reviewed-by: Andrew Jones 
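
The remark above that one of the two page accessors would be enough
rests on the page size being a power of two, so it follows directly
from the shift.  A minimal sketch, using a hypothetical stand-in for
the private struct kvm_vm:

```c
/* Hypothetical stand-in for struct kvm_vm; only the field needed here. */
struct sketch_vm {
	unsigned int page_shift;
};

static unsigned int sketch_get_page_shift(const struct sketch_vm *vm)
{
	return vm->page_shift;
}

/* The page size is derived from the shift instead of being stored. */
static unsigned int sketch_get_page_size(const struct sketch_vm *vm)
{
	return 1u << sketch_get_page_shift(vm);
}
```

Keeping both accessors, as the patch does, costs little and saves
callers from repeating the shift.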


Re: [PATCH 3/4] KVM: selftests: Introduce VM_MODE_PXXV48_4K

2019-08-28 Thread Andrew Jones
On Tue, Aug 27, 2019 at 09:10:14PM +0800, Peter Xu wrote:
> The naming VM_MODE_P52V48_4K is explicit but unclear when used on
> x86_64 machines, because x86_64 machines have varying physical
> address widths rather than a single static value.  Here are some
> examples:
> 
>   - Intel Xeon E3-1220:  36 bits
>   - Intel Core i7-8650:  39 bits
>   - AMD   EPYC 7251:     48 bits
> 
> All of them use a 48-bit linear address width but have totally
> different physical address widths (and most of the old machines should
> be less than 52 bits).
> 
> Let's create a new guest mode called VM_MODE_PXXV48_4K for the current
> x86_64 tests and make it the default, replacing the old naming of
> VM_MODE_P52V48_4K, because it shows more clearly that the PA width is
> not really a constant.  Meanwhile we also stop assuming that all x86
> machines have a 52-bit PA width and instead fetch the real
> vm->pa_bits from CPUID 0x80000008 at runtime.
> 
> This mode is currently used exclusively by x86_64 and no other arch.
> 
> As a slight touch-up, move the DEBUG macro from dirty_log_test.c to
> kvm_util.h so the lib can use it too.
> 
> Signed-off-by: Peter Xu 
> ---
>  tools/testing/selftests/kvm/dirty_log_test.c  |  5 +--
>  .../testing/selftests/kvm/include/kvm_util.h  | 11 -
>  .../selftests/kvm/lib/aarch64/processor.c |  3 ++
>  tools/testing/selftests/kvm/lib/kvm_util.c| 40 ---
>  .../selftests/kvm/lib/x86_64/processor.c  |  8 ++--
>  5 files changed, 53 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index 040952f3d4ad..b2e07a3173b2 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -19,8 +19,6 @@
>  #include "kvm_util.h"
>  #include "processor.h"
>  
> -#define DEBUG printf
> -
>  #define VCPU_ID  1
>  
>  /* The memory slot index to track dirty pages */
> @@ -290,6 +288,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>  
>   switch (mode) {
>   case VM_MODE_P52V48_4K:
> + case VM_MODE_PXXV48_4K:
>   guest_pa_bits = 52;
>   guest_page_shift = 12;
>   break;
> @@ -489,7 +488,7 @@ int main(int argc, char *argv[])
>  #endif
>  
>  #ifdef __x86_64__
> - vm_guest_mode_params_init(VM_MODE_P52V48_4K, true, true);
> + vm_guest_mode_params_init(VM_MODE_PXXV48_4K, true, true);
>  #endif
>  #ifdef __aarch64__
>   vm_guest_mode_params_init(VM_MODE_P40V48_4K, true, true);
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index cfc079f20815..1c700c6b31b5 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -24,6 +24,8 @@ struct kvm_vm;
>  typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
>  typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */
>  
> +#define DEBUG printf

If this is going to be some general thing, then maybe we should do it
like this

#ifndef NDEBUG
#define dprintf printf
#endif


> +
>  /* Minimum allocated guest virtual and physical addresses */
>  #define KVM_UTIL_MIN_VADDR   0x2000
>  
> @@ -38,12 +40,19 @@ enum vm_guest_mode {
>   VM_MODE_P48V48_64K,
>   VM_MODE_P40V48_4K,
>   VM_MODE_P40V48_64K,
> + VM_MODE_PXXV48_4K,  /* For 48bits VA but ANY bits PA */
>   NUM_VM_MODES,
>  };
>  
>  #ifdef __aarch64__
>  #define VM_MODE_DEFAULT VM_MODE_P40V48_4K
> -#else
> +#endif
> +
> +#ifdef __x86_64__
> +#define VM_MODE_DEFAULT VM_MODE_PXXV48_4K
> +#endif
> +
> +#ifndef VM_MODE_DEFAULT
>  #define VM_MODE_DEFAULT VM_MODE_P52V48_4K
>  #endif

nit: how about

#if defined(__aarch64__)
...
#elif defined(__x86_64__)
...
#else
...
#endif

>  
> diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
> index 486400a97374..86036a59a668 100644
> --- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
> @@ -264,6 +264,9 @@ void aarch64_vcpu_setup(struct kvm_vm *vm, int vcpuid, struct kvm_vcpu_init *ini
>   case VM_MODE_P52V48_4K:
>   TEST_ASSERT(false, "AArch64 does not support 4K sized pages "
>  "with 52-bit physical address ranges");
> + case VM_MODE_PXXV48_4K:
> + TEST_ASSERT(false, "AArch64 does not support 4K sized pages "
> +"with ANY-bit physical address ranges");
>   case VM_MODE_P52V48_64K:
>   tcr_el1 |= 1ul << 14; /* TG0 = 64KB */
>   tcr_el1 |= 6ul << 32; /* IPS = 52 bits */
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 0c7c4368bc14..8c6f872a8793 100644
> --- 

Re: [PATCH 1/4] KVM: selftests: Move vm type into _vm_create() internally

2019-08-28 Thread Andrew Jones
On Tue, Aug 27, 2019 at 09:10:12PM +0800, Peter Xu wrote:
> Rather than passing the vm type from the top level to the end of vm
> creation, let's simply keep that as an internal of kvm_vm struct and
> decide the type in _vm_create().  Several reasons for doing this:
> 
> - The vm type is only decided by physical address width and currently
>   only used in aarch64, so we've got enough information as long as
>   we're passing vm_guest_mode into _vm_create(),
> 
> - This removes a circular dependency between vm->type and the creation
>   of vms.  That dependency is why we sometimes need to parse
>   vm_guest_mode twice, once in run_test() and then again in
>   _vm_create().  The follow-up patches will clean that up as well so
>   we can have a single place to decide guest machine types and so on.
> 
> Note that this patch slightly changes the behavior of the aarch64
> tests: previously most vm_create() callers passed type==0 directly
> into _vm_create(), but now the type depends on vm_guest_mode.
> However, it shouldn't affect any user because all vm_create() users
> on aarch64 use the VM_MODE_DEFAULT guest mode (which is
> VM_MODE_P40V48_4K), so in the end the type is still zero.
> 
> Signed-off-by: Peter Xu 
> ---
>  tools/testing/selftests/kvm/dirty_log_test.c  | 12 +++
>  .../testing/selftests/kvm/include/kvm_util.h  |  2 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c| 20 ---
>  3 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index dc3346e090f5..424efcf8c734 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -249,14 +249,13 @@ static void vm_dirty_log_verify(unsigned long *bmap)
>  }
>  
>  static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid,
> - uint64_t extra_mem_pages, void *guest_code,
> - unsigned long type)
> + uint64_t extra_mem_pages, void *guest_code)
>  {
>   struct kvm_vm *vm;
>   uint64_t extra_pg_pages = extra_mem_pages / 512 * 2;
>  
>   vm = _vm_create(mode, DEFAULT_GUEST_PHY_PAGES + extra_pg_pages,
> - O_RDWR, type);
> + O_RDWR);

nit: after removing type can O_RDWR go up a line?

>   kvm_vm_elf_load(vm, program_invocation_name, 0, 0);
>  #ifdef __x86_64__
>   vm_create_irqchip(vm);
> @@ -273,7 +272,6 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>   struct kvm_vm *vm;
>   uint64_t max_gfn;
>   unsigned long *bmap;
> - unsigned long type = 0;
>  
>   switch (mode) {
>   case VM_MODE_P52V48_4K:
> @@ -314,10 +312,6 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>* bits we can change to 39.
>*/
>   guest_pa_bits = 39;
> -#endif
> -#ifdef __aarch64__
> - if (guest_pa_bits != 40)
> - type = KVM_VM_TYPE_ARM_IPA_SIZE(guest_pa_bits);
>  #endif
>   max_gfn = (1ul << (guest_pa_bits - guest_page_shift)) - 1;
>   guest_page_size = (1ul << guest_page_shift);
> @@ -351,7 +345,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>   bmap = bitmap_alloc(host_num_pages);
>   host_bmap_track = bitmap_alloc(host_num_pages);
>  
> - vm = create_vm(mode, VCPU_ID, guest_num_pages, guest_code, type);
> + vm = create_vm(mode, VCPU_ID, guest_num_pages, guest_code);
>  
>  #ifdef USE_CLEAR_DIRTY_LOG
>   struct kvm_enable_cap cap = {};
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 5463b7896a0a..cfc079f20815 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -61,7 +61,7 @@ int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap);
>  
>  struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm);
>  struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages,
> -   int perm, unsigned long type);
> +   int perm);

nit: can perm go up?

>  void kvm_vm_free(struct kvm_vm *vmp);
>  void kvm_vm_restart(struct kvm_vm *vmp, int perm);
>  void kvm_vm_release(struct kvm_vm *vmp);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 6e49bb039376..0c7c4368bc14 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -84,7 +84,7 @@ int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap)
>   return ret;
>  }
>  
> -static void vm_open(struct kvm_vm *vm, int perm, unsigned long type)
> +static void vm_open(struct kvm_vm *vm, int perm)
>  {
>   vm->kvm_fd = open(KVM_DEV_PATH, perm);
>   if (vm->kvm_fd < 

Re: [PATCH] KVM: selftests: Detect max PA width from cpuid

2019-08-26 Thread Andrew Jones
On Mon, Aug 26, 2019 at 07:22:44PM +0800, Peter Xu wrote:
> On Mon, Aug 26, 2019 at 01:09:58PM +0200, Andrew Jones wrote:
> > On Mon, Aug 26, 2019 at 03:57:28PM +0800, Peter Xu wrote:
> > > > The dirty_log_test is failing on some old machines like Xeon E3-1220
> > > > with triple faults when writing to the tracked memory region:
> > > 
> > >   Test iterations: 32, interval: 10 (ms)
> > >   Testing guest mode: PA-bits:52, VA-bits:48, 4K pages
> > >   guest physical test memory offset: 0x7fbffef000
> > > >   ==== Test Assertion Failure ====
> > >   dirty_log_test.c:138: false
> > >   pid=6137 tid=6139 - Success
> > >  1  0x00401ca1: vcpu_worker at dirty_log_test.c:138
> > >  2  0x7f3dd9e392dd: ?? ??:0
> > >  3  0x7f3dd9b6a132: ?? ??:0
> > >   Invalid guest sync status: exit_reason=SHUTDOWN
> > > 
> > > > That's because we previously moved the test memory region from a
> > > > static place (1G) to the top of the system's physical address space,
> > > > while still assuming 39 bits of PA for all x86_64 machines.  That
> > > > does not hold for machines like the Xeon E3-1220, which only
> > > > supports 36.
> > > > 
> > > > Let's unbreak this test by dynamically detecting the PA width from
> > > > CPUID 0x80000008.  Meanwhile, also allow
> > > > kvm_get_supported_cpuid_index() to fail.  I don't know whether that
> > > > could be useful, because I think 0x80000008 should be there for all
> > > > x86_64 hosts, but I also think it's not really helpful to assert in
> > > > kvm_get_supported_cpuid_index().
> > > 
> > > Fixes: b442324b581556e
> > > CC: Paolo Bonzini 
> > > CC: Andrew Jones 
> > > CC: Radim Krčmář 
> > > CC: Thomas Huth 
> > > Signed-off-by: Peter Xu 
> > > ---
> > >  tools/testing/selftests/kvm/dirty_log_test.c  | 22 +--
> > >  .../selftests/kvm/lib/x86_64/processor.c  |  3 ---
> > >  2 files changed, 15 insertions(+), 10 deletions(-)
> > > 
> > > > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> > > > index ceb52b952637..111592f3a1d7 100644
> > > > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > > > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > > > @@ -274,18 +274,26 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
> > >   DEBUG("Testing guest mode: %s\n", vm_guest_mode_string(mode));
> > >  
> > >  #ifdef __x86_64__
> > > - /*
> > > -  * FIXME
> > > -  * The x86_64 kvm selftests framework currently only supports a
> > > -  * single PML4 which restricts the number of physical address
> > > -  * bits we can change to 39.
> > > -  */
> > > - guest_pa_bits = 39;
> > > + {
> > > + struct kvm_cpuid_entry2 *entry;
> > > +
> > > > + entry = kvm_get_supported_cpuid_entry(0x80000008);
> > > + /*
> > > +  * Supported PA width can be smaller than 52 even if
> > > +  * we're with VM_MODE_P52V48_4K mode.  Fetch it from
> > 
> > It seems like x86_64 should create modes that actually work, rather than
> > always using one named 'P52', but then needing to probe for the actual
> > number of supported physical bits. Indeed testing all x86_64 supported
> > modes, like aarch64 does, would even make more sense in this test.
> 
> Should be true.  I'll think it over again...
> 
> > 
> > 
> > > +  * the host to update the default value (SDM 4.1.4).
> > > +  */
> > > + if (entry)
> > > + guest_pa_bits = entry->eax & 0xff;
> > 
> > Are we sure > 39 bits will work with this test framework? I can't
> > recall what led me to the FIXME above, other than things not working.
> > It seems I was convinced we couldn't have more bits due to how pml4's
> > were allocated, but maybe I misinterpreted it.
> 
> As mentioned in the IRC - I think I've got a "success case" of
> that... :)  Please see below:
> 
> virtlab423:~ $ lscpu
> Architecture:x86_64
> CPU op-mode(s):  32-bit, 64-bit
> Byte Order:  Little Endian
> CPU(s):  16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  1
> Core(s) per socket:  8
> Socket(s):   2
> NUMA node(s):2
> Vendor ID:   GenuineIntel
> CPU family:  6
> Model:   63
> Model name:  Intel(R) Xeon(R) CPU E5-2640 

Re: [PATCH] KVM: selftests: Detect max PA width from cpuid

2019-08-26 Thread Andrew Jones
On Mon, Aug 26, 2019 at 03:57:28PM +0800, Peter Xu wrote:
> The dirty_log_test is failing on some old machines like Xeon E3-1220
> with triple faults when writing to the tracked memory region:
> 
>   Test iterations: 32, interval: 10 (ms)
>   Testing guest mode: PA-bits:52, VA-bits:48, 4K pages
>   guest physical test memory offset: 0x7fbffef000
>   ==== Test Assertion Failure ====
>   dirty_log_test.c:138: false
>   pid=6137 tid=6139 - Success
>  1  0x00401ca1: vcpu_worker at dirty_log_test.c:138
>  2  0x7f3dd9e392dd: ?? ??:0
>  3  0x7f3dd9b6a132: ?? ??:0
>   Invalid guest sync status: exit_reason=SHUTDOWN
> 
> It's because previously we moved the testing memory region from a
> static place (1G) to the top of the system's physical address space,
> while still assuming 39 bits of PA on all x86_64 machines.  That's
> not true for machines like the Xeon E3-1220, which only supports 36.
> 
> Let's unbreak this test by dynamically detecting the PA width from
> CPUID 0x80000008.  Meanwhile, allow kvm_get_supported_cpuid_index()
> to fail.  I don't know whether that could be useful because I think
> 0x80000008 should be there for all x86_64 hosts, but I also think it's
> not really helpful to assert in kvm_get_supported_cpuid_index().
> 
> Fixes: b442324b581556e
> CC: Paolo Bonzini 
> CC: Andrew Jones 
> CC: Radim Krčmář 
> CC: Thomas Huth 
> Signed-off-by: Peter Xu 
> ---
>  tools/testing/selftests/kvm/dirty_log_test.c  | 22 +--
>  .../selftests/kvm/lib/x86_64/processor.c  |  3 ---
>  2 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c 
> b/tools/testing/selftests/kvm/dirty_log_test.c
> index ceb52b952637..111592f3a1d7 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -274,18 +274,26 @@ static void run_test(enum vm_guest_mode mode, unsigned 
> long iterations,
>   DEBUG("Testing guest mode: %s\n", vm_guest_mode_string(mode));
>  
>  #ifdef __x86_64__
> - /*
> -  * FIXME
> -  * The x86_64 kvm selftests framework currently only supports a
> -  * single PML4 which restricts the number of physical address
> -  * bits we can change to 39.
> -  */
> - guest_pa_bits = 39;
> + {
> + struct kvm_cpuid_entry2 *entry;
> +
> + entry = kvm_get_supported_cpuid_entry(0x80000008);
> + /*
> +  * Supported PA width can be smaller than 52 even if
> +  * we're with VM_MODE_P52V48_4K mode.  Fetch it from

It seems like x86_64 should create modes that actually work, rather than
always using one named 'P52', but then needing to probe for the actual
number of supported physical bits. Indeed testing all x86_64 supported
modes, like aarch64 does, would even make more sense in this test.


> +  * the host to update the default value (SDM 4.1.4).
> +  */
> + if (entry)
> + guest_pa_bits = entry->eax & 0xff;

Are we sure > 39 bits will work with this test framework? I can't
recall what led me to the FIXME above, other than things not working.
It seems I was convinced we couldn't have more bits due to how pml4's
were allocated, but maybe I misinterpreted it.

> + else
> + guest_pa_bits = 32;
> + }
>  #endif
>  #ifdef __aarch64__
>   if (guest_pa_bits != 40)
>   type = KVM_VM_TYPE_ARM_IPA_SIZE(guest_pa_bits);
>  #endif
> + printf("Supported guest physical address width: %d\n", guest_pa_bits);
>   max_gfn = (1ul << (guest_pa_bits - guest_page_shift)) - 1;
>   guest_page_size = (1ul << guest_page_shift);
>   /*
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c 
> b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 6cb34a0fa200..9de2fd310ac8 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -760,9 +760,6 @@ kvm_get_supported_cpuid_index(uint32_t function, uint32_t 
> index)
>   break;
>   }
>   }
> -
> - TEST_ASSERT(entry, "Guest CPUID entry not found: (EAX=%x, ECX=%x).",
> - function, index);
>   return entry;
>  }
>  
> -- 
> 2.21.0
> 

Thanks,
drew
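
For reference, a minimal sketch of the PA-width detection and the max_gfn
computation discussed in the patch above. It assumes only that bits 7:0 of
EAX from CPUID leaf 0x80000008 hold the physical address width; the
demo_* helper names are hypothetical and are not part of the selftests
library.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the physical address width from the EAX
 * value returned by CPUID leaf 0x80000008 (bits 7:0).
 */
static unsigned int demo_pa_bits_from_eax(uint32_t eax)
{
	return eax & 0xff;
}

/*
 * Hypothetical helper mirroring the max_gfn computation in run_test():
 * the highest guest frame number addressable with pa_bits of physical
 * address space and pages of size (1 << page_shift).
 */
static uint64_t demo_max_gfn(unsigned int pa_bits, unsigned int page_shift)
{
	assert(pa_bits > page_shift && pa_bits < 64);
	return (UINT64_C(1) << (pa_bits - page_shift)) - 1;
}
```

With 4K pages, a host reporting 36 bits gives a max_gfn of 0xffffff, while
the old hardcoded 39 bits gives 0x7ffffff, which would place the test
memory beyond what such a host can address.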


Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU

2019-08-22 Thread Andrew Jones
On Thu, Aug 22, 2019 at 02:10:48PM +0200, Alexander Graf wrote:
> On 22.08.19 10:44, Anup Patel wrote:
...
> > +static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > +   unsigned long fault_addr)
...
> > +   /* Exit to userspace for MMIO emulation */
> > +   vcpu->stat.mmio_exit_user++;
> > +   run->exit_reason = KVM_EXIT_MMIO;
> > +   run->mmio.is_write = false;
> > +   run->mmio.phys_addr = fault_addr;
> > +   run->mmio.len = len;
> > +
> > +   /* Move to next instruction */
> > +   vcpu->arch.guest_context.sepc += INSN_LEN(insn);
> 
> Doesn't that make more sense on the reentry path? What if you want to inject
> an MCE on access to unmapped addresses from user space?
>

I agree. See commit 0d640732dbeb for arm's justification for moving
the instruction skip. But also see

https://patchwork.kernel.org/patch/11109063/

for a needed fix to avoid skipping the instructions multiple times.
It looks like riscv's KVM_RUN ioctl would be vulnerable to that as
well.

Thanks,
drew
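
A toy model of the idea, for illustration only: record the pending skip
when exiting to userspace and apply it exactly once on reentry. This is
not kernel code; the demo_vcpu structure and both functions are
hypothetical and only sketch why a "pending" flag avoids skipping the
instruction more than once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU state for illustration. */
struct demo_vcpu {
	uint64_t pc;            /* guest program counter */
	unsigned int insn_len;  /* length of the trapped instruction */
	bool mmio_needed;       /* an MMIO exit is waiting for completion */
};

/* Exit path: remember what to skip, but do not touch the PC yet. */
static void demo_mmio_exit(struct demo_vcpu *vcpu, unsigned int insn_len)
{
	assert(!vcpu->mmio_needed);
	vcpu->insn_len = insn_len;
	vcpu->mmio_needed = true;
}

/*
 * Reentry path: advance the PC only if an MMIO exit is pending, then
 * clear the flag so a second reentry cannot skip another instruction.
 */
static void demo_mmio_reentry(struct demo_vcpu *vcpu)
{
	if (!vcpu->mmio_needed)
		return;
	vcpu->pc += vcpu->insn_len;
	vcpu->mmio_needed = false;
}
```

Because the PC is untouched until reentry, userspace still sees the
faulting instruction's address if it decides to inject an error instead
of completing the access.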


Re: [PATCH v2] selftests: kvm: Adding config fragments

2019-08-08 Thread Andrew Jones
On Thu, Aug 08, 2019 at 01:31:40PM +0100, Naresh Kamboju wrote:
> All KVM selftest cases need prerequisite kernel configs in order
> to pass.
> 
> The KVM tests are skipped without these configs:
> 
> dev_fd = open(KVM_DEV_PATH, O_RDONLY);
> if (dev_fd < 0)
> exit(KSFT_SKIP);
> 
> Signed-off-by: Naresh Kamboju 
> ---
>  tools/testing/selftests/kvm/config | 3 +++
>  1 file changed, 3 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/config
> 
> diff --git a/tools/testing/selftests/kvm/config 
> b/tools/testing/selftests/kvm/config
> new file mode 100644
> index 000000000000..63ed533f73d6
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/config
> @@ -0,0 +1,3 @@
> +CONFIG_KVM=y
> +CONFIG_KVM_INTEL=y
> +CONFIG_KVM_AMD=y
> -- 
> 2.17.1
>

What does the kselftests config file do? I was about to complain that this
would break compiling on non-x86 platforms, but 'make kselftest' and other
forms of invoking the build work fine on aarch64 even with this config
file. So is this just for documentation? If so, then it's still obviously
wrong for non-x86 platforms. The only config that makes sense here is KVM.
If the other options need to be documented for x86, then should they get
an additional config file? tools/testing/selftests/kvm/x86_64/config?

Thanks,
drew


Re: [PATCH v2 0/3] KVM: selftests: Enable ucall and dirty_log_test on s390x

2019-07-31 Thread Andrew Jones
On Wed, Jul 31, 2019 at 03:32:13PM +0200, Thomas Huth wrote:
> Implement the ucall() interface on s390x to be able to use the
> dirty_log_test KVM selftest on s390x, too.
> 
> v2:
>  - Split up ucall.c into architecture specific files
>  - Removed some #ifdef __s390x__  in the dirty_log patch
> 
> Thomas Huth (3):
>   KVM: selftests: Split ucall.c into architecture specific files
>   KVM: selftests: Implement ucall() for s390x
>   KVM: selftests: Enable dirty_log_test on s390x
> 
>  tools/testing/selftests/kvm/Makefile  |   9 +-
>  tools/testing/selftests/kvm/dirty_log_test.c  |  61 ++-
>  .../testing/selftests/kvm/include/kvm_util.h  |   8 +-
>  .../testing/selftests/kvm/lib/aarch64/ucall.c | 112 +
>  tools/testing/selftests/kvm/lib/s390x/ucall.c |  56 +++
>  tools/testing/selftests/kvm/lib/ucall.c   | 157 --
>  .../testing/selftests/kvm/lib/x86_64/ucall.c  |  56 +++
>  .../selftests/kvm/s390x/sync_regs_test.c  |   6 +-
>  8 files changed, 287 insertions(+), 178 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/lib/aarch64/ucall.c
>  create mode 100644 tools/testing/selftests/kvm/lib/s390x/ucall.c
>  delete mode 100644 tools/testing/selftests/kvm/lib/ucall.c
>  create mode 100644 tools/testing/selftests/kvm/lib/x86_64/ucall.c
> 
> -- 
> 2.21.0
>

With the include change to fix compilation on aarch64, for the series

Reviewed-by: Andrew Jones 


Re: [PATCH v2 1/3] KVM: selftests: Split ucall.c into architecture specific files

2019-07-31 Thread Andrew Jones
On Wed, Jul 31, 2019 at 03:32:14PM +0200, Thomas Huth wrote:
> The way we exit from a guest to userspace is very specific to the
> architecture: On x86, we use PIO, on aarch64 we are using MMIO and on
> s390x we're going to use an instruction instead. The possibility to
> select a type via the ucall_type_t enum is currently also completely
> unused, so the code in ucall.c currently looks more complex than
> required. Let's split this up into architecture specific ucall.c
> files instead, so we can get rid of the #ifdefs and the unnecessary
> ucall_type_t handling.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tools/testing/selftests/kvm/Makefile  |   6 +-
>  tools/testing/selftests/kvm/dirty_log_test.c  |   2 +-
>  .../testing/selftests/kvm/include/kvm_util.h  |   8 +-
>  .../testing/selftests/kvm/lib/aarch64/ucall.c | 112 +
>  tools/testing/selftests/kvm/lib/ucall.c   | 157 --
>  .../testing/selftests/kvm/lib/x86_64/ucall.c  |  56 +++
>  6 files changed, 173 insertions(+), 168 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/lib/aarch64/ucall.c
>  delete mode 100644 tools/testing/selftests/kvm/lib/ucall.c
>  create mode 100644 tools/testing/selftests/kvm/lib/x86_64/ucall.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index ba7849751989..a51e3b83df40 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -7,9 +7,9 @@ top_srcdir = ../../../..
>  KSFT_KHDR_INSTALL := 1
>  UNAME_M := $(shell uname -m)
>  
> -LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/ucall.c 
> lib/sparsebit.c
> -LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c
> -LIBKVM_aarch64 = lib/aarch64/processor.c
> +LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c
> +LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/ucall.c
> +LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
>  LIBKVM_s390x = lib/s390x/processor.c
>  
>  TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c 
> b/tools/testing/selftests/kvm/dirty_log_test.c
> index ceb52b952637..5d5ae1be4984 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -337,7 +337,7 @@ static void run_test(enum vm_guest_mode mode, unsigned 
> long iterations,
>   vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
>  #endif
>  #ifdef __aarch64__
> - ucall_init(vm, UCALL_MMIO, NULL);
> + ucall_init(vm, NULL);
>  #endif
>  
>   /* Export the shared variables to the guest */
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
> b/tools/testing/selftests/kvm/include/kvm_util.h
> index e0e66b115ef2..5463b7896a0a 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -165,12 +165,6 @@ int vm_create_device(struct kvm_vm *vm, struct 
> kvm_create_device *cd);
>   memcpy(&(g), _p, sizeof(g));\
>  })
>  
> -/* ucall implementation types */
> -typedef enum {
> - UCALL_PIO,
> - UCALL_MMIO,
> -} ucall_type_t;
> -
>  /* Common ucalls */
>  enum {
>   UCALL_NONE,
> @@ -186,7 +180,7 @@ struct ucall {
>   uint64_t args[UCALL_MAX_ARGS];
>  };
>  
> -void ucall_init(struct kvm_vm *vm, ucall_type_t type, void *arg);
> +void ucall_init(struct kvm_vm *vm, void *arg);
>  void ucall_uninit(struct kvm_vm *vm);
>  void ucall(uint64_t cmd, int nargs, ...);
>  uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
> diff --git a/tools/testing/selftests/kvm/lib/aarch64/ucall.c 
> b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> new file mode 100644
> index 000000000000..f69f951a48c0
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ucall support. A ucall is a "hypercall to userspace".
> + *
> + * Copyright (C) 2018, Red Hat, Inc.
> + */
> +#include "kvm_util.h"
> +#include "kvm_util_internal.h"

This needs to be #include "../kvm_util_internal.h"
otherwise we get

lib/aarch64/ucall.c:8:10: fatal error: kvm_util_internal.h: No such file or directory
 #include "kvm_util_internal.h"

With that change compilation completes and the tests run.

Thanks,
drew

> +
> +static vm_vaddr_t *ucall_exit_mmio_addr;
> +
> +static bool ucall_mmio_init(struct kvm_vm *vm, vm_paddr_t gpa)
> +{
> + if (kvm_userspace_memory_region_find(vm, gpa, gpa + 1))
> + return false;
> +
> + virt_pg_map(vm, gpa, gpa, 0);
> +
> + ucall_exit_mmio_addr = (vm_vaddr_t *)gpa;
> + sync_global_to_guest(vm, ucall_exit_mmio_addr);
> +
> + return true;
> +}
> +
> +void ucall_init(struct kvm_vm *vm, void *arg)
> +{
> + vm_paddr_t gpa, start, end, step, offset;
> + unsigned int bits;
> + bool ret;
> +
> + if (arg) {
> + gpa 

Re: [PATCH 1/2] KVM: selftests: Implement ucall() for s390x

2019-07-31 Thread Andrew Jones
On Wed, Jul 31, 2019 at 02:57:38PM +0200, Paolo Bonzini wrote:
> On 31/07/19 13:16, Thomas Huth wrote:
> > Or maybe even better: Let's move this file into lib/x86_64/ and
> > lib/aarch64/ instead, since there is more different code between the
> > architectures here than common code.
> 
> All good solutions, just choose one. :))
>

Agreed, and I like this last solution (move to arch-code) the best.

Thanks,
drew


Re: [PATCH 1/2] KVM: selftests: Implement ucall() for s390x

2019-07-31 Thread Andrew Jones
On Wed, Jul 31, 2019 at 11:43:16AM +0200, Thomas Huth wrote:
> On 30/07/2019 12.48, Andrew Jones wrote:
> > On Tue, Jul 30, 2019 at 12:01:11PM +0200, Thomas Huth wrote:
> >> On s390x, we can neither exit via PIO nor MMIO, but have to use
> >> an instruction like DIAGNOSE. While we're at it, rename UCALL_PIO
> >> to UCALL_DEFAULT, since PIO only works on x86 anyway, and this
> >> way we can re-use the "default" type for the DIAGNOSE exit on s390x.
> >>
> >> Now that ucall() is implemented, we can use it in the sync_reg_test
> >> on s390x, too.
> >>
> >> Signed-off-by: Thomas Huth 
> >> ---
> >>  .../testing/selftests/kvm/include/kvm_util.h  |  2 +-
> >>  tools/testing/selftests/kvm/lib/ucall.c   | 34 +++
> >>  .../selftests/kvm/s390x/sync_regs_test.c  |  6 ++--
> >>  3 files changed, 32 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
> >> b/tools/testing/selftests/kvm/include/kvm_util.h
> >> index e0e66b115ef2..c37aea2e33e5 100644
> >> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> >> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> >> @@ -167,7 +167,7 @@ int vm_create_device(struct kvm_vm *vm, struct 
> >> kvm_create_device *cd);
> >>  
> >>  /* ucall implementation types */
> >>  typedef enum {
> >> -  UCALL_PIO,
> >> +  UCALL_DEFAULT,
> > 
> > I'd rather we keep explicit types defined; keep PIO and add DIAG. Then
> > we can have
> > 
> > /*  Set default ucall types */
> > #if defined(__x86_64__)
> >   ucall_type = UCALL_PIO;
> > #elif defined(__aarch64__)
> >   ucall_type = UCALL_MMIO;
> >   ucall_requires_init = true;
> > #elif defined(__s390x__)
> >   ucall_type = UCALL_DIAG;
> > #endif
> > 
> > And add an assert in get_ucall()
> > 
> >  assert(!ucall_requires_init || ucall_initialized);
> 
> I'm not sure whether I really like that. It's yet another additional
> #ifdef block, and yet another variable ...
> 
> What do you think about removing the enum completely and simply code it
> directly, without the ucall_type indirection, i.e.:
> 
> void ucall(uint64_t cmd, int nargs, ...)
> {
>   struct ucall uc = {
>   .cmd = cmd,
>   };
>   va_list va;
>   int i;
> 
>   nargs = nargs <= UCALL_MAX_ARGS ? nargs : UCALL_MAX_ARGS;
> 
>   va_start(va, nargs);
>   for (i = 0; i < nargs; ++i)
>   uc.args[i] = va_arg(va, uint64_t);
>   va_end(va);
> 
> #if defined(__x86_64__)
> 
>   /* Exit via PIO */
>   asm volatile("in %[port], %%al"
>   : : [port] "d" (UCALL_PIO_PORT), "D" (&uc) : "rax");
> 
> #elif defined(__aarch64__)
> 
>   *ucall_exit_mmio_addr = (vm_vaddr_t)&uc;
> 
> #elif defined(__s390x__)
> 
>   /* Exit via DIAGNOSE 0x501 (normally used for breakpoints) */
>   asm volatile ("diag 0,%0,0x501" : : "a"(&uc) : "memory");
> 
> #endif
> }
> 
> I think that's way less confusing than having to understand the meaning
> of ucall_type etc. before...?
>

Sounds good to me.

Thanks,
drew 
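
The argument marshalling that stays common to every architecture in the
proposal above can be sketched as follows. This is a standalone
illustration, not the selftests code: the demo_ prefix, the struct layout
and the value chosen for DEMO_UCALL_MAX_ARGS are assumptions made so the
snippet compiles on its own.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

#define DEMO_UCALL_MAX_ARGS 6	/* assumed limit, for illustration only */

struct demo_ucall {
	uint64_t cmd;
	uint64_t args[DEMO_UCALL_MAX_ARGS];
};

/*
 * Fill *uc from a variable argument list, clamping nargs to the size of
 * the args array. Returns the number of arguments actually stored.
 * Every variadic argument must be passed as a uint64_t.
 */
static int demo_ucall_pack(struct demo_ucall *uc, uint64_t cmd, int nargs, ...)
{
	va_list va;
	int i;

	assert(uc && nargs >= 0);

	uc->cmd = cmd;
	for (i = 0; i < DEMO_UCALL_MAX_ARGS; ++i)
		uc->args[i] = 0;

	nargs = nargs <= DEMO_UCALL_MAX_ARGS ? nargs : DEMO_UCALL_MAX_ARGS;

	va_start(va, nargs);
	for (i = 0; i < nargs; ++i)
		uc->args[i] = va_arg(va, uint64_t);
	va_end(va);

	return nargs;
}
```

Only the final step, handing the address of the filled structure to the
host via PIO, MMIO or DIAGNOSE, differs per architecture.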


Re: [PATCH 2/2] KVM: selftests: Enable dirty_log_test on s390x

2019-07-31 Thread Andrew Jones
On Wed, Jul 31, 2019 at 10:19:57AM +0200, Thomas Huth wrote:
> On 30/07/2019 12.57, Andrew Jones wrote:
> > On Tue, Jul 30, 2019 at 12:01:12PM +0200, Thomas Huth wrote:
> >> To run the dirty_log_test on s390x, we have to make sure that we
> >> access the dirty log bitmap with little endian byte ordering and
> >> we have to properly align the memslot of the guest.
> >> Also all dirty bits of a segment are set at once on s390x when one
> >> of the pages of a segment is written to for the first time, so
> >> we have to make sure that we touch all pages during the first
> >> iteration to keep the test in sync here.
> >>
> >> Signed-off-by: Thomas Huth 
> >> ---
> [...]
> >> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c 
> >> b/tools/testing/selftests/kvm/dirty_log_test.c
> >> index ceb52b952637..7a1223ad0ff3 100644
> >> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> >> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> >> @@ -26,9 +26,22 @@
> >>  /* The memory slot index to track dirty pages */
> >>  #define TEST_MEM_SLOT_INDEX   1
> >>  
> >> +#ifdef __s390x__
> >> +
> >> +/*
> >> + * On s390x, the ELF program is sometimes linked at 0x80000000, so we can
> >> + * not use 0x40000000 here without overlapping into that region. Thus let's
> >> + * use 0xc0000000 as base address there instead.
> >> + */
> >> +#define DEFAULT_GUEST_TEST_MEM 0xc0000000
> > 
> > > I think both x86 and aarch64 should be ok with this offset. If testing
> > > proves they are, then we can just change it for all architectures.
> 
> Ok. It seems to work on x86 - could you please check aarch64, since I
> don't have such a system available right now?

Tested it. It works on aarch64 too.

> 
> >> +/* Dirty bitmaps are always little endian, so we need to swap on big 
> >> endian */
> >> +#if defined(__s390x__)
> >> +# define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7)
> >> +# define test_bit_le(nr, addr) \
> >> +  test_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> >> +# define set_bit_le(nr, addr) \
> >> +  set_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> >> +# define clear_bit_le(nr, addr) \
> >> +  clear_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> >> +# define test_and_set_bit_le(nr, addr) \
> >> +  test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> >> +# define test_and_clear_bit_le(nr, addr) \
> >> +  test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> >> +#else
> >> +# define test_bit_le           test_bit
> >> +# define set_bit_le            set_bit
> >> +# define clear_bit_le          clear_bit
> >> +# define test_and_set_bit_le   test_and_set_bit
> >> +# define test_and_clear_bit_le test_and_clear_bit
> >> +#endif
> > 
> > nit: does the formatting above look right after applying the patch?
> 
> It looked ok to me, but I can add some more tabs to even make it nicer :)
> 
> >> @@ -293,6 +341,10 @@ static void run_test(enum vm_guest_mode mode, 
> >> unsigned long iterations,
> >> * case where the size is not aligned to 64 pages.
> >> */
> >>guest_num_pages = (1ul << (30 - guest_page_shift)) + 16;
> >> +#ifdef __s390x__
> >> +  /* Round up to multiple of 1M (segment size) */
> >> +  guest_num_pages = (guest_num_pages + 0xff) & ~0xffUL;
> > 
> > We could maybe do this for all architectures as well.
> 
> It's really only needed on s390x, so I think we should keep the #ifdef here.
>

OK

Thanks,
drew 
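
The s390x rounding discussed above, written out as a small helper for
reference. The sketch assumes 4K guest pages, so a 1M segment spans 256
pages and the mask is 0xff as in the patch; the function name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round num_pages up to the next multiple of pages_per_segment, which
 * must be a power of two (256 for 1M segments with 4K pages).
 */
static uint64_t demo_round_up_pages(uint64_t num_pages,
				    uint64_t pages_per_segment)
{
	assert(pages_per_segment &&
	       !(pages_per_segment & (pages_per_segment - 1)));
	return (num_pages + pages_per_segment - 1) & ~(pages_per_segment - 1);
}
```

For the test's default of (1ul << 18) + 16 pages this yields 262400 pages,
i.e. the 16 extra pages are padded up to one more full segment.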


Re: [PATCH 2/2] KVM: selftests: Enable dirty_log_test on s390x

2019-07-30 Thread Andrew Jones
On Tue, Jul 30, 2019 at 12:01:12PM +0200, Thomas Huth wrote:
> To run the dirty_log_test on s390x, we have to make sure that we
> access the dirty log bitmap with little endian byte ordering and
> we have to properly align the memslot of the guest.
> Also all dirty bits of a segment are set at once on s390x when one
> of the pages of a segment is written to for the first time, so
> we have to make sure that we touch all pages during the first
> iteration to keep the test in sync here.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tools/testing/selftests/kvm/Makefile |  1 +
>  tools/testing/selftests/kvm/dirty_log_test.c | 70 ++--
>  2 files changed, 66 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index ba7849751989..ac7e63e00fee 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -33,6 +33,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_test
>  TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
>  
>  TEST_GEN_PROGS_s390x += s390x/sync_regs_test
> +TEST_GEN_PROGS_s390x += dirty_log_test
>  TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
>  
>  TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c 
> b/tools/testing/selftests/kvm/dirty_log_test.c
> index ceb52b952637..7a1223ad0ff3 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -26,9 +26,22 @@
>  /* The memory slot index to track dirty pages */
>  #define TEST_MEM_SLOT_INDEX  1
>  
> +#ifdef __s390x__
> +
> +/*
> + * On s390x, the ELF program is sometimes linked at 0x80000000, so we can
> + * not use 0x40000000 here without overlapping into that region. Thus let's
> + * use 0xc0000000 as base address there instead.
> + */
> +#define DEFAULT_GUEST_TEST_MEM   0xc0000000

I think both x86 and aarch64 should be ok with this offset. If testing
proves they are, then we can just change it for all architectures.

> +
> +#else
> +
>  /* Default guest test memory offset, 1G */
>  #define DEFAULT_GUEST_TEST_MEM   0x40000000
>  
> +#endif
> +
>  /* How many pages to dirty for each guest loop */
>  #define TEST_PAGES_PER_LOOP  1024
>  
> @@ -38,6 +51,27 @@
>  /* Interval for each host loop (ms) */
>  #define TEST_HOST_LOOP_INTERVAL  10UL
>  
> +/* Dirty bitmaps are always little endian, so we need to swap on big endian */
> +#if defined(__s390x__)
> +# define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7)
> +# define test_bit_le(nr, addr) \
> + test_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> +# define set_bit_le(nr, addr) \
> + set_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> +# define clear_bit_le(nr, addr) \
> + clear_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> +# define test_and_set_bit_le(nr, addr) \
> + test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> +# define test_and_clear_bit_le(nr, addr) \
> + test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, addr)
> +#else
> +# define test_bit_le           test_bit
> +# define set_bit_le            set_bit
> +# define clear_bit_le          clear_bit
> +# define test_and_set_bit_le   test_and_set_bit
> +# define test_and_clear_bit_le test_and_clear_bit
> +#endif

nit: does the formatting above look right after applying the patch?
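
As an aside on why the XOR works, here is a standalone sketch that checks
the swizzle against a plain byte-wise view of a little-endian bitmap. The
demo_* names are hypothetical and the endianness is probed at run time
rather than via __s390x__, so the snippet behaves the same on any host.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Flips the byte index within a long while keeping the bit-in-byte. */
#define DEMO_LE_SWIZZLE ((DEMO_BITS_PER_LONG - 1) & ~0x7UL)

/* Returns nonzero when the host stores the most significant byte first. */
static int demo_host_is_big_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

/* Native bit test on an array of longs, like the generic test_bit(). */
static int demo_test_bit(unsigned long nr, const unsigned long *addr)
{
	assert(addr);
	return (int)((addr[nr / DEMO_BITS_PER_LONG] >>
		      (nr % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Little-endian bit test: swizzle the index only on big-endian hosts. */
static int demo_test_bit_le(unsigned long nr, const unsigned long *addr)
{
	if (demo_host_is_big_endian())
		nr ^= DEMO_LE_SWIZZLE;
	return demo_test_bit(nr, addr);
}

/* Reference: bit nr of a little-endian bitmap viewed as raw bytes. */
static int demo_test_bit_bytes(unsigned long nr, const unsigned char *bytes)
{
	return (bytes[nr / 8] >> (nr % 8)) & 1;
}
```

On a 64-bit big-endian host the swizzle is 56, so XOR-ing maps byte b of
each long to byte 7 - b and leaves the bit position within the byte alone.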

> +
>  /*
>   * Guest/Host shared variables. Ensure addr_gva2hva() and/or
>   * sync_global_to/from_guest() are used when accessing from
> @@ -69,11 +103,25 @@ static uint64_t guest_test_virt_mem = 
> DEFAULT_GUEST_TEST_MEM;
>   */
>  static void guest_code(void)
>  {
> + uint64_t addr;
>   int i;
>  
> +#ifdef __s390x__
> + /*
> +  * On s390x, all pages of a 1M segment are initially marked as dirty
> +  * when a page of the segment is written to for the very first time.
> +  * To compensate this specialty in this test, we need to touch all
> +  * pages during the first iteration.
> +  */
> + for (i = 0; i < guest_num_pages; i++) {
> + addr = guest_test_virt_mem + i * guest_page_size;
> + *(uint64_t *)addr = READ_ONCE(iteration);
> + }
> +#endif
> +
>   while (true) {
>   for (i = 0; i < TEST_PAGES_PER_LOOP; i++) {
> - uint64_t addr = guest_test_virt_mem;
> + addr = guest_test_virt_mem;
>   addr += (READ_ONCE(random_array[i]) % guest_num_pages)
>   * guest_page_size;
>   addr &= ~(host_page_size - 1);
> @@ -158,15 +206,15 @@ static void vm_dirty_log_verify(unsigned long *bmap)
>   value_ptr = host_test_mem + page * host_page_size;
>  
>   /* If this is a special page that we were tracking... */
> - if (test_and_clear_bit(page, host_bmap_track)) {
> + if (test_and_clear_bit_le(page, host_bmap_track)) {
>   host_track_next_count++;
> - 

Re: [PATCH 1/2] KVM: selftests: Implement ucall() for s390x

2019-07-30 Thread Andrew Jones
On Tue, Jul 30, 2019 at 12:01:11PM +0200, Thomas Huth wrote:
> On s390x, we can neither exit via PIO nor MMIO, but have to use
> an instruction like DIAGNOSE. While we're at it, rename UCALL_PIO
> to UCALL_DEFAULT, since PIO only works on x86 anyway, and this
> way we can re-use the "default" type for the DIAGNOSE exit on s390x.
> 
> Now that ucall() is implemented, we can use it in the sync_reg_test
> on s390x, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  .../testing/selftests/kvm/include/kvm_util.h  |  2 +-
>  tools/testing/selftests/kvm/lib/ucall.c   | 34 +++
>  .../selftests/kvm/s390x/sync_regs_test.c  |  6 ++--
>  3 files changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
> b/tools/testing/selftests/kvm/include/kvm_util.h
> index e0e66b115ef2..c37aea2e33e5 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -167,7 +167,7 @@ int vm_create_device(struct kvm_vm *vm, struct 
> kvm_create_device *cd);
>  
>  /* ucall implementation types */
>  typedef enum {
> - UCALL_PIO,
> + UCALL_DEFAULT,

I'd rather we keep explicit types defined; keep PIO and add DIAG. Then
we can have

/*  Set default ucall types */
#if defined(__x86_64__)
  ucall_type = UCALL_PIO;
#elif defined(__aarch64__)
  ucall_type = UCALL_MMIO;
  ucall_requires_init = true;
#elif defined(__s390x__)
  ucall_type = UCALL_DIAG;
#endif

And add an assert in get_ucall()

 assert(!ucall_requires_init || ucall_initialized);


>   UCALL_MMIO,
>  } ucall_type_t;
>  
> diff --git a/tools/testing/selftests/kvm/lib/ucall.c 
> b/tools/testing/selftests/kvm/lib/ucall.c
> index dd9a66700f96..55534dd014dc 100644
> --- a/tools/testing/selftests/kvm/lib/ucall.c
> +++ b/tools/testing/selftests/kvm/lib/ucall.c
> @@ -30,7 +30,7 @@ void ucall_init(struct kvm_vm *vm, ucall_type_t type, void 
> *arg)
>   ucall_type = type;
>   sync_global_to_guest(vm, ucall_type);
>  
> - if (type == UCALL_PIO)
> + if (type == UCALL_DEFAULT)
>   return;
>  
>   if (type == UCALL_MMIO) {
> @@ -84,11 +84,18 @@ void ucall_uninit(struct kvm_vm *vm)
>   sync_global_to_guest(vm, ucall_exit_mmio_addr);
>  }
>  
> -static void ucall_pio_exit(struct ucall *uc)
> +static void ucall_default_exit(struct ucall *uc)
>  {
> -#ifdef __x86_64__
> +#if defined(__x86_64__)
> + /* Exit via PIO */
>   asm volatile("in %[port], %%al"
>   : : [port] "d" (UCALL_PIO_PORT), "D" (uc) : "rax");
> +#elif defined(__s390x__)
> + /* Exit via DIAGNOSE 0x501 (normally used for breakpoints) */
> + asm volatile ("diag 0,%0,0x501" : : "a"(uc) : "memory");
> +#else
> + fprintf(stderr, "No default ucall available on this architecture.\n");
> + exit(1);
>  #endif
>  }
>  
> @@ -113,8 +120,8 @@ void ucall(uint64_t cmd, int nargs, ...)
>   va_end(va);
>  
>   switch (ucall_type) {
> - case UCALL_PIO:
> - ucall_pio_exit(&uc);
> + case UCALL_DEFAULT:
> + ucall_default_exit(&uc);
>   break;
>   case UCALL_MMIO:
>   ucall_mmio_exit(&uc);
> @@ -128,15 +135,28 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, 
> struct ucall *uc)
>   struct ucall ucall = {};
>   bool got_ucall = false;
>  
> -#ifdef __x86_64__
> - if (ucall_type == UCALL_PIO && run->exit_reason == KVM_EXIT_IO &&
> +#if defined(__x86_64__)
> + if (ucall_type == UCALL_DEFAULT && run->exit_reason == KVM_EXIT_IO &&
>   run->io.port == UCALL_PIO_PORT) {
>   struct kvm_regs regs;
>   vcpu_regs_get(vm, vcpu_id, &regs);
>   memcpy(&ucall, addr_gva2hva(vm, (vm_vaddr_t)regs.rdi),
>          sizeof(ucall));
>   got_ucall = true;
>   }
> +#elif defined(__s390x__)
> + if (ucall_type == UCALL_DEFAULT &&
> + run->exit_reason == KVM_EXIT_S390_SIEIC &&
> + run->s390_sieic.icptcode == 4 &&
> + (run->s390_sieic.ipa >> 8) == 0x83 &&/* 0x83 means DIAGNOSE */
> + (run->s390_sieic.ipb >> 16) == 0x501) {
> + int reg = run->s390_sieic.ipa & 0xf;
> +
> + memcpy(&ucall, addr_gva2hva(vm, run->s.regs.gprs[reg]),
> +sizeof(ucall));
> + got_ucall = true;
> + }
>  #endif
> +
>   if (ucall_type == UCALL_MMIO && run->exit_reason == KVM_EXIT_MMIO &&
>   run->mmio.phys_addr == (uint64_t)ucall_exit_mmio_addr) {
>   vm_vaddr_t gva;
> diff --git a/tools/testing/selftests/kvm/s390x/sync_regs_test.c 
> b/tools/testing/selftests/kvm/s390x/sync_regs_test.c
> index e85ff0d69548..bbc93094519b 100644
> --- a/tools/testing/selftests/kvm/s390x/sync_regs_test.c
> +++ b/tools/testing/selftests/kvm/s390x/sync_regs_test.c
> @@ -25,9 +25,11 @@
>  
>  static void guest_code(void)
>  {
> + register u64 stage asm("11") = 0;
> +
>   for (;;) {
> - asm volatile ("diag 0,0,0x501");
> - asm volatile ("ahi 11,1");
> +   

Re: [PATCH 2/9] KVM: selftests: Guard struct kvm_vcpu_events with __KVM_HAVE_VCPU_EVENTS

2019-05-23 Thread Andrew Jones
On Thu, May 23, 2019 at 06:43:02PM +0200, Thomas Huth wrote:
> The struct kvm_vcpu_events code is only available on certain architectures
> (arm, arm64 and x86). To be able to compile kvm_util.c also for other
> architectures, we have to fence the code with __KVM_HAVE_VCPU_EVENTS.
> 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Thomas Huth 
> ---
>  tools/testing/selftests/kvm/include/kvm_util.h | 2 ++
>  tools/testing/selftests/kvm/lib/kvm_util.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
> b/tools/testing/selftests/kvm/include/kvm_util.h
> index a5a4b28f14d8..b8bf961074fe 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -114,10 +114,12 @@ void vcpu_sregs_set(struct kvm_vm *vm, uint32_t vcpuid,
>   struct kvm_sregs *sregs);
>  int _vcpu_sregs_set(struct kvm_vm *vm, uint32_t vcpuid,
>   struct kvm_sregs *sregs);
> +#ifdef __KVM_HAVE_VCPU_EVENTS
>  void vcpu_events_get(struct kvm_vm *vm, uint32_t vcpuid,
>struct kvm_vcpu_events *events);
>  void vcpu_events_set(struct kvm_vm *vm, uint32_t vcpuid,
>struct kvm_vcpu_events *events);
> +#endif
>  #ifdef __x86_64__
>  void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid,
>  struct kvm_nested_state *state);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index ba1359ac504f..08edb8436c47 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1224,6 +1224,7 @@ void vcpu_regs_set(struct kvm_vm *vm, uint32_t vcpuid, 
> struct kvm_regs *regs)
>   ret, errno);
>  }
>  
> +#ifdef __KVM_HAVE_VCPU_EVENTS
>  void vcpu_events_get(struct kvm_vm *vm, uint32_t vcpuid,
>struct kvm_vcpu_events *events)
>  {
> @@ -1249,6 +1250,7 @@ void vcpu_events_set(struct kvm_vm *vm, uint32_t vcpuid,
>   TEST_ASSERT(ret == 0, "KVM_SET_VCPU_EVENTS, failed, rc: %i errno: %i",
>   ret, errno);
>  }
> +#endif
>  
>  #ifdef __x86_64__
>  void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid,
> -- 
> 2.21.0
>

Reviewed-by: Andrew Jones 


Re: [PATCH 9/9] KVM: selftests: Move kvm_create_max_vcpus test to generic code

2019-05-23 Thread Andrew Jones
On Thu, May 23, 2019 at 06:43:09PM +0200, Thomas Huth wrote:
> There is nothing x86-specific in the test apart from the VM_MODE_P52V48_4K
> which we can now replace with VM_MODE_DEFAULT. Thus let's move the file to
> the main folder and enable it for aarch64 and s390x, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tools/testing/selftests/kvm/Makefile  | 4 +++-
>  .../testing/selftests/kvm/{x86_64 => }/kvm_create_max_vcpus.c | 3 ++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>  rename tools/testing/selftests/kvm/{x86_64 => }/kvm_create_max_vcpus.c (93%)
> 
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index d8beb990c8f4..aef5bd1166cf 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -21,15 +21,17 @@ TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
>  TEST_GEN_PROGS_x86_64 += x86_64/smm_test
> -TEST_GEN_PROGS_x86_64 += x86_64/kvm_create_max_vcpus
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
> +TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
>  TEST_GEN_PROGS_x86_64 += dirty_log_test
>  TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
>  
>  TEST_GEN_PROGS_aarch64 += dirty_log_test
>  TEST_GEN_PROGS_aarch64 += clear_dirty_log_test
> +TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
>  
>  TEST_GEN_PROGS_s390x += s390x/sync_regs_test
> +TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
>  
>  TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
>  LIBKVM += $(LIBKVM_$(UNAME_M))
> diff --git a/tools/testing/selftests/kvm/x86_64/kvm_create_max_vcpus.c 
> b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
> similarity index 93%
> rename from tools/testing/selftests/kvm/x86_64/kvm_create_max_vcpus.c
> rename to tools/testing/selftests/kvm/kvm_create_max_vcpus.c
> index 50e92996f918..db78ce07c416 100644
> --- a/tools/testing/selftests/kvm/x86_64/kvm_create_max_vcpus.c
> +++ b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0-only
>  /*
>   * kvm_create_max_vcpus
>   *
> @@ -28,7 +29,7 @@ void test_vcpu_creation(int first_vcpu_id, int num_vcpus)
>   printf("Testing creating %d vCPUs, with IDs %d...%d.\n",
>  num_vcpus, first_vcpu_id, first_vcpu_id + num_vcpus - 1);
>  
> - vm = vm_create(VM_MODE_P52V48_4K, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> + vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
>  
>   for (i = 0; i < num_vcpus; i++) {
>   int vcpu_id = first_vcpu_id + i;
> -- 
> 2.21.0
>

Reviewed-by: Andrew Jones 


Re: [PATCH 8/9] KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID

2019-05-23 Thread Andrew Jones
On Thu, May 23, 2019 at 06:43:08PM +0200, Thomas Huth wrote:
> KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
> architectures. However, on s390x, the amount of usable CPUs is determined
> during runtime - it is depending on the features of the machine the code
> is running on. Since we are using the vcpu_id as an index into the SCA
> structures that are defined by the hardware (see e.g. the sca_add_vcpu()
> function), it is not only the amount of CPUs that is limited by the hard-
> ware, but also the range of IDs that we can use.
> Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
> So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
> code into the architecture specific code, and on s390x we have to return
> the same value here as for KVM_CAP_MAX_VCPUS.
> This problem has been discovered with the kvm_create_max_vcpus selftest.
> With this change applied, the selftest now passes on s390x, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  arch/mips/kvm/mips.c   | 3 +++
>  arch/powerpc/kvm/powerpc.c | 3 +++
>  arch/s390/kvm/kvm-s390.c   | 1 +
>  arch/x86/kvm/x86.c | 3 +++
>  virt/kvm/arm/arm.c | 3 +++
>  virt/kvm/kvm_main.c| 2 --
>  6 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 6d0517ac18e5..0369f26ab96d 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -1122,6 +1122,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_MAX_VCPUS:
>   r = KVM_MAX_VCPUS;
>   break;
> + case KVM_CAP_MAX_VCPU_ID:
> + r = KVM_MAX_VCPU_ID;
> + break;
>   case KVM_CAP_MIPS_FPU:
>   /* We don't handle systems with inconsistent cpu_has_fpu */
>   r = !!raw_cpu_has_fpu;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 3393b166817a..aa3a678711be 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -657,6 +657,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_MAX_VCPUS:
>   r = KVM_MAX_VCPUS;
>   break;
> + case KVM_CAP_MAX_VCPU_ID:
> + r = KVM_MAX_VCPU_ID;
> + break;
>  #ifdef CONFIG_PPC_BOOK3S_64
>   case KVM_CAP_PPC_GET_SMMU_INFO:
>   r = 1;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 8d6d75db8de6..871d2e99b156 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -539,6 +539,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   break;
>   case KVM_CAP_NR_VCPUS:
>   case KVM_CAP_MAX_VCPUS:
> + case KVM_CAP_MAX_VCPU_ID:
>   r = KVM_S390_BSCA_CPU_SLOTS;
>   if (!kvm_s390_use_sca_entries())
>   r = KVM_MAX_VCPUS;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 536b78c4af6e..09a07d6a154e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3122,6 +3122,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_MAX_VCPUS:
>   r = KVM_MAX_VCPUS;
>   break;
> + case KVM_CAP_MAX_VCPU_ID:
> + r = KVM_MAX_VCPU_ID;
> + break;
>   case KVM_CAP_PV_MMU:/* obsolete */
>   r = 0;
>   break;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 90cedebaeb94..7eeebe5e9da2 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -224,6 +224,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_MAX_VCPUS:
>   r = KVM_MAX_VCPUS;
>   break;
> + case KVM_CAP_MAX_VCPU_ID:
> + r = KVM_MAX_VCPU_ID;
> + break;
>   case KVM_CAP_MSI_DEVID:
>   if (!kvm)
>   r = -EINVAL;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f0d13d9d125d..c09259dd6286 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3146,8 +3146,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct 
> kvm *kvm, long arg)
>   case KVM_CAP_MULTI_ADDRESS_SPACE:
>   return KVM_ADDRESS_SPACE_NUM;
>  #endif
> - case KVM_CAP_MAX_VCPU_ID:
> - return KVM_MAX_VCPU_ID;
>   case KVM_CAP_NR_MEMSLOTS:
>   return KVM_USER_MEM_SLOTS;
>   default:
> -- 
> 2.21.0
>

Reviewed-by: Andrew Jones 


Re: [PATCH 5/9] KVM: selftests: Align memory region addresses to 1M on s390x

2019-05-23 Thread Andrew Jones
On Thu, May 23, 2019 at 06:43:05PM +0200, Thomas Huth wrote:
> On s390x, there is a constraint that memory regions have to be aligned
> to 1M (or running the VM will fail). Introduce a new "alignment" variable
> in the vm_userspace_mem_region_add() function which now can be used for
> both, huge page and s390x alignment requirements.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tools/testing/selftests/kvm/lib/kvm_util.c | 21 -
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 08edb8436c47..656df9d5cd4d 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -559,6 +559,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>   unsigned long pmem_size = 0;
>   struct userspace_mem_region *region;
>   size_t huge_page_size = KVM_UTIL_PGS_PER_HUGEPG * vm->page_size;
> + size_t alignment;
>  
>   TEST_ASSERT((guest_paddr % vm->page_size) == 0, "Guest physical "
>   "address not on a page boundary.\n"
> @@ -608,9 +609,20 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>   TEST_ASSERT(region != NULL, "Insufficient Memory");
>   region->mmap_size = npages * vm->page_size;
>  
> - /* Enough memory to align up to a huge page. */
> +#ifdef __s390x__
> + /* On s390x, the host address must be aligned to 1M (due to PGSTEs) */
> + alignment = 0x100000;
> +#else
> + alignment = 1;
> +#endif
> +
>   if (src_type == VM_MEM_SRC_ANONYMOUS_THP)
> - region->mmap_size += huge_page_size;
> + alignment = huge_page_size;

I guess s390x won't ever support VM_MEM_SRC_ANONYMOUS_THP? If it does,
then we need 'alignment = max(huge_page_size, alignment)'. Actually
that might be a nice way to write this anyway for future-proofing.
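
A minimal sketch of that suggestion, outside of kvm_util.c. The helper names
(max_alignment, region_alignment, align_up) and the use_thp flag are
hypothetical and only stand in for the logic in
vm_userspace_mem_region_add(); the 1M value is the s390x constraint from the
patch description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of kvm_util.c: larger of two alignments. */
static size_t max_alignment(size_t a, size_t b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical helper: choose the host alignment for a memory region.
 * arch_alignment would be 0x100000 (1M) on s390x and 1 elsewhere;
 * huge_page_size only matters when the region is THP-backed.
 */
static size_t region_alignment(size_t arch_alignment, size_t huge_page_size,
			       int use_thp)
{
	size_t alignment = arch_alignment;

	/* Taking the max keeps the arch constraint even with THP enabled. */
	if (use_thp)
		alignment = max_alignment(huge_page_size, alignment);

	return alignment;
}

/* Round addr up to the next multiple of alignment (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t alignment)
{
	assert(alignment && (alignment & (alignment - 1)) == 0);
	return (addr + alignment - 1) & ~((uintptr_t)alignment - 1);
}
```

With max(), a THP-backed region on an architecture with its own constraint
gets whichever alignment is stricter, rather than the huge page size
silently replacing the arch value.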

> +
> + /* Add enough memory to align up if necessary */
> + if (alignment > 1)
> + region->mmap_size += alignment;
> +
>   region->mmap_start = mmap(NULL, region->mmap_size,
> PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS
> @@ -620,9 +632,8 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>   "test_malloc failed, mmap_start: %p errno: %i",
>   region->mmap_start, errno);
>  
> - /* Align THP allocation up to start of a huge page. */
> - region->host_mem = align(region->mmap_start,
> -  src_type == VM_MEM_SRC_ANONYMOUS_THP ?  
> huge_page_size : 1);
> + /* Align host address */
> + region->host_mem = align(region->mmap_start, alignment);
>  
>   /* As needed perform madvise */
>   if (src_type == VM_MEM_SRC_ANONYMOUS || src_type == 
> VM_MEM_SRC_ANONYMOUS_THP) {
> -- 
> 2.21.0
> 


Re: [PATCH 4/9] KVM: selftests: Introduce a VM_MODE_DEFAULT macro for the default bits

2019-05-23 Thread Andrew Jones
On Thu, May 23, 2019 at 06:43:04PM +0200, Thomas Huth wrote:
> This will be required later for tests like the kvm_create_max_vcpus
> test that do not use the vm_create_default() function.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tools/testing/selftests/kvm/include/kvm_util.h  | 6 ++
>  tools/testing/selftests/kvm/lib/aarch64/processor.c | 2 +-
>  tools/testing/selftests/kvm/lib/x86_64/processor.c  | 2 +-
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
> b/tools/testing/selftests/kvm/include/kvm_util.h
> index b8bf961074fe..b6eb6471e6b2 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -43,6 +43,12 @@ enum vm_guest_mode {
>   NUM_VM_MODES,
>  };
>  
> +#ifdef __aarch64__
> +#define VM_MODE_DEFAULT VM_MODE_P40V48_4K
> +#else
> +#define VM_MODE_DEFAULT VM_MODE_P52V48_4K
> +#endif
> +
>  #define vm_guest_mode_string(m) vm_guest_mode_string[m]
>  extern const char * const vm_guest_mode_string[];
>  
> diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c 
> b/tools/testing/selftests/kvm/lib/aarch64/processor.c
> index fa6cd340137c..596ccaf09cb6 100644
> --- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
> @@ -226,7 +226,7 @@ struct kvm_vm *vm_create_default(uint32_t vcpuid, 
> uint64_t extra_mem_pages,
>   uint64_t extra_pg_pages = (extra_mem_pages / ptrs_per_4k_pte) * 2;
>   struct kvm_vm *vm;
>  
> - vm = vm_create(VM_MODE_P40V48_4K, DEFAULT_GUEST_PHY_PAGES + 
> extra_pg_pages, O_RDWR);
> + vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES + 
> extra_pg_pages, O_RDWR);
>  
>   kvm_vm_elf_load(vm, program_invocation_name, 0, 0);
>   vm_vcpu_add_default(vm, vcpuid, guest_code);
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c 
> b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index dc7fae9fa424..bb38bbcefac5 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -823,7 +823,7 @@ struct kvm_vm *vm_create_default(uint32_t vcpuid, 
> uint64_t extra_mem_pages,
>   uint64_t extra_pg_pages = extra_mem_pages / 512 * 2;
>  
>   /* Create VM */
> - vm = vm_create(VM_MODE_P52V48_4K,
> + vm = vm_create(VM_MODE_DEFAULT,
>  DEFAULT_GUEST_PHY_PAGES + extra_pg_pages,
>  O_RDWR);
>  
> -- 
> 2.21.0
>

Reviewed-by: Andrew Jones 


Re: [RFC PATCH 4/4] KVM: selftests: Add the sync_regs test for s390x

2019-05-23 Thread Andrew Jones
On Thu, May 16, 2019 at 01:12:53PM +0200, Thomas Huth wrote:
> The test is an adaption of the same test for x86. Note that there
> are some differences in the way how s390x deals with the kvm_valid_regs
> in struct kvm_run, so some of the tests had to be removed. Also this
> test is not using the ucall() interface on s390x yet (which would need
> some work to be usable on s390x), so it simply drops out of the VM with
> a diag 0x501 breakpoint instead.
> 
> Signed-off-by: Thomas Huth 
> ---
>  MAINTAINERS   |   1 +
>  tools/testing/selftests/kvm/Makefile  |   2 +
>  .../selftests/kvm/s390x/sync_regs_test.c  | 151 ++
>  3 files changed, 154 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/s390x/sync_regs_test.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 514d1f88ee26..68f76ee9e821 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8645,6 +8645,7 @@ F:  arch/s390/include/asm/gmap.h
>  F:   arch/s390/include/asm/kvm*
>  F:   arch/s390/kvm/
>  F:   arch/s390/mm/gmap.c
> +F:   tools/testing/selftests/kvm/s390x/
>  F:   tools/testing/selftests/kvm/*/s390x/

Do we need these lines added? We have tools/testing/selftests/kvm/ in the
common KVM section already. If we do want to list them explicitly,
then I guess we need x86 and arm MAINTAINERS updates as well.

Thanks,
drew


Re: [PATCH] KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard

2019-05-23 Thread Andrew Jones
On Thu, May 23, 2019 at 11:31:14AM +0200, Thomas Huth wrote:
> struct kvm_nested_state is only available on x86 so far. To be able
> to compile the code on other architectures as well, we need to wrap
> the related code with #ifdefs.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tools/testing/selftests/kvm/include/kvm_util.h | 2 ++
>  tools/testing/selftests/kvm/lib/kvm_util.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
> b/tools/testing/selftests/kvm/include/kvm_util.h
> index 8c6b9619797d..a5a4b28f14d8 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -118,10 +118,12 @@ void vcpu_events_get(struct kvm_vm *vm, uint32_t vcpuid,
>struct kvm_vcpu_events *events);
>  void vcpu_events_set(struct kvm_vm *vm, uint32_t vcpuid,
>struct kvm_vcpu_events *events);
> +#ifdef __x86_64__
>  void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid,
>  struct kvm_nested_state *state);
>  int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t vcpuid,
> struct kvm_nested_state *state, bool ignore_error);
> +#endif
>  
>  const char *exit_reason_str(unsigned int exit_reason);
>  
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index cf62de377310..633b22df46a4 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1248,6 +1248,7 @@ void vcpu_events_set(struct kvm_vm *vm, uint32_t vcpuid,
>   ret, errno);
>  }
>  
> +#ifdef __x86_64__
>  void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid,
>  struct kvm_nested_state *state)
>  {
> @@ -1279,6 +1280,7 @@ int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t 
> vcpuid,
>  
>   return ret;
>  }
> +#endif
>  
>  /*
>   * VM VCPU System Regs Get
> -- 
> 2.21.0
>

Reviewed-by: Andrew Jones 


Re: [PATCH v2] KVM: selftests: Compile code with warnings enabled

2019-05-23 Thread Andrew Jones
On Mon, May 20, 2019 at 12:03:08PM +0200, Paolo Bonzini wrote:
> On 17/05/19 11:04, Thomas Huth wrote:
> > So far the KVM selftests are compiled without any compiler warnings
> > enabled. That's quite bad, since we miss a lot of possible bugs this
> > way. Let's enable at least "-Wall" and some other useful warning flags
> > now, and fix at least the trivial problems in the code (like unused
> > variables).
> > 
> > Signed-off-by: Thomas Huth 
> > ---
> >  v2:
> >  - Rebased to kvm/queue
> >  - Fix warnings in state_test.c and evmcs_test.c, too
> > 
> >  tools/testing/selftests/kvm/Makefile   | 4 +++-
> >  tools/testing/selftests/kvm/dirty_log_test.c   | 6 +-
> >  tools/testing/selftests/kvm/lib/kvm_util.c | 3 ---
> >  tools/testing/selftests/kvm/lib/x86_64/processor.c | 4 +---
> >  tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c   | 1 +
> >  tools/testing/selftests/kvm/x86_64/evmcs_test.c| 7 +--
> >  tools/testing/selftests/kvm/x86_64/platform_info_test.c| 1 -
> >  tools/testing/selftests/kvm/x86_64/smm_test.c  | 3 +--
> >  tools/testing/selftests/kvm/x86_64/state_test.c| 7 +--
> >  .../selftests/kvm/x86_64/vmx_close_while_nested_test.c | 5 +
> >  tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c   | 5 ++---
> >  11 files changed, 16 insertions(+), 30 deletions(-)
> 
> Queued, with a squashed fix to kvm_get_supported_hv_cpuid.
>

I've done the fixups needed to keep aarch64 compiling and will send
the patch shortly.

drew 


Re: [RFC PATCH 0/4] KVM selftests for s390x

2019-05-22 Thread Andrew Jones
On Mon, May 20, 2019 at 01:43:06PM +0200, Paolo Bonzini wrote:
> On 20/05/19 13:30, Thomas Huth wrote:
> >> No objections at all, though it would be nice to have ucall plumbed in
> >> from the beginning.
> > I'm still looking at the ucall interface ... what I don't quite get yet
> > is the question why the ucall_type there is selectable during runtime?
> > 
> > Are there plans to have test that could either use UCALL_PIO or
> > UCALL_MMIO? If not, what about moving ucall_init() and ucall() to
> > architecture specific code in tools/testing/selftests/kvm/lib/aarch64/
> > and tools/testing/selftests/kvm/lib/x86_64 instead, and to remove the
> > ucall_type stuff again (so that x86 is hard-wired to PIO and aarch64
> > is hard-wired to MMIO)? ... then I could add a DIAG-based ucall
> > on s390x more easily, I think.
> 
> Yes, that would work.  I think Andrew wanted the flexibility to use MMIO
> on x86, but it's not really necessary to have it.

If the flexibility isn't necessary, then I agree that it'll be nicer to
put the ucall_init() in arch setup code, avoiding the need to remember
it in each unit test.

Thanks,
drew


Re: [PATCH 1/3] selftests: kvm/evmcs_test: complete I/O before migrating guest state

2019-04-16 Thread Andrew Jones
On Thu, Apr 11, 2019 at 06:48:26PM +0200, Paolo Bonzini wrote:
> Starting state migration after an IO exit without first completing IO
> may result in test failures.  We already have two tests that need this
> (this patch in fact fixes evmcs_test, similar to what was fixed for
> state_test in commit 0f73bbc851ed, "KVM: selftests: complete IO before
> migrating guest state", 2019-03-13) and a third is coming.  So, move the
> code to vcpu_save_state, and while at it do not access register state
> until after I/O is complete.
> ---
>  tools/testing/selftests/kvm/lib/kvm_util.c |  5 +
>  tools/testing/selftests/kvm/lib/x86_64/processor.c |  8 
>  tools/testing/selftests/kvm/x86_64/evmcs_test.c|  5 +++--
>  tools/testing/selftests/kvm/x86_64/state_test.c| 15 +--
>  4 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index efa0aad8b3c6..4ca96b228e46 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -91,6 +91,11 @@ static void vm_open(struct kvm_vm *vm, int perm, unsigned 
> long type)
>   if (vm->kvm_fd < 0)
>   exit(KSFT_SKIP);
>  
> + if (!kvm_check_cap(KVM_CAP_IMMEDIATE_EXIT)) {
> + fprintf(stderr, "immediate_exit not available, skipping 
> test\n");
> + exit(KSFT_SKIP);
> + }
> +
>   vm->fd = ioctl(vm->kvm_fd, KVM_CREATE_VM, type);
>   TEST_ASSERT(vm->fd >= 0, "KVM_CREATE_VM ioctl failed, "
>   "rc: %i errno: %i", vm->fd, errno);
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c 
> b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index f28127f4a3af..b363c9611bd6 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -1030,6 +1030,14 @@ struct kvm_x86_state *vcpu_save_state(struct kvm_vm 
> *vm, uint32_t vcpuid)
>   nested_size, sizeof(state->nested_));
>   }
>  
> + /*
> +  * When KVM exits to userspace with KVM_EXIT_IO, KVM guarantees
> +  * guest state is consistent only after userspace re-enters the
> +  * kernel with KVM_RUN.  Complete IO prior to migrating state
> +  * to a new VM.
> +  */
> + vcpu_run_complete_io(vm, vcpuid);
> +

Since the need for IO completion also affects MMIO exits and there's no
reason to believe that vcpu_save_state() will always be called after
an IO exit, then shouldn't we instead put this in get_ucall()?

diff --git a/tools/testing/selftests/kvm/lib/ucall.c 
b/tools/testing/selftests/kvm/lib/ucall.c
index a2ab38be2f47..e8c6f2741ce7 100644
--- a/tools/testing/selftests/kvm/lib/ucall.c
+++ b/tools/testing/selftests/kvm/lib/ucall.c
@@ -132,6 +132,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, 
struct ucall *uc)
if (ucall_type == UCALL_PIO && run->exit_reason == KVM_EXIT_IO &&
run->io.port == UCALL_PIO_PORT) {
struct kvm_regs regs;
+   vcpu_run_complete_io(vm, vcpu_id);
vcpu_regs_get(vm, vcpu_id, &regs);
memcpy(uc, addr_gva2hva(vm, (vm_vaddr_t)regs.rdi), sizeof(*uc));
return uc->cmd;
@@ -144,6 +145,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, 
struct ucall *uc)
"Unexpected ucall exit mmio address access");
gva = *(vm_vaddr_t *)run->mmio.data;
memcpy(uc, addr_gva2hva(vm, gva), sizeof(*uc));
+   vcpu_run_complete_io(vm, vcpu_id);
}
 
return uc->cmd;



>   nmsrs = kvm_get_num_msrs(vm);
>   list = malloc(sizeof(*list) + nmsrs * sizeof(list->indices[0]));
>   list->nmsrs = nmsrs;
> diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c 
> b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> index c49c2a28b0eb..36669684eca5 100644
> --- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> @@ -123,8 +123,6 @@ int main(int argc, char *argv[])
>   stage, run->exit_reason,
>   exit_reason_str(run->exit_reason));
>  
> - memset(&regs1, 0, sizeof(regs1));
> - vcpu_regs_get(vm, VCPU_ID, &regs1);
>   switch (get_ucall(vm, VCPU_ID, &uc)) {
>   case UCALL_ABORT:
>   TEST_ASSERT(false, "%s at %s:%d", (const char 
> *)uc.args[0],
> @@ -144,6 +142,9 @@ int main(int argc, char *argv[])
>   stage, (ulong)uc.args[1]);
>  
>   state = vcpu_save_state(vm, VCPU_ID);
> + memset(&regs1, 0, sizeof(regs1));
> + vcpu_regs_get(vm, VCPU_ID, &regs1);
> +

We would still need this change to get the vcpu_regs_get() below the
get_ucall().

>   kvm_vm_release(vm);
>  
>   /* Restore state in a new VM.  */
> diff --git 


Re: [PATCH] kvm/arm: return 0 when the number of objects is not lessthan min

2018-12-06 Thread Andrew Jones
On Thu, Dec 06, 2018 at 09:56:30AM +0800, peng.h...@zte.com.cn wrote:
> >On Wed, Dec 05, 2018 at 09:15:51AM +0800, Peng Hao wrote:
> >> Return 0 when there is enough kvm_mmu_memory_cache object.
> >>
> >> Signed-off-by: Peng Hao 
> >> ---
> >>  virt/kvm/arm/mmu.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >> index ed162a6..fcda0ce 100644
> >> --- a/virt/kvm/arm/mmu.c
> >> +++ b/virt/kvm/arm/mmu.c
> >> @@ -127,7 +127,7 @@ static int mmu_topup_memory_cache(struct 
> >> kvm_mmu_memory_cache *cache,
> >>  while (cache->nobjs < max) {
> >>  page = (void *)__get_free_page(PGALLOC_GFP);
> >>  if (!page)
> >> -return -ENOMEM;
> >> +return cache->nobjs >= min ? 0 : -ENOMEM;
> >
> >This condition will never be true here, as the exact same condition is
> >already checked above, and if it had been true, then we wouldn't be here.
> >
> >What problem are you attempting to solve?
> >
> if (cache->nobjs >= min)  --here cache->nobjs can continue downward
>     return 0;
> while (cache->nobjs < max) {
>     page = (void *)__get_free_page(PGALLOC_GFP);
>     if (!page)
>         return -ENOMEM; --here it is possible that
>                           (cache->nobjs >= min) and (cache->nobjs < max)
>     cache->objects[cache->nobjs++] = page; --here cache->nobjs increasing
> }
> 
> I just think the logic of this function is to return 0 as long as 
> (cache->nobjs >= min).
> thanks.

Oh, I see now. This is the case where we can allocate enough to get over
the min line, but fail before we get to the max.
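
To make that case concrete, here is a small userspace sketch of the top-up
loop with the proposed return value. It is not the kernel code: struct
obj_cache, CACHE_CAPACITY, topup_cache() and the budget_alloc() allocator
are hypothetical stand-ins so the allocation failure can be simulated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CACHE_CAPACITY 8	/* illustrative size, not the kernel's value */

struct obj_cache {
	int nobjs;
	void *objects[CACHE_CAPACITY];
};

/* Hypothetical allocator hook so a failing allocation can be simulated. */
typedef void *(*alloc_fn)(void);

static int alloc_budget;	/* number of allocations allowed to succeed */
static char fake_pages[CACHE_CAPACITY];

static void *budget_alloc(void)
{
	if (alloc_budget <= 0)
		return NULL;
	alloc_budget--;
	return &fake_pages[alloc_budget];
}

/*
 * Fill the cache up to max objects. If an allocation fails part way,
 * report success as long as at least min objects are available.
 */
static int topup_cache(struct obj_cache *cache, int min, int max,
		       alloc_fn alloc)
{
	void *page;

	assert(max <= CACHE_CAPACITY);

	if (cache->nobjs >= min)
		return 0;
	while (cache->nobjs < max) {
		page = alloc();
		if (!page)
			return cache->nobjs >= min ? 0 : -ENOMEM;
		cache->objects[cache->nobjs++] = page;
	}
	return 0;
}
```

Starting from an empty cache with min = 2 and max = 4, three successful
allocations followed by a failure leave nobjs at 3, which is enough, so the
sketch returns 0 where the unpatched logic would return -ENOMEM.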

Thanks,
drew


Re: [PATCH V2] SelfTest: KVM: Drop Asserts for madvise MADV_NOHUGEPAGE failure

2018-11-16 Thread Andrew Jones
On Fri, Nov 16, 2018 at 01:50:55PM +0200, Ahmed Abd El Mawgood wrote:
> From: Ahmed Abd El Mawgood 
> 
> madvise() returns -1 without CONFIG_TRANSPARENT_HUGEPAGE=y. That would
> trigger asserts when checking the return value of madvise(). Following
> a similar decision to [1], I thought it is ok to assume that madvise()
> MADV_NOHUGEPAGE failures imply that THP is not supported by the host kernel.
> 
> Another option was to check for Transparent Huge Page support in
> /sys/kernel/mm/transparent_hugepage/enabled.
> 
> -- links --
> [1] https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg04514.html
> 
> Signed-off-by: Ahmed Abd El Mawgood 
> ---
>  tools/testing/selftests/kvm/lib/kvm_util.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 1b41e71283d5..437c5bb48061 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -586,14 +586,23 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>src_type == VM_MEM_SRC_ANONYMOUS_THP ?  
> huge_page_size : 1);
>  
>   /* As needed perform madvise */
> - if (src_type == VM_MEM_SRC_ANONYMOUS || src_type == 
> VM_MEM_SRC_ANONYMOUS_THP) {
> + if (src_type == VM_MEM_SRC_ANONYMOUS) {
> + /*
> +  * Neglect madvise error because it is ok to not have THP
> +  * support in this case.
> +  */
> + madvise(region->host_mem, npages * vm->page_size,
> + MADV_NOHUGEPAGE);
> + } else if (src_type == VM_MEM_SRC_ANONYMOUS_THP) {
>   ret = madvise(region->host_mem, npages * vm->page_size,
> -  src_type == VM_MEM_SRC_ANONYMOUS ? MADV_NOHUGEPAGE 
> : MADV_HUGEPAGE);
> + MADV_HUGEPAGE);
>   TEST_ASSERT(ret == 0, "madvise failed,\n"
> - "  addr: %p\n"
> - "  length: 0x%lx\n"
> - "  src_type: %x",
> - region->host_mem, npages * vm->page_size, src_type);
> + "Does the kernel have CONFIG_TRANSPARENT_HUGEPAGE=y\n"
> + "  addr: %p\n"
> + "  length: 0x%lx\n"
> +     "  src_type: %x\n",
> + region->host_mem, npages * vm->page_size,
> + src_type);
>   }
>  
>   region->unused_phy_pages = sparsebit_alloc();
> -- 
> 2.18.1
>

Reviewed-by: Andrew Jones 



Re: [PATCH] SelfTest: KVM: Drop Asserts for madvise failures

2018-11-16 Thread Andrew Jones
On Thu, Nov 15, 2018 at 08:09:07PM +0200, Ahmed Abd El Mawgood wrote:
> From: Ahmed Abd El Mawgood 
> 
> madvise() returns -1 without CONFIG_TRANSPARENT_HUGEPAGE=y. That would
> trigger asserts when checking the return value of madvise(). Following
> a similar decision to [1], I thought it is ok to assume that madvise()
> failures imply that THP is not supported by the host kernel.
> 
> Another option was to check for Transparent Huge Page support in
> /sys/kernel/mm/transparent_hugepage/enabled.
> 
> -- links --
> [1] https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg04514.html
> 
> Signed-off-by: Ahmed Abd El Mawgood 
> ---
>  tools/testing/selftests/kvm/lib/kvm_util.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 1b41e71283d5..7725cfdf1b79 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -586,14 +586,12 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>src_type == VM_MEM_SRC_ANONYMOUS_THP ?  
> huge_page_size : 1);
>  
>   /* As needed perform madvise */
> - if (src_type == VM_MEM_SRC_ANONYMOUS || src_type == VM_MEM_SRC_ANONYMOUS_THP) {
> - ret = madvise(region->host_mem, npages * vm->page_size,
> -  src_type == VM_MEM_SRC_ANONYMOUS ? MADV_NOHUGEPAGE : MADV_HUGEPAGE);
> - TEST_ASSERT(ret == 0, "madvise failed,\n"
> - "  addr: %p\n"
> - "  length: 0x%lx\n"
> - "  src_type: %x",
> - region->host_mem, npages * vm->page_size, src_type);
> + if (src_type == VM_MEM_SRC_ANONYMOUS) {
> + madvise(region->host_mem, npages * vm->page_size,
> + MADV_NOHUGEPAGE);

This is fine.

> + } else if (src_type == VM_MEM_SRC_ANONYMOUS_THP) {
> + madvise(region->host_mem, npages * vm->page_size,
> + MADV_HUGEPAGE);

I would still assert here, but with a more informative message like
"madvise(MADV_HUGEPAGE) failed. Does the kernel have
CONFIG_TRANSPARENT_HUGEPAGE=y?"
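
Roughly like the following, as an untested standalone sketch. The name
advise_region() and the enum are made-up stand-ins for the selftest code,
and fprintf() plus a return value stands in for TEST_ASSERT so that the
snippet builds on its own:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-ins for VM_MEM_SRC_ANONYMOUS{,_THP}. */
enum mem_src { SRC_ANONYMOUS, SRC_ANONYMOUS_THP };

/*
 * Returns 0 on success and -1 when THP was explicitly requested but
 * madvise(MADV_HUGEPAGE) failed.
 */
static int advise_region(void *mem, size_t len, enum mem_src src)
{
	if (src == SRC_ANONYMOUS) {
		/* Best effort: a failure here only means THP is not built in. */
		madvise(mem, len, MADV_NOHUGEPAGE);
		return 0;
	}

	/* The caller asked for huge pages, so a failure must not be silent. */
	if (madvise(mem, len, MADV_HUGEPAGE) != 0) {
		fprintf(stderr,
			"madvise(MADV_HUGEPAGE) failed (%s). Does the kernel have "
			"CONFIG_TRANSPARENT_HUGEPAGE=y?\n"
			"  addr: %p\n"
			"  length: 0x%zx\n",
			strerror(errno), mem, len);
		return -1;
	}
	return 0;
}
```

The point is only the asymmetry: ignoring the result is harmless when we
ask for no huge pages, but a test that requested THP backing should not
quietly run without it.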

>   }
>  
>   region->unused_phy_pages = sparsebit_alloc();
> -- 
> 2.18.1
> 

Thanks,
drew


Re: KVM selftests are failing

2018-11-15 Thread Andrew Jones
On Thu, Nov 15, 2018 at 03:36:44PM +0200, Ahmed Soliman wrote:
> mmap(NULL, 6291456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ea2a1000
> madvise(0x7f46ea2a1000, 6291456, MADV_NOHUGEPAGE) = -1 EINVAL (Invalid argument)
> 
> For comprehension, this is done on intel core i7-4500U CPU @ 1.80GHz

Argh. I see what it is. Your config doesn't have CONFIG_TRANSPARENT_HUGEPAGE=y,
so madvise_behavior_valid() returns false, which causes madvise() to
immediately return EINVAL. The kvm selftests should be more careful about
which madvise behaviors they use.
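
For reference, the failure can be reproduced outside the selftests with a
small probe. This is only a sketch: probe_madvise() is a made-up helper
name, and the 6291456-byte length is simply copied from the strace above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map fresh anonymous memory, apply one madvise() behavior to it, and
 * return 0 if the kernel accepted it or the errno value if it did not.
 */
static int probe_madvise(int advice)
{
	const size_t len = 6291456;	/* same length as in the strace */
	void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int err = 0;

	if (mem == MAP_FAILED)
		return errno;

	if (madvise(mem, len, advice) != 0)
		err = errno;

	munmap(mem, len);
	return err;
}
```

On a kernel without CONFIG_TRANSPARENT_HUGEPAGE=y I'd expect
probe_madvise(MADV_NOHUGEPAGE) to return EINVAL, while
probe_madvise(MADV_NORMAL) returns 0.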

> 
> For now I will comment out the madvise line and the assert when writing
> my own kvm selftest. I think that wouldn't cause any trouble? If that is
> not the case, please let me know.
> 

You may not need madvise() at all for your test, depending on what you're
doing. So leaving it out may be fine. Reworking kvm selftests to ensure
only valid madvise behaviors are used (and only when necessary), before
adding new tests, would be best.
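
One way to only issue valid behaviors would be to look for the THP sysfs
file before calling madvise(). A minimal sketch, assuming the file is
present only when THP support is built in; thp_available() and
madvise_thp_checked() are hypothetical names, not existing selftest
helpers:

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/enabled"

/* Assumption: the sysfs file exists only on kernels built with THP. */
static bool thp_available(void)
{
	return access(THP_SYSFS, F_OK) == 0;
}

/*
 * Skip the THP-related behaviors when THP is not available, otherwise
 * pass the request through. Returns madvise()'s result, or 0 when the
 * call was skipped.
 */
static int madvise_thp_checked(void *mem, size_t len, int advice)
{
	bool thp_advice = advice == MADV_HUGEPAGE || advice == MADV_NOHUGEPAGE;

	if (thp_advice && !thp_available())
		return 0;	/* nothing to advise on this kernel */

	return madvise(mem, len, advice);
}
```

A test that really depends on huge pages would want to check
thp_available() itself and bail out with a clear message instead of
relying on the silent skip.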

Thanks,
drew

