Re: [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices

2015-11-04 Thread Igor Mammedov
On Tue, 3 Nov 2015 22:22:40 +0800
Xiao Guangrong  wrote:

> 
> 
> On 11/03/2015 09:13 PM, Igor Mammedov wrote:
> > On Mon,  2 Nov 2015 17:13:29 +0800
> > Xiao Guangrong  wrote:
> >
> >> NVDIMM devices is defined in ACPI 6.0 9.20 NVDIMM Devices
> >>
> >> There is a root device under \_SB and specified NVDIMM devices are under 
> >> the
> >> root device. Each NVDIMM device has _ADR which returns its handle used to
> >> associate MEMDEV structure in NFIT
> >>
> >> We reserve handle 0 for root device. In this patch, we save handle, handle,
> >> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch
> >>
> >> Signed-off-by: Xiao Guangrong 
> >> ---
> >>   hw/acpi/nvdimm.c | 184 
> >> +++
> >>   1 file changed, 184 insertions(+)
> >>
> >> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> >> index dd84e5f..53ed675 100644
> >> --- a/hw/acpi/nvdimm.c
> >> +++ b/hw/acpi/nvdimm.c
> >> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, 
> >> GArray *table_offsets,
> >>   g_array_free(structures, true);
> >>   }
> >>
> >> +struct NvdimmDsmIn {
> >> +uint32_t handle;
> >> +uint32_t revision;
> >> +uint32_t function;
> >> +   /* the remaining size in the page is used by arg3. */
> >> +uint8_t arg3[0];
> >> +} QEMU_PACKED;
> >> +typedef struct NvdimmDsmIn NvdimmDsmIn;
> >> +
> >>   static uint64_t
> >>   nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
> >>   {
> >> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned 
> >> size)
> >>   static void
> >>   nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> >>   {
> >> +fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
> > it doesn't seem like this hunk belongs here
> 
> Er, we have changed the logic:
> - others:
>1) the buffer length is directly got from IO read rather than got
>   from dsm memory
> [ This has documented in v5's changelog. ]
> 
> So, the IO write is replaced by IO read, nvdimm_dsm_write() should not be
> triggered.
> 
> >
> >>   }
> >>
> >>   static const MemoryRegionOps nvdimm_dsm_ops = {
> >> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, 
> >> MemoryRegion *io,
> >>   memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
> >>   }
> >>
> >> +#define BUILD_STA_METHOD(_dev_, _method_) 
> >>  \
> >> +do {  
> >>  \
> >> +_method_ = aml_method("_STA", 0); 
> >>  \
> >> +aml_append(_method_, aml_return(aml_int(0x0f)));  
> >>  \
> >> +aml_append(_dev_, _method_);  
> >>  \
> >> +} while (0)
> > _STA doesn't have any logic here so drop macro and just
> > replace its call sites with:
> 
> Okay, I was just wanting to save some code lines. I will drop this macro.
> 
> >
> > aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf));
> 
> _STA is required as a method with zero argument but this statement just
> define a object. It is okay?
Spec doesn't say that it must be method, it says that it will evaluate _STA 
object
and result must be a combination of defined flags.
AML wise calling a method with 0 arguments and referencing named variable
is the same thing, both end up being just a namestring.

Also note that _STA here return 0xF, and spec says that if _STA is missing
OSPM shall assume its implicit value being 0xF, so you can just drop _STA
object here altogether.

> 
> >
> >
> >> +
> >> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)   
> >>  \
> >> +do {  
> >>  \
> >> +Aml *ifctx, *uuid;
> >>  \
> >> +_method_ = aml_method("_DSM", 4); 
> >>  \
> >> +/* check UUID if it is we expect, return the errorcode if not.*/  
> >>  \
> >> +uuid = aml_touuid(_uuid_);
> >>  \
> >> +ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));
> >>  \
> >> +aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));
> >>  \
> >> +aml_append(method, ifctx);
> >>  \
> >> +aml_append(method, aml_return(aml_call4("NCAL", 
> >> aml_int(_handle_), \
> >> +   aml_arg(1), aml_arg(2), aml_arg(3; 
> >>  \
> >> +aml_append(_dev_, _method_);  
> >>  \
> >> +} while (0)
> >> +
> >> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)
> >>  \
> >> +aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
> >> +
> >> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)
> >>  \
>

[PATCH 0/1] KVM: PPC: Increase memslots

2015-11-04 Thread Thomas Huth
Now that the patch from Nikunj to support the KVM_CAP_NR_MEMSLOTS
capability on powerpc has been merged to kvm/next, we can/should
increase the amount of memslots on ppc, too.
Since nobody else sent a patch yet (as far as I can see), I'd like
to suggest to increase the slot number to 320 now. Why 320? Well,
x86 uses 509 (+3 internal slots), to give enough space for
256 pluggable DIMMs and 253 other slots (for PCI etc.).
On powerpc, QEMU only supports 32 pluggable DIMMs, so I think we
should be fine by using something around 253 + 32 slots + some few
extra ... so 320 sounds like a good value to me for the time
being (but in the long run, we should really make this dynamically
instead, I think).

Thomas Huth (1):
  KVM: PPC: Increase memslots to 320

 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-11-04 Thread Thomas Huth
Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.
x86 already increased the limit to 512 in total, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
QEMU, not 256, so we likely do not as much slots as on x86. Thus
setting the slot limit to 320 sounds like a good value for the
time being (until we have some code in the future to resize the
memslot array dynamically).
And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 887c259..89d387a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,8 +38,7 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_USER_MEM_SLOTS 32
-#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
+#define KVM_USER_MEM_SLOTS 320
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Makefile: allow overriding CFLAGS on the command line

2015-11-04 Thread Riku Voipio
On 30 October 2015 at 19:20, Andre Przywara  wrote:
> When a Makefile variable is set on the make command line, all
> Makefile-internal assignments to that very variable are _ignored_.
> Since we add quite some essential values to CFLAGS internally,
> specifying some CFLAGS on the command line will usually break the
> build (and not fix any include file problems you hoped to overcome
> with that).
> Somewhat against intuition GNU make provides the "override" directive
> to change this behavior; with that assignments in the Makefile get
> _appended_ to the value given on the command line. [1]
>
> Change any internal assignments to use that directive, so that a user
> can use:
> $ make CFLAGS=/path/to/my/include/dir
> to teach kvmtool about non-standard header file locations (helpful
> for cross-compilation) or to tweak other compiler options.
>
> Signed-off-by: Andre Przywara 
>
> [1] https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
> ---
>  Makefile | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index f8f7cc4..77a7c9f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -15,9 +15,7 @@ include config/utilities.mak
>  include config/feature-tests.mak
>
>  CC := $(CROSS_COMPILE)gcc
> -CFLAGS :=
>  LD := $(CROSS_COMPILE)ld
> -LDFLAGS:=

This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS
to something unsuitable for guest init.

Looks like this has been an issue before:

commit 57fa349a9792a629e4ed2d89e1309cc96dcc39af
Author: Will Deacon 
Date:   Thu Jun 4 16:25:36 2015 +0100

Don't inherit CFLAGS and LDFLAGS from the environment

kvmtool doesn't build with arbitrary flags, so let's clear CFLAGS and
LDFLAGS by default at the top of the Makefile, allowing people to add
additional options there if they really want to.

Reported by Dave Jones, who ended up passing -std=gnu99 by mistake.

I think it's better to have EXTRA_CFLAGS and EXTRA_LDFLAGS like the kernel
has.

>  FIND   := find
>  CSCOPE := cscope
> @@ -162,7 +160,7 @@ ifeq ($(ARCH), arm)
> OBJS+= arm/aarch32/kvm-cpu.o
> ARCH_INCLUDE:= $(HDRS_ARM_COMMON)
> ARCH_INCLUDE+= -Iarm/aarch32/include
> -   CFLAGS  += -march=armv7-a
> +   override CFLAGS += -march=armv7-a
>
> ARCH_WANT_LIBFDT := y
>  endif
> @@ -274,12 +272,12 @@ endif
>  ifeq ($(LTO),1)
> FLAGS_LTO := -flto
> ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),$(FLAGS_LTO)),y)
> -   CFLAGS  += $(FLAGS_LTO)
> +   override CFLAGS += $(FLAGS_LTO)
> endif
>  endif
>
>  ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y)
> -   CFLAGS  += -DCONFIG_GUEST_INIT
> +   override CFLAGS += -DCONFIG_GUEST_INIT
> GUEST_INIT  := guest/init
> GUEST_OBJS  = guest/guest_init.o
> ifeq ($(ARCH_PRE_INIT),)
> @@ -331,7 +329,8 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
>  DEFINES+= -DBUILD_ARCH='"$(ARCH)"'
>
>  KVM_INCLUDE := include
> -CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -O2 
> -fno-strict-aliasing -g
> +override CFLAGS+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) 
> -I$(ARCH_INCLUDE)
> +override CFLAGS+= -O2 -fno-strict-aliasing -g
>
>  WARNINGS += -Wall
>  WARNINGS += -Wformat=2
> @@ -349,10 +348,10 @@ WARNINGS += -Wvolatile-register-var
>  WARNINGS += -Wwrite-strings
>  WARNINGS += -Wno-format-nonliteral
>
> -CFLAGS += $(WARNINGS)
> +override CFLAGS+= $(WARNINGS)
>
>  ifneq ($(WERROR),0)
> -   CFLAGS += -Werror
> +   override CFLAGS += -Werror
>  endif
>
>  all: $(PROGRAM) $(PROGRAM_ALIAS) $(GUEST_INIT) $(GUEST_PRE_INIT)
> --
> 2.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Makefile: allow overriding CFLAGS on the command line

2015-11-04 Thread Andre Przywara
Hi Riku,

On 04/11/15 10:02, Riku Voipio wrote:
> On 30 October 2015 at 19:20, Andre Przywara  wrote:
>> When a Makefile variable is set on the make command line, all
>> Makefile-internal assignments to that very variable are _ignored_.
>> Since we add quite some essential values to CFLAGS internally,
>> specifying some CFLAGS on the command line will usually break the
>> build (and not fix any include file problems you hoped to overcome
>> with that).
>> Somewhat against intuition GNU make provides the "override" directive
>> to change this behavior; with that assignments in the Makefile get
>> _appended_ to the value given on the command line. [1]
>>
>> Change any internal assignments to use that directive, so that a user
>> can use:
>> $ make CFLAGS=/path/to/my/include/dir
>> to teach kvmtool about non-standard header file locations (helpful
>> for cross-compilation) or to tweak other compiler options.
>>
>> Signed-off-by: Andre Przywara 
>>
>> [1] 
>> https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
>> ---
>>  Makefile | 15 +++
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index f8f7cc4..77a7c9f 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -15,9 +15,7 @@ include config/utilities.mak
>>  include config/feature-tests.mak
>>
>>  CC := $(CROSS_COMPILE)gcc
>> -CFLAGS :=
>>  LD := $(CROSS_COMPILE)ld
>> -LDFLAGS:=
> 
> This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS
> to something unsuitable for guest init.
> 
> Looks like this has been an issue before:

> 
> commit 57fa349a9792a629e4ed2d89e1309cc96dcc39af
> Author: Will Deacon 
> Date:   Thu Jun 4 16:25:36 2015 +0100
> 
> Don't inherit CFLAGS and LDFLAGS from the environment
> 
> kvmtool doesn't build with arbitrary flags, so let's clear CFLAGS and
> LDFLAGS by default at the top of the Makefile, allowing people to add
> additional options there if they really want to.
> 
> Reported by Dave Jones, who ended up passing -std=gnu99 by mistake.

Well, I fixed this issue later with making kvmtool compilation more
robust when using modern compiler standards.
That's why I wanted this kludge to go away.

> I think it's better to have EXTRA_CFLAGS and EXTRA_LDFLAGS like the kernel
> has.

Mmmh, I'd rather see guest_init creation use their own flags for it,
since it is so special and actually independent from the target userland.
Let me check this out and send out my guest_init Makefile fix I have
lying around here on the way.

What LDFLAGS are actually causing your issues?

Cheers,
Andre.

> 
>>  FIND   := find
>>  CSCOPE := cscope
>> @@ -162,7 +160,7 @@ ifeq ($(ARCH), arm)
>> OBJS+= arm/aarch32/kvm-cpu.o
>> ARCH_INCLUDE:= $(HDRS_ARM_COMMON)
>> ARCH_INCLUDE+= -Iarm/aarch32/include
>> -   CFLAGS  += -march=armv7-a
>> +   override CFLAGS += -march=armv7-a
>>
>> ARCH_WANT_LIBFDT := y
>>  endif
>> @@ -274,12 +272,12 @@ endif
>>  ifeq ($(LTO),1)
>> FLAGS_LTO := -flto
>> ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),$(FLAGS_LTO)),y)
>> -   CFLAGS  += $(FLAGS_LTO)
>> +   override CFLAGS += $(FLAGS_LTO)
>> endif
>>  endif
>>
>>  ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y)
>> -   CFLAGS  += -DCONFIG_GUEST_INIT
>> +   override CFLAGS += -DCONFIG_GUEST_INIT
>> GUEST_INIT  := guest/init
>> GUEST_OBJS  = guest/guest_init.o
>> ifeq ($(ARCH_PRE_INIT),)
>> @@ -331,7 +329,8 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
>>  DEFINES+= -DBUILD_ARCH='"$(ARCH)"'
>>
>>  KVM_INCLUDE := include
>> -CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -O2 
>> -fno-strict-aliasing -g
>> +override CFLAGS+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) 
>> -I$(ARCH_INCLUDE)
>> +override CFLAGS+= -O2 -fno-strict-aliasing -g
>>
>>  WARNINGS += -Wall
>>  WARNINGS += -Wformat=2
>> @@ -349,10 +348,10 @@ WARNINGS += -Wvolatile-register-var
>>  WARNINGS += -Wwrite-strings
>>  WARNINGS += -Wno-format-nonliteral
>>
>> -CFLAGS += $(WARNINGS)
>> +override CFLAGS+= $(WARNINGS)
>>
>>  ifneq ($(WERROR),0)
>> -   CFLAGS += -Werror
>> +   override CFLAGS += -Werror
>>  endif
>>
>>  all: $(PROGRAM) $(PROGRAM_ALIAS) $(GUEST_INIT) $(GUEST_PRE_INIT)
>> --
>> 2.5.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Makefile: allow overriding CFLAGS on the command line

2015-11-04 Thread Riku Voipio
On 4 November 2015 at 12:13, Andre Przywara  wrote:
> Hi Riku,
>
> On 04/11/15 10:02, Riku Voipio wrote:
>> On 30 October 2015 at 19:20, Andre Przywara  wrote:
>>> When a Makefile variable is set on the make command line, all
>>> Makefile-internal assignments to that very variable are _ignored_.
>>> Since we add quite some essential values to CFLAGS internally,
>>> specifying some CFLAGS on the command line will usually break the
>>> build (and not fix any include file problems you hoped to overcome
>>> with that).
>>> Somewhat against intuition GNU make provides the "override" directive
>>> to change this behavior; with that assignments in the Makefile get
>>> _appended_ to the value given on the command line. [1]
>>>
>>> Change any internal assignments to use that directive, so that a user
>>> can use:
>>> $ make CFLAGS=/path/to/my/include/dir
>>> to teach kvmtool about non-standard header file locations (helpful
>>> for cross-compilation) or to tweak other compiler options.
>>>
>>> Signed-off-by: Andre Przywara 
>>>
>>> [1] 
>>> https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
>>> ---
>>>  Makefile | 15 +++
>>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/Makefile b/Makefile
>>> index f8f7cc4..77a7c9f 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -15,9 +15,7 @@ include config/utilities.mak
>>>  include config/feature-tests.mak
>>>
>>>  CC := $(CROSS_COMPILE)gcc
>>> -CFLAGS :=
>>>  LD := $(CROSS_COMPILE)ld
>>> -LDFLAGS:=
>>
>> This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS
>> to something unsuitable for guest init.
>>
>> Looks like this has been an issue before:
>
>>
>> commit 57fa349a9792a629e4ed2d89e1309cc96dcc39af
>> Author: Will Deacon 
>> Date:   Thu Jun 4 16:25:36 2015 +0100
>>
>> Don't inherit CFLAGS and LDFLAGS from the environment
>>
>> kvmtool doesn't build with arbitrary flags, so let's clear CFLAGS and
>> LDFLAGS by default at the top of the Makefile, allowing people to add
>> additional options there if they really want to.
>>
>> Reported by Dave Jones, who ended up passing -std=gnu99 by mistake.
>
> Well, I fixed this issue later with making kvmtool compilation more
> robust when using modern compiler standards.
> That's why I wanted this kludge to go away.
>
>> I think it's better to have EXTRA_CFLAGS and EXTRA_LDFLAGS like the kernel
>> has.
>
> Mmmh, I'd rather see guest_init creation use their own flags for it,
> since it is so special and actually independent from the target userland.
> Let me check this out and send out my guest_init Makefile fix I have
> lying around here on the way.
>
> What LDFLAGS are actually causing your issues?

  LINK guest/init
ld: unrecognized option '-Wl,-z,relro'

That's another issue - kvmtool passes LDFLAGS to both LD and CC yet they
actually take different command line options.

> Cheers,
> Andre.
>
>>
>>>  FIND   := find
>>>  CSCOPE := cscope
>>> @@ -162,7 +160,7 @@ ifeq ($(ARCH), arm)
>>> OBJS+= arm/aarch32/kvm-cpu.o
>>> ARCH_INCLUDE:= $(HDRS_ARM_COMMON)
>>> ARCH_INCLUDE+= -Iarm/aarch32/include
>>> -   CFLAGS  += -march=armv7-a
>>> +   override CFLAGS += -march=armv7-a
>>>
>>> ARCH_WANT_LIBFDT := y
>>>  endif
>>> @@ -274,12 +272,12 @@ endif
>>>  ifeq ($(LTO),1)
>>> FLAGS_LTO := -flto
>>> ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),$(FLAGS_LTO)),y)
>>> -   CFLAGS  += $(FLAGS_LTO)
>>> +   override CFLAGS += $(FLAGS_LTO)
>>> endif
>>>  endif
>>>
>>>  ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y)
>>> -   CFLAGS  += -DCONFIG_GUEST_INIT
>>> +   override CFLAGS += -DCONFIG_GUEST_INIT
>>> GUEST_INIT  := guest/init
>>> GUEST_OBJS  = guest/guest_init.o
>>> ifeq ($(ARCH_PRE_INIT),)
>>> @@ -331,7 +329,8 @@ DEFINES += 
>>> -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
>>>  DEFINES+= -DBUILD_ARCH='"$(ARCH)"'
>>>
>>>  KVM_INCLUDE := include
>>> -CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -O2 
>>> -fno-strict-aliasing -g
>>> +override CFLAGS+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) 
>>> -I$(ARCH_INCLUDE)
>>> +override CFLAGS+= -O2 -fno-strict-aliasing -g
>>>
>>>  WARNINGS += -Wall
>>>  WARNINGS += -Wformat=2
>>> @@ -349,10 +348,10 @@ WARNINGS += -Wvolatile-register-var
>>>  WARNINGS += -Wwrite-strings
>>>  WARNINGS += -Wno-format-nonliteral
>>>
>>> -CFLAGS += $(WARNINGS)
>>> +override CFLAGS+= $(WARNINGS)
>>>
>>>  ifneq ($(WERROR),0)
>>> -   CFLAGS += -Werror
>>> +   override CFLAGS += -Werror
>>>  endif
>>>
>>>  all: $(PROGRAM) $(PROGRAM_ALIAS) $(GUEST_INIT) $(GUEST_PRE_INIT)
>>> --
>>> 2.5.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-in

Re: [PATCH 1/2] Makefile: allow overriding CFLAGS on the command line

2015-11-04 Thread Will Deacon
On Wed, Nov 04, 2015 at 12:02:23PM +0200, Riku Voipio wrote:
> On 30 October 2015 at 19:20, Andre Przywara  wrote:
> > When a Makefile variable is set on the make command line, all
> > Makefile-internal assignments to that very variable are _ignored_.
> > Since we add quite some essential values to CFLAGS internally,
> > specifying some CFLAGS on the command line will usually break the
> > build (and not fix any include file problems you hoped to overcome
> > with that).
> > Somewhat against intuition GNU make provides the "override" directive
> > to change this behavior; with that assignments in the Makefile get
> > _appended_ to the value given on the command line. [1]
> >
> > Change any internal assignments to use that directive, so that a user
> > can use:
> > $ make CFLAGS=/path/to/my/include/dir
> > to teach kvmtool about non-standard header file locations (helpful
> > for cross-compilation) or to tweak other compiler options.
> >
> > Signed-off-by: Andre Przywara 
> >
> > [1] 
> > https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
> > ---
> >  Makefile | 15 +++
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index f8f7cc4..77a7c9f 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -15,9 +15,7 @@ include config/utilities.mak
> >  include config/feature-tests.mak
> >
> >  CC := $(CROSS_COMPILE)gcc
> > -CFLAGS :=
> >  LD := $(CROSS_COMPILE)ld
> > -LDFLAGS:=
> 
> This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS
> to something unsuitable for guest init.

Thanks for the report, Riku. I'll revert this patch while we rethink how
to support user-supplied toolchain flags.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too

2015-11-04 Thread Will Deacon
Hi Sasha,

[adding the kvm list; not sure why it was dropped]

On Wed, Oct 28, 2015 at 01:34:25PM +, Will Deacon wrote:
> On Wed, Oct 28, 2015 at 09:00:07AM -0400, Sasha Levin wrote:
> > On 10/28/2015 08:27 AM, Will Deacon wrote:
> > > On Tue, Oct 27, 2015 at 10:08:44PM -0400, Sasha Levin wrote:
> > >> > On 10/27/2015 12:08 PM, Will Deacon wrote:
> >  > >>   for (i = 0; i < kvm->nrcpus; i++)
> > > > >> > -  pthread_kill(kvm->cpus[i]->thread, SIGKVMPAUSE);
> > > > >> > +  if (kvm->cpus[i]->is_running)
> > > > >> > +  pthread_kill(kvm->cpus[i]->thread, 
> > > > >> > SIGKVMPAUSE);
> > > > >> > +  else
> > > > >> > +  paused_vcpus++;
> > >>> > > What serialises the test of kvm->cpus[i]->is_running against a 
> > >>> > > concurrent
> > >>> > > SIGKVMEXIT?
> > >> > 
> > >> > If there's a pause signal pending we'd still end up marking it as 
> > >> > paused
> > >> > and do the whole process even though the vcpu is actually dead, so 
> > >> > while
> > >> > we can race with SIGKVMEXIT, it'll just mean we'd go through the whole
> > >> > pausing procedure even though the vcpu is dead.
> > >> > 
> > >> > Or did you mean something else?
> > > I couldn't convince myself there was an issue here (hence the question ;),
> > > but I was wondering about something like:
> > > 
> > >   1. The VCPUn thread takes SIGKVMEXIT (e.g. due to kvm_cpu__reboot)
> > >   2. The IPC thread handles a pause event and reads 
> > > kvm->cpus[n]->is_running
> > >  as true
> > >   3. VCPUn sets kvm->cpus[n]->is_running to false
> > >   4. VCPUn exits
> > >   5. The IPC thread trues to pthread_kill on an exited thread
> > > 
> > > am I missing some obvious synchronisation here?
> > 
> > Can we take two signals in parallel? (I'm really not sure). If yes, than
> > what you've described is a problem (and has been for a while). If not,
> > then no :)
> 
> Regardless, isn't the pause event coming from polling the IPC file
> descriptor as opposed to a signal?

It looks like lkvm is currently SEGVing on exit due to the original
issue. Digging deeper, it looks like we should be taking the pause_lock
for the SIGKVMEXIT case too, so that we can serialise access to is_running.

What do you think about the patch below?

Will

--->8

diff --git a/hw/i8042.c b/hw/i8042.c
index 3801e20a675d..288b7d1108ac 100644
--- a/hw/i8042.c
+++ b/hw/i8042.c
@@ -163,7 +163,7 @@ static void kbd_write_command(struct kvm *kvm, u8 val)
state.mode &= ~MODE_DISABLE_AUX;
break;
case I8042_CMD_SYSTEM_RESET:
-   kvm_cpu__reboot(kvm);
+   kvm__reboot(kvm);
break;
default:
break;
diff --git a/include/kvm/kvm-cpu.h b/include/kvm/kvm-cpu.h
index aa0cb543f8fb..c4c9cca449eb 100644
--- a/include/kvm/kvm-cpu.h
+++ b/include/kvm/kvm-cpu.h
@@ -12,7 +12,6 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu);
 void kvm_cpu__setup_cpuid(struct kvm_cpu *vcpu);
 void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu);
 void kvm_cpu__run(struct kvm_cpu *vcpu);
-void kvm_cpu__reboot(struct kvm *kvm);
 int kvm_cpu__start(struct kvm_cpu *cpu);
 bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
 int kvm_cpu__get_endianness(struct kvm_cpu *vcpu);
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 37155dbc132b..a474dae6c2cf 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -93,6 +93,7 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 
phys_addr_len, bool c
   void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 
*data, u32 len, u8 is_write, void *ptr),
void *ptr);
 bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
+void kvm__reboot(struct kvm *kvm);
 void kvm__pause(struct kvm *kvm);
 void kvm__continue(struct kvm *kvm);
 void kvm__notify_paused(void);
diff --git a/kvm-cpu.c b/kvm-cpu.c
index ad4441b1d96c..3e2037e3ccb3 100644
--- a/kvm-cpu.c
+++ b/kvm-cpu.c
@@ -45,10 +45,8 @@ void kvm_cpu__run(struct kvm_cpu *vcpu)
 static void kvm_cpu_signal_handler(int signum)
 {
if (signum == SIGKVMEXIT) {
-   if (current_kvm_cpu && current_kvm_cpu->is_running) {
+   if (current_kvm_cpu && current_kvm_cpu->is_running)
current_kvm_cpu->is_running = false;
-   kvm__continue(current_kvm_cpu->kvm);
-   }
} else if (signum == SIGKVMPAUSE) {
current_kvm_cpu->paused = 1;
}
@@ -70,19 +68,6 @@ static void kvm_cpu__handle_coalesced_mmio(struct kvm_cpu 
*cpu)
}
 }
 
-void kvm_cpu__reboot(struct kvm *kvm)
-{
-   int i;
-
-   /* The kvm->cpus array contains a null pointer in the last location */
-   for (i = 0; ; i++) {
-   if (kvm->cpus[i])
-   pthread_kill(kvm->cpus[i]->thread, SIGKVMEXIT);
-   else
-   break;
-   }
-}
-
 int kvm_cpu__start(st

Re: [PATCH] KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()

2015-11-04 Thread Paolo Bonzini


On 03/11/2015 19:34, Laszlo Ersek wrote:
> Commit b18d5431acc7 ("KVM: x86: fix CR0.CD virtualization") was
> technically correct, but it broke OVMF guests by slowing down various
> parts of the firmware.
> 
> Commit fb279950ba02 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") quirked the
> first function modified by b18d5431acc7, vmx_get_mt_mask(), for OVMF's
> sake. This restored the speed of the OVMF code that runs before
> PlatformPei (including the memory intensive LZMA decompression in SEC).
> 
> This patch extends the quirk to the second function modified by
> b18d5431acc7, kvm_set_cr0(). It eliminates the intrusive slowdown that
> hits the EFI_MP_SERVICES_PROTOCOL implementation of edk2's
> UefiCpuPkg/CpuDxe -- which is built into OVMF --, when CpuDxe starts up
> all APs at once for initialization, in order to count them.
> 
> We also carry over the kvm_arch_has_noncoherent_dma() sub-condition from
> the other half of the original commit b18d5431acc7.
> 
> Cc: Paolo Bonzini 
> Cc: Jordan Justen 
> Cc: Janusz Mocek 
> Cc: Alex Williamson 
> Cc: Xiao Guangrong 
> Signed-off-by: Laszlo Ersek 
> ---
>  arch/x86/kvm/x86.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a24bae0..30723a4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -625,7 +625,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>   if ((cr0 ^ old_cr0) & update_bits)
>   kvm_mmu_reset_context(vcpu);
>  
> - if ((cr0 ^ old_cr0) & X86_CR0_CD)
> + if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
> + kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
> + !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
>   kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
>  
>   return 0;
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v2] KVM: VMX: Fix commit which broke PML

2015-11-04 Thread Paolo Bonzini


On 04/11/2015 06:46, Kai Huang wrote:
> I found PML was broken since below commit:
> 
>   commit feda805fe7c4ed9cf78158e73b1218752e3b4314
>   Author: Xiao Guangrong 
>   Date:   Wed Sep 9 14:05:55 2015 +0800
> 
>   KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update
> 
>   Unify the update in vmx_cpuid_update()
> 
>   Signed-off-by: Xiao Guangrong 
>   [Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
>   Signed-off-by: Paolo Bonzini 
> 
> The reason is in above commit vmx_cpuid_update calls 
> vmx_secondary_exec_control,
> in which currently SECONDARY_EXEC_ENABLE_PML bit is cleared unconditionally 
> (as
> PML is enabled in creating vcpu). Therefore if vcpu_cpuid_update is called 
> after
> vcpu is created, PML will be disabled unexpectedly while log-dirty code still
> thinks PML is used.
> 
> Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control
> only when PML is not supported or not enabled (!enable_pml). This is more
> reasonable as PML is currently either always enabled or disabled. With this
> explicit updating SECONDARY_EXEC_ENABLE_PML in vmx_enable{disable}_pml is not
> needed so also rename vmx_enable{disable}_pml to 
> vmx_create{destroy}_pml_buffer.
> 
> Signed-off-by: Kai Huang 
> 
> ---
> 
> v1->v2: Fix this by following Paolo's suggestion. It's better to not to clear
> SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control unconditionally but 
> only
> clear it when PML is not supported or enabled.
> 
> ---
>  arch/x86/kvm/vmx.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 2ac11641..89f4fa2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4718,8 +4718,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx 
> *vmx)
>  a current VMCS12
>   */
>   exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
> - /* PML is enabled/disabled in creating/destorying vcpu */
> - exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> +
> + if (!enable_pml)
> + exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>  
>   /* Currently, we allow L1 guest to directly run pcommit instruction. */
>   exec_control &= ~SECONDARY_EXEC_PCOMMIT;
> @@ -7804,7 +7805,7 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, 
> u64 *info1, u64 *info2)
>   *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>  }
>  
> -static int vmx_enable_pml(struct vcpu_vmx *vmx)
> +static int vmx_create_pml_buffer(struct vcpu_vmx *vmx)
>  {
>   struct page *pml_pg;
>  
> @@ -7817,12 +7818,10 @@ static int vmx_enable_pml(struct vcpu_vmx *vmx)
>   vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
>   vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>  
> - vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL, SECONDARY_EXEC_ENABLE_PML);
> -
>   return 0;
>  }
>  
> -static void vmx_disable_pml(struct vcpu_vmx *vmx)
> +static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
>  {
>   ASSERT(vmx->pml_pg);
>   __free_page(vmx->pml_pg);
> @@ -8706,7 +8705,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
>   struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
>   if (enable_pml)
> - vmx_disable_pml(vmx);
> + vmx_destroy_pml_buffer(vmx);
>   free_vpid(vmx->vpid);
>   leave_guest_mode(vcpu);
>   vmx_load_vmcs01(vcpu);
> @@ -8790,7 +8789,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
> *kvm, unsigned int id)
>* for the guest, etc.
>*/
>   if (enable_pml) {
> - err = vmx_enable_pml(vmx);
> + err = vmx_create_pml_buffer(vmx);
>   if (err)
>   goto free_vmcs;
>   }
> 


Applied, thanks!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path

2015-11-04 Thread Eduardo Habkost
On Wed, Nov 04, 2015 at 11:12:41AM +0800, Xiao Guangrong wrote:
> On 11/04/2015 07:00 AM, Eduardo Habkost wrote:
> >On Mon, Nov 02, 2015 at 05:13:10PM +0800, Xiao Guangrong wrote:
> >>Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
> >>of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
> >>locates at DAX enabled filesystem
> >>
> >>So this patch let it work on any kind of path
> >>
> >>Signed-off-by: Xiao Guangrong 
> >>---
> >>  exec.c | 24 
> >>  1 file changed, 12 insertions(+), 12 deletions(-)
> >>
> >>diff --git a/exec.c b/exec.c
> >>index 9de38be..9075f4d 100644
> >>--- a/exec.c
> >>+++ b/exec.c
> >>@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
> >>  char *c;
> >>  void *area;
> >>  int fd;
> >>-uint64_t hpagesize;
> >>+uint64_t pagesize;
> >>  Error *local_err = NULL;
> >>
> >>-hpagesize = qemu_file_get_page_size(path, &local_err);
> >>+pagesize = qemu_file_get_page_size(path, &local_err);
> >>  if (local_err) {
> >>  error_propagate(errp, local_err);
> >>  goto error;
> >>  }
> >>
> >>-if (hpagesize == getpagesize()) {
> >>-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> >>+if (pagesize == getpagesize()) {
> >>+fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
> >
> >If the point of this patch is to allow file_ram_alloc() to not be
> >specific to hugetlbfs anymore, this warning can simply go away.
> >
> >(And in case if you really want to keep the warning, I don't see the
> >point of the changes you made to it.)
> >
> 
> This is the history why we did it like this:
> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02862.html

The rule I am trying to follow is simple: if there are valid use cases
(e.g. nvdimm, tmpfs) where the warning will be printed every single time
QEMU runs, the warning is not appropriate.

If you really want to keep a warning, please make it not be printed on
all other valid use cases (nvdimm and tmpfs). Personally, I don't think
adding those additional checks would be worth the trouble, that's why I
suggest removing the warning.

> 
> Q:
> | What this *actually* is trying to warn against is that
> | mapping a regular file (as opposed to hugetlbfs)
> | means transparent huge pages don't work.

I don't think the author of that warning even thought about transparent
huge pages (did THP even existed when it was written?). I believe they
just assumed that the only reason for using -mem-path would be hugetlbfs
and wanted to warn about it. That assumption isn't true anymore.

> 
> | So I don't think we should drop this warning completely.
> | Either let's add the nvdimm magic, or simply check the
> | page size.
> 
> A:
> | Check the page size sounds good, will check:
> | if (pagesize != getpagesize()) {
> |...print something...
> |}
> 
> | I agree with you that showing the info is needed, however,
> | 'Warning' might scare some users, how about drop this word or
> | just show “Memory is not allocated from HugeTlbfs”?

With "warning:", I know it may be OK to do what I am doing and the
software is just trying to warn me. If there's no "warning:", I don't
even know if something is really broken in my config, or if it's just a
warning, and I would be very confused.

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common

2015-11-04 Thread Paolo Bonzini
They are exactly the same, except that handle_mmio_page_fault
has an unused argument and a call to WARN_ON.  Remove the unused
argument from the callers, and move the warning to (the former)
handle_mmio_page_fault_common.
---
 arch/x86/kvm/mmu.c | 20 +---
 arch/x86/kvm/mmu.h |  6 +++---
 arch/x86/kvm/paging_tmpl.h |  3 +--
 arch/x86/kvm/vmx.c |  2 +-
 4 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d85bcae3332..e7c2c1428a69 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3359,7 +3359,7 @@ exit:
return reserved;
 }
 
-int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 {
u64 spte;
bool reserved;
@@ -3368,7 +3368,7 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, 
u64 addr, bool direct)
return RET_MMIO_PF_EMULATE;
 
reserved = walk_shadow_page_get_mmio_spte(vcpu, addr, &spte);
-   if (unlikely(reserved))
+   if (WARN_ON(reserved))
return RET_MMIO_PF_BUG;
 
if (is_mmio_spte(spte)) {
@@ -3392,17 +3392,7 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, 
u64 addr, bool direct)
 */
return RET_MMIO_PF_RETRY;
 }
-EXPORT_SYMBOL_GPL(handle_mmio_page_fault_common);
-
-static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr,
- u32 error_code, bool direct)
-{
-   int ret;
-
-   ret = handle_mmio_page_fault_common(vcpu, addr, direct);
-   WARN_ON(ret == RET_MMIO_PF_BUG);
-   return ret;
-}
+EXPORT_SYMBOL_GPL(handle_mmio_page_fault);
 
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code, bool prefault)
@@ -3413,7 +3403,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, 
gva_t gva,
pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
 
if (unlikely(error_code & PFERR_RSVD_MASK)) {
-   r = handle_mmio_page_fault(vcpu, gva, error_code, true);
+   r = handle_mmio_page_fault(vcpu, gva, true);
 
if (likely(r != RET_MMIO_PF_INVALID))
return r;
@@ -3503,7 +3493,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
if (unlikely(error_code & PFERR_RSVD_MASK)) {
-   r = handle_mmio_page_fault(vcpu, gpa, error_code, true);
+   r = handle_mmio_page_fault(vcpu, gpa, true);
 
if (likely(r != RET_MMIO_PF_INVALID))
return r;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e4202e41d535..55ffb7b0f95e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -56,13 +56,13 @@ void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 /*
- * Return values of handle_mmio_page_fault_common:
+ * Return values of handle_mmio_page_fault:
  * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
  * directly.
  * RET_MMIO_PF_INVALID: invalid spte is detected then let the real page
  * fault path update the mmio spte.
  * RET_MMIO_PF_RETRY: let CPU fault again on the address.
- * RET_MMIO_PF_BUG: bug is detected.
+ * RET_MMIO_PF_BUG: a bug was detected (and a WARN was printed).
  */
 enum {
RET_MMIO_PF_EMULATE = 1,
@@ -71,7 +71,7 @@ enum {
RET_MMIO_PF_BUG = -1
 };
 
-int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
+int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly);
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b41faa91a6f9..3058a22a658d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -705,8 +705,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 
if (unlikely(error_code & PFERR_RSVD_MASK)) {
-   r = handle_mmio_page_fault(vcpu, addr, error_code,
- mmu_is_nested(vcpu));
+   r = handle_mmio_page_fault(vcpu, addr, mmu_is_nested(vcpu));
if (likely(r != RET_MMIO_PF_INVALID))
return r;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 89f4fa25b06d..04007a032004 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5908,7 +5908,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
return 1;
}
 
-   ret = handle_mmio_page_fault_common(vcpu, gpa, true);
+   ret = handle_mmio_page_fault(vcpu, gpa, true);
if (likely(ret == RET_MMI

Re: [PATCH] kvm: irqchip: fix memory leak

2015-11-04 Thread Paolo Bonzini


On 03/11/2015 22:17, William Dauchy wrote:
> Hi Paolo,
> 
> I was wondering it this could be a valid candidate for -stable, don't you 
> think?
> (commit ba60c41)

Indeed, feel free to propose it!

Paolo

> Best regards,
> 
> On Sep02 12:33, Sudip Mukherjee wrote:
>> We were taking the exit path after checking ue->flags and return value
>> of setup_routing_entry(), but 'e' was not freed incase of a failure.
>>
>> Signed-off-by: Sudip Mukherjee 
>> ---
>>  virt/kvm/irqchip.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
>> index 21c1424..c63e54f 100644
>> --- a/virt/kvm/irqchip.c
>> +++ b/virt/kvm/irqchip.c
>> @@ -213,11 +213,15 @@ int kvm_set_irq_routing(struct kvm *kvm,
>>  goto out;
>>  
>>  r = -EINVAL;
>> -if (ue->flags)
>> +if (ue->flags) {
>> +kfree(e);
>>  goto out;
>> +}
>>  r = setup_routing_entry(new, e, ue);
>> -if (r)
>> +if (r) {
>> +kfree(e);
>>  goto out;
>> +}
>>  ++ue;
>>  }
>>  
>> -- 
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] vfio: Fix bug in vfio_device_get_from_name()

2015-11-04 Thread Joerg Roedel
From: Joerg Roedel 

The vfio_device_get_from_name() function might return a
non-NULL pointer, when called with a device name that is not
found in the list. This causes undefined behavior, in my
case calling an invalid function pointer later on:

 kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
 BUG: unable to handle kernel paging request at 8800cb3ddc08

[...]

 Call Trace:
  [] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio]
  [] do_vfs_ioctl+0x2cd/0x4c0
  [] ? __fget+0x77/0xb0
  [] SyS_ioctl+0x79/0x90
  [] ? syscall_return_slowpath+0x50/0x130
  [] entry_SYSCALL_64_fastpath+0x16/0x75

Fix the issue by returning NULL when there is no device with
the requested name in the list.

Cc: sta...@vger.kernel.org # v4.2+
Fixes: 4bc94d5dc95d ("vfio: Fix lockdep issue")
Signed-off-by: Joerg Roedel 
---
 drivers/vfio/vfio.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 563c510..8c50ea6 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -692,11 +692,12 @@ EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
 static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 char *buf)
 {
-   struct vfio_device *device;
+   struct vfio_device *it, *device = NULL;
 
mutex_lock(&group->device_lock);
-   list_for_each_entry(device, &group->device_list, group_next) {
-   if (!strcmp(dev_name(device->dev), buf)) {
+   list_for_each_entry(it, &group->device_list, group_next) {
+   if (!strcmp(dev_name(it->dev), buf)) {
+   device = it;
vfio_device_get(device);
break;
}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: irqchip: fix memory leak

2015-11-04 Thread Paolo Bonzini


On 03/11/2015 22:17, William Dauchy wrote:
> Hi Paolo,
> 
> I was wondering it this could be a valid candidate for -stable, don't you 
> think?
> (commit ba60c41)

Certainly, feel free to propose it to sta...@vger.kernel.org!

Paolo

> Best regards,
> 
> On Sep02 12:33, Sudip Mukherjee wrote:
>> We were taking the exit path after checking ue->flags and return value
>> of setup_routing_entry(), but 'e' was not freed incase of a failure.
>>
>> Signed-off-by: Sudip Mukherjee 
>> ---
>>  virt/kvm/irqchip.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
>> index 21c1424..c63e54f 100644
>> --- a/virt/kvm/irqchip.c
>> +++ b/virt/kvm/irqchip.c
>> @@ -213,11 +213,15 @@ int kvm_set_irq_routing(struct kvm *kvm,
>>  goto out;
>>  
>>  r = -EINVAL;
>> -if (ue->flags)
>> +if (ue->flags) {
>> +kfree(e);
>>  goto out;
>> +}
>>  r = setup_routing_entry(new, e, ue);
>> -if (r)
>> +if (r) {
>> +kfree(e);
>>  goto out;
>> +}
>>  ++ue;
>>  }
>>  
>> -- 
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kvm: irqchip: fix memory leak in -stable

2015-11-04 Thread William Dauchy
Hello stable release team,

The commit ba60c41 kvm: irqchip: fix memory leak
is fixing commit e73f61e kvm: irqchip: Break up high order allocations of 
kvm_irq_routing_table

I believe commit ba60c41 kvm: irqchip: fix memory leak
is a good candidate for -stable. I also  got an agreement from Paolo.

Best regards,
-- 
William


signature.asc
Description: PGP signature


[patch] vfio: make an array larger

2015-11-04 Thread Dan Carpenter
Smatch complains about a possible out of bounds error:

drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init()
error: buffer overflow 'pci_cap_length' 20 <= 20

Fix this by making the array larger.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index ff75ca3..001d48a 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -46,7 +46,7 @@
  *   0: Removed from the user visible capability list
  *   FF: Variable length
  */
-static u8 pci_cap_length[] = {
+static u8 pci_cap_length[PCI_CAP_ID_MAX + 1] = {
[PCI_CAP_ID_BASIC]  = PCI_STD_HEADER_SIZEOF, /* pci config header */
[PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
[PCI_CAP_ID_AGP]= PCI_AGP_SIZEOF,
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-04 Thread Christoffer Dall
On Tue, Nov 03, 2015 at 01:39:44PM -0600, Rob Herring wrote:
> On Tue, Nov 3, 2015 at 1:17 PM, Mario Smarduch  wrote:
> >
> >
> > On 11/3/2015 9:55 AM, Will Deacon wrote:
> >> On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
> >>> On 11/3/2015 8:33 AM, Christopher Covington wrote:
>  On 11/02/2015 06:51 PM, Mario Smarduch wrote:
> >this is a re-post from couple weeks ago, please take time to review 
> > this
> > simple patch which simplifies DEBUG_LL and prevents kernel crash on 
> > virtual
> > platforms.
> >
> > Before this patch DEBUG_LL for 'dummy virtual machine':
> >
> > ( ) Kernel low-level debugging via EmbeddedICE DCC channel
> > ( ) Kernel low-level debug output via semihosting I/O
> > ( ) Kernel low-level debugging via 8250 UART
> > ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell
> >
> > In summary if debug uart is not emulated kernel crashes.
> > And once you pass that hurdle, uart physical/virtual addresses are 
> > unknown.
> > DEBUG_LL comes in handy on many occasions and should be somewhat
> > intuitive to use like it is for physical platforms. For virtual 
> > platforms
> > user may start daubting the host and get into a bigger mess.
> >
> > After this patch is applied user gets:
> >
> > (X) Kernel low-level debugging on QEMU Virtual Platform
> > ( ) Kernel low-level debugging on Kvmtool Virtual Platform
> >. above repeated 
> >
> > The virtual addresses selected follow arm reference models, high in 
> > vmalloc
> > section with high mem enabled and guest running with >= 1GB of memory. 
> > The
> > offset is leftover from arm reference models.
> 
>  Which model? It doesn't appear to match the vexpress 
>  AEM/RTSM/FVP/whatever
>  which used 0x1c09 for UART0.
> >>>
> >>> I recall QEMU virt model had it's own physical address map, for sure I 
> >>> saw the
> >>> virtio-mmio regions assigned in some ARM document. Peter would you know?
> >>>
> >>> As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre 
> >>> will that
> >>> stay fixed?
> >>
> >> We make absolutely no guarantees about the memory map provided by kvmtool.
> >>
> >> Will
> >>
> >
> > If that's also the case for qemu, then I guess the best you can do is find 
> > a way
> > to dump the device tree. Find the uart, physical address and try figure out 
> > the
> > virtual address.
> >
> > Pretty involved, hoped for something more automated since that's a handy 
> > feature.
> 
> You really only need LL_DEBUG now if you are debugging very early code
> before memory is setup and/or bad memory. Use earlycon instead which
> should already be supported both via the pl011 or semihosting. I used
> it with QEMU semihosting support.
> 
Then we should really document how to use that with qemu's virt platform
and kvmtool's platform on both 32-bit and 64-bit so that users can
easily figure out what they're doing wrong when they get no output.

In practice, the address for the pl011 is quite unlikely to change, I
dare speculate, so that documentation shouldn't need frequent updating.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices

2015-11-04 Thread Xiao Guangrong



On 11/04/2015 04:56 PM, Igor Mammedov wrote:

On Tue, 3 Nov 2015 22:22:40 +0800
Xiao Guangrong  wrote:




On 11/03/2015 09:13 PM, Igor Mammedov wrote:

On Mon,  2 Nov 2015 17:13:29 +0800
Xiao Guangrong  wrote:


NVDIMM devices is defined in ACPI 6.0 9.20 NVDIMM Devices

There is a root device under \_SB and specified NVDIMM devices are under the
root device. Each NVDIMM device has _ADR which returns its handle used to
associate MEMDEV structure in NFIT

We reserve handle 0 for root device. In this patch, we save handle, handle,
arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch

Signed-off-by: Xiao Guangrong 
---
   hw/acpi/nvdimm.c | 184 
+++
   1 file changed, 184 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index dd84e5f..53ed675 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray 
*table_offsets,
   g_array_free(structures, true);
   }

+struct NvdimmDsmIn {
+uint32_t handle;
+uint32_t revision;
+uint32_t function;
+   /* the remaining size in the page is used by arg3. */
+uint8_t arg3[0];
+} QEMU_PACKED;
+typedef struct NvdimmDsmIn NvdimmDsmIn;
+
   static uint64_t
   nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
   {
@@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
   static void
   nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
   {
+fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");

it doesn't seem like this hunk belongs here


Er, we have changed the logic:
- others:
1) the buffer length is directly got from IO read rather than got
   from dsm memory
[ This has documented in v5's changelog. ]

So, the IO write is replaced by IO read, nvdimm_dsm_write() should not be
triggered.




   }

   static const MemoryRegionOps nvdimm_dsm_ops = {
@@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, 
MemoryRegion *io,
   memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
   }

+#define BUILD_STA_METHOD(_dev_, _method_)  \
+do {   \
+_method_ = aml_method("_STA", 0);  \
+aml_append(_method_, aml_return(aml_int(0x0f)));   \
+aml_append(_dev_, _method_);   \
+} while (0)

_STA doesn't have any logic here so drop macro and just
replace its call sites with:


Okay, I was just wanting to save some code lines. I will drop this macro.



aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf));


_STA is required as a method with zero argument but this statement just
define a object. It is okay?

Spec doesn't say that it must be method, it says that it will evaluate _STA 
object
and result must be a combination of defined flags.
AML wise calling a method with 0 arguments and referencing named variable
is the same thing, both end up being just a namestring.


I just tested it, it works.



Also note that _STA here return 0xF, and spec says that if _STA is missing
OSPM shall assume its implicit value being 0xF, so you can just drop _STA
object here altogether.


Actually, it will be needed for NVDIMM hotplug, but it is okay to me
to drop it at present. Let's introduce it when we implement hotplug.









+
+#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)\
+do {   \
+Aml *ifctx, *uuid; \
+_method_ = aml_method("_DSM", 4);  \
+/* check UUID if it is we expect, return the errorcode if not.*/   \
+uuid = aml_touuid(_uuid_); \
+ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid))); \
+aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */))); \
+aml_append(method, ifctx); \
+aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
+   aml_arg(1), aml_arg(2), aml_arg(3;  \
+aml_append(_dev_, _method_);   \
+} while (0)
+
+#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_) \
+aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
+
+#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_) \
+BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
+
+static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
+{
+for (; device_list; device_list = device_list->next) {
+NVDIMMDevice *nvdimm = device_list->data;
+int slot = object_property_get_int(OBJECT(nv

Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path

2015-11-04 Thread Xiao Guangrong



On 11/04/2015 08:40 PM, Eduardo Habkost wrote:

On Wed, Nov 04, 2015 at 11:12:41AM +0800, Xiao Guangrong wrote:

On 11/04/2015 07:00 AM, Eduardo Habkost wrote:

On Mon, Nov 02, 2015 at 05:13:10PM +0800, Xiao Guangrong wrote:

Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
locates at DAX enabled filesystem

So this patch let it work on any kind of path

Signed-off-by: Xiao Guangrong 
---
  exec.c | 24 
  1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 9de38be..9075f4d 100644
--- a/exec.c
+++ b/exec.c
@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
  char *c;
  void *area;
  int fd;
-uint64_t hpagesize;
+uint64_t pagesize;
  Error *local_err = NULL;

-hpagesize = qemu_file_get_page_size(path, &local_err);
+pagesize = qemu_file_get_page_size(path, &local_err);
  if (local_err) {
  error_propagate(errp, local_err);
  goto error;
  }

-if (hpagesize == getpagesize()) {
-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+if (pagesize == getpagesize()) {
+fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");


If the point of this patch is to allow file_ram_alloc() to not be
specific to hugetlbfs anymore, this warning can simply go away.

(And in case if you really want to keep the warning, I don't see the
point of the changes you made to it.)



This is the history why we did it like this:
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02862.html


The rule I am trying to follow is simple: if there are valid use cases
(e.g. nvdimm, tmpfs) where the warning will be printed every single time
QEMU runs, the warning is not appropriate.

If you really want to keep a warning, please make it not be printed on
all other valid use cases (nvdimm and tmpfs). Personally, I don't think
adding those additional checks would be worth the trouble, that's why I
suggest removing the warning.



Q:
| What this *actually* is trying to warn against is that
| mapping a regular file (as opposed to hugetlbfs)
| means transparent huge pages don't work.


I don't think the author of that warning even thought about transparent
huge pages (did THP even existed when it was written?). I believe they
just assumed that the only reason for using -mem-path would be hugetlbfs
and wanted to warn about it. That assumption isn't true anymore.


Michael, your idea?

If Michael will not beat me, i will drop this. :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/3] KVM: arm/arm64: Clean up some obsolete code

2015-11-04 Thread Christoffer Dall
On Tue, Nov 03, 2015 at 12:44:54PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> >  By this time i'll make a very minimal version of patch 0001, for you to 
> > test it. If we have
> > problems with current 0001, which we
> > cannot solve quickly, we could stick to that version then, which will 
> > provide the necessary
> > changes to plug in LPIs, yet with
> > minimal changes (it will only remove vgic_irq_lr_map).
> >  I guess i should have done it before. Or, i could even respin v5, with 
> > current 0001 split up.
> > This should make it easier to bisect
> > the problem.
> 
>  So, i have just sent v5, conditions are the same as before. It is OK to stop 
> at any point, and actually you should be able to
> easily throw away 0003 and apply just 1, 2, 4. The minimum needed thing for 
> LPIs introduction is 0001.
>  You can also stick to v4 if the problem does not get triggered by its first 
> patch, if you prefer reduced commit log.
> 
Actually, I seem to have been just incredibly unlucky with my test
cycles, because I eventually reproduced the bug without your patches.

I'm going to take this version of the series because that's what I
reviewed and tested.

Sorry for the noise.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/4] dma ops and virtio

2015-11-04 Thread Cornelia Huck
On Tue, 3 Nov 2015 10:45:12 -0800
Andy Lutomirski  wrote:

> On Tue, Nov 3, 2015 at 9:59 AM, Cornelia Huck  
> wrote:
> > It's just failing very early in the setup phase. As it works for me
> > with a kvm setup, I'm suspecting some error in qemu's emulation code,
> > which is unfortunately not my turf.
> >
> 
> That should be easy to rule out.  Can you try with -machine accel=tcg?
>  I can't test with kvm for obvious reasons.

Well, s390-on-s390 works (at least with
https://patchwork.ozlabs.org/patch/538882/ applied).

I'm currently suspecting some endianness issues, probably with the ecw
accesses, which is why I'd be interested in your trace information (as
I currently don't have a LE test setup at hand.)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()

2015-11-04 Thread Eduardo Habkost
On Wed, Nov 04, 2015 at 11:17:09AM +0800, Xiao Guangrong wrote:
> 
> 
> On 11/04/2015 07:21 AM, Eduardo Habkost wrote:
> >On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
> >[...]
> >>+size_t qemu_file_getlength(const char *file, Error **errp)
> >>+{
> >>+int64_t size;
> >[...]
> >>+return size;
> >
> >Can you guarantee that SIZE_MAX >= INT64_MAX on all platforms supported
> >by QEMU?
> >
> 
> Actually, this function is abstracted from the common function, 
> raw_getlength(),
> in raw-posix.c whose return value is int64_t.
> 
> And i think int64_t is large enough for block devices.

int64_t should be enough, but I don't know if size_t is large enough on
all platforms.

I believe it's going to be either one of those cases:

* If you are absolutely sure SIZE_MAX >= INT64_MAX on all platforms,
  please explain why (and maybe add a QEMU_BUILD_BUG_ON?). (I don't
  think this will be the case)
* If SIZE_MAX < INT64_MAX is possible but you believe
  size <= SIZE_MAX is always true here, please explain why (and maybe
  add an assert()).
* Otherwise, we need to set an appropriate error if size > SIZE_MAX
  or change the type of qemu_file_getlength(). What about making it
  uint64_t?

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()

2015-11-04 Thread Xiao Guangrong



On 11/04/2015 10:44 PM, Eduardo Habkost wrote:

On Wed, Nov 04, 2015 at 11:17:09AM +0800, Xiao Guangrong wrote:



On 11/04/2015 07:21 AM, Eduardo Habkost wrote:

On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
[...]

+size_t qemu_file_getlength(const char *file, Error **errp)
+{
+int64_t size;

[...]

+return size;


Can you guarantee that SIZE_MAX >= INT64_MAX on all platforms supported
by QEMU?



Actually, this function is abstracted from the common function, raw_getlength(),
in raw-posix.c whose return value is int64_t.

And i think int64_t is large enough for block devices.


int64_t should be enough, but I don't know if size_t is large enough on
all platforms.

I believe it's going to be either one of those cases:

* If you are absolutely sure SIZE_MAX >= INT64_MAX on all platforms,
   please explain why (and maybe add a QEMU_BUILD_BUG_ON?). (I don't
   think this will be the case)
* If SIZE_MAX < INT64_MAX is possible but you believe
   size <= SIZE_MAX is always true here, please explain why (and maybe
   add an assert()).
* Otherwise, we need to set an appropriate error if size > SIZE_MAX
   or change the type of qemu_file_getlength(). What about making it
   uint64_t?



It sounds better, I will change the return value from size_t to uint64_t.

Thank you for pointing it out, Eduardo!



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 08/21] arm/arm64: KVM: Support edge-triggered forwarded interrupts

2015-11-04 Thread Christoffer Dall
We mark edge-triggered interrupts with the HW bit set as queued to
prevent the VGIC code from injecting LRs with both the Active and
Pending bits set at the same time while also setting the HW bit,
because the hardware does not support this.

However, this means that we must also clear the queued flag when we sync
back a LR where the state on the physical distributor went from active
to inactive because the guest deactivated the interrupt.  At this point
we must also check if the interrupt is pending on the distributor, and
tell the VGIC to queue it again if it is.

Since these actions on the sync path are extremely close to those for
level-triggered interrupts, rename process_level_irq to
process_queued_irq, allowing it to cater for both cases.

Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 3c2909c..84abc6f 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1313,13 +1313,10 @@ epilog:
}
 }
 
-static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
+static int process_queued_irq(struct kvm_vcpu *vcpu,
+  int lr, struct vgic_lr vlr)
 {
-   int level_pending = 0;
-
-   vlr.state = 0;
-   vlr.hwirq = 0;
-   vgic_set_lr(vcpu, lr, vlr);
+   int pending = 0;
 
/*
 * If the IRQ was EOIed (called from vgic_process_maintenance) or it
@@ -1335,26 +1332,35 @@ static int process_level_irq(struct kvm_vcpu *vcpu, int 
lr, struct vgic_lr vlr)
vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
 
/*
-* Tell the gic to start sampling the line of this interrupt again.
+* Tell the gic to start sampling this interrupt again.
 */
vgic_irq_clear_queued(vcpu, vlr.irq);
 
/* Any additional pending interrupt? */
-   if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
-   vgic_cpu_irq_set(vcpu, vlr.irq);
-   level_pending = 1;
+   if (vgic_irq_is_edge(vcpu, vlr.irq)) {
+   BUG_ON(!(vlr.state & LR_HW));
+   pending = vgic_dist_irq_is_pending(vcpu, vlr.irq);
} else {
-   vgic_dist_irq_clear_pending(vcpu, vlr.irq);
-   vgic_cpu_irq_clear(vcpu, vlr.irq);
+   if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
+   vgic_cpu_irq_set(vcpu, vlr.irq);
+   pending = 1;
+   } else {
+   vgic_dist_irq_clear_pending(vcpu, vlr.irq);
+   vgic_cpu_irq_clear(vcpu, vlr.irq);
+   }
}
 
/*
 * Despite being EOIed, the LR may not have
 * been marked as empty.
 */
+   vlr.state = 0;
+   vlr.hwirq = 0;
+   vgic_set_lr(vcpu, lr, vlr);
+
vgic_sync_lr_elrsr(vcpu, lr, vlr);
 
-   return level_pending;
+   return pending;
 }
 
 static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
@@ -1391,7 +1397,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
*vcpu)
 vlr.irq - VGIC_NR_PRIVATE_IRQS);
 
spin_lock(&dist->lock);
-   level_pending |= process_level_irq(vcpu, lr, vlr);
+   level_pending |= process_queued_irq(vcpu, lr, vlr);
spin_unlock(&dist->lock);
}
}
@@ -1413,7 +1419,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
*vcpu)
 /*
  * Save the physical active state, and reset it to inactive.
  *
- * Return true if there's a pending level triggered interrupt line to queue.
+ * Return true if there's a pending forwarded interrupt to queue.
  */
 static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
 {
@@ -1438,10 +1444,8 @@ static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int 
lr, struct vgic_lr vlr)
if (phys_active)
return 0;
 
-   /* Mapped edge-triggered interrupts not yet supported. */
-   WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
spin_lock(&dist->lock);
-   level_pending = process_level_irq(vcpu, lr, vlr);
+   level_pending = process_queued_irq(vcpu, lr, vlr);
spin_unlock(&dist->lock);
return level_pending;
 }
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 02/21] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block

2015-11-04 Thread Christoffer Dall
We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest.  This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.

Kicking the vcpu does two things:
(1) If the vpcu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.

The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running.  The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().

Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.

Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.

Note that theoretically userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer, but this case was not supported
before this patch and we leave it for future work for now.

Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h   |  3 --
 arch/arm/kvm/arm.c| 10 +
 arch/arm64/include/asm/kvm_host.h |  3 --
 include/kvm/arm_arch_timer.h  |  2 +
 virt/kvm/arm/arch_timer.c | 94 +--
 5 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 84da979..c4072d9 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -234,7 +234,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 78b2869..7ed4d47 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
return kvm_timer_should_fire(vcpu);
 }
 
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+   kvm_timer_schedule(vcpu);
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+   kvm_timer_unschedule(vcpu);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
/* Force users to call KVM_ARM_VCPU_INIT */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e4f4d65..ed03968 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -255,7 +255,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index e1e4d7c..ef14cc1 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
+void kvm_timer_schedule(struct kvm_vcpu *vcpu);
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index b9d3a32..32095fb 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct 
hrtimer *hrt)
return HRTIMER_NORESTART;
 }
 
+static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
+{
+   struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+   return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
+   (timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
+   !kvm_vgic_get_phys_irq_active(timer->map);
+}
+
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 {
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
cycle_t cval, now;
 
-   if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
-   !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
-   kvm_vgic_get_phys_irq_active(timer->map))
+   if (!kvm_timer_irq_can_fire(vcpu))
return false;
 
cval = timer->cntv_cval;
@@ -127,12 +134,57 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
return cval <= now;
 }
 
+/*
+ * Schedule the background timer before calling kvm_vcpu_block, so that this

[PULL 00/21] KVM/ARM Changes for v4.4-rc1

2015-11-04 Thread Christoffer Dall
Hi Paolo,

Here is the set of changes for v4.4.  Some of the commits listed here were
already merged as fixes for v4.3, but since they are not in kvm/next yet, they
show up here.  Let me know if you want me to handle this differently somehow.

As usual, there's a lot of churn in the vgic and timer code, and then there are
number of smaller tweaks and adjustments.  Nothing major this time around.  For
a detailed description, see below.

The following changes since commit 920552b213e3dc832a874b4e7ba29ecddbab31bc:

  KVM: disable halt_poll_ns as default for s390x (2015-09-25 10:31:30 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvm-arm-for-4.4

for you to fetch changes up to 26caea7693cb99833fe4ecc544c842289d6b3f69:

  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr() (2015-11-04 
15:29:49 +0100)


Thanks,
-Christoffer


KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Christoffer Dall (10):
  KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs
  arm/arm64: KVM: Use appropriate define in VGIC reset code
  arm/arm64: KVM: Add forwarded physical interrupts documentation
  arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  arm/arm64: KVM: Support edge-triggered forwarded interrupts
  arm/arm64: KVM: Improve kvm_exit tracepoint
  arm/arm64: KVM: Add tracepoints for vgic and timer

Eric Auger (4):
  KVM: arm/arm64: rename pause into power_off
  KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnable
  KVM: arm/arm64: check power_off in critical section before VCPU run
  KVM: arm/arm64: implement kvm_arm_[halt,resume]_guest

Mark Rutland (1):
  arm64: kvm: restore EL1N SP for panic

Michal Marek (1):
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP

Pavel Fedin (4):
  KVM: arm/arm64: Fix vGIC documentation
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()

Wei Huang (1):
  arm/arm64: KVM : Enable vhost device selection under KVM config menu

 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 187 +
 Documentation/virtual/kvm/devices/arm-vgic.txt |  18 +-
 arch/arm/include/asm/kvm_arm.h |  20 ++
 arch/arm/include/asm/kvm_host.h|   5 +-
 arch/arm/kvm/Kconfig   |   2 +
 arch/arm/kvm/arm.c |  76 +++--
 arch/arm/kvm/psci.c|  10 +-
 arch/arm/kvm/trace.h   |  10 +-
 arch/arm64/include/asm/kvm_arm.h   |  16 ++
 arch/arm64/include/asm/kvm_host.h  |   5 +-
 arch/arm64/kvm/Kconfig |   2 +
 arch/arm64/kvm/hyp.S   |   8 +
 arch/mips/include/asm/kvm_host.h   |   2 +
 arch/powerpc/include/asm/kvm_host.h|   2 +
 arch/s390/include/asm/kvm_host.h   |   2 +
 arch/x86/include/asm/kvm_host.h|   3 +
 include/kvm/arm_arch_timer.h   |   4 +-
 include/kvm/arm_vgic.h |  16 +-
 include/linux/kvm_host.h   |   2 +
 virt/kvm/arm/arch_timer.c  | 173 
 virt/kvm/arm/trace.h   |  63 +
 virt/kvm/arm/vgic-v2.c |   6 +-
 virt/kvm/arm/vgic-v3.c |   6 +-
 virt/kvm/arm/vgic.c| 308 -
 virt/kvm/kvm_main.c|   3 +
 25 files changed, 646 insertions(+), 303 deletions(-)
 create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
 create mode 100644 virt/kvm/arm/trace.h

-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 01/21] KVM: Add kvm_arch_vcpu_{un}blocking callbacks

2015-11-04 Thread Christoffer Dall
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h | 3 +++
 arch/arm64/include/asm/kvm_host.h   | 3 +++
 arch/mips/include/asm/kvm_host.h| 2 ++
 arch/powerpc/include/asm/kvm_host.h | 2 ++
 arch/s390/include/asm/kvm_host.h| 2 ++
 arch/x86/include/asm/kvm_host.h | 3 +++
 include/linux/kvm_host.h| 2 ++
 virt/kvm/kvm_main.c | 3 +++
 8 files changed, 20 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c4072d9..84da979 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -234,4 +234,7 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index ed03968..e4f4d65 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -255,4 +255,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 5a1a882..6ded8d3 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -847,5 +847,7 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm 
*kvm,
struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 827a38d..c9f122d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -718,5 +718,7 @@ static inline void kvm_arch_memslots_updated(struct kvm 
*kvm, struct kvm_memslot
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_exit(void) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8ced426..72a614c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -644,5 +644,7 @@ static inline void kvm_arch_memslots_updated(struct kvm 
*kvm, struct kvm_memslot
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2beee03..b28f0f1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1233,4 +1233,7 @@ int x86_set_memory_region(struct kvm *kvm,
 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1bef9e2..4a86f5f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -625,6 +625,8 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, 
const void *data,
 void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu

[PULL 14/21] KVM: arm/arm64: Fix vGIC documentation

2015-11-04 Thread Christoffer Dall
From: Pavel Fedin 

Correct some old mistakes in the API documentation:

1. VCPU is identified by index (using kvm_get_vcpu() function), but
   "cpu id" can be mistaken for affinity ID.
2. Some error codes are wrong.

  [ Slightly tweaked some grammer and did some s/CPU index/vcpu_index/
in the descriptions.  -Christoffer ]

Signed-off-by: Pavel Fedin 
Signed-off-by: Christoffer Dall 
---
 Documentation/virtual/kvm/devices/arm-vgic.txt | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt 
b/Documentation/virtual/kvm/devices/arm-vgic.txt
index 3fb9054..59541d4 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
@@ -44,28 +44,29 @@ Groups:
   Attributes:
 The attr field of kvm_device_attr encodes two values:
 bits: | 63     40 | 39 ..  32  |  31   0 |
-values:   |reserved   |   cpu id   |  offset |
+values:   |reserved   | vcpu_index |  offset |
 
 All distributor regs are (rw, 32-bit)
 
 The offset is relative to the "Distributor base address" as defined in the
 GICv2 specs.  Getting or setting such a register has the same effect as
-reading or writing the register on the actual hardware from the cpu
-specified with cpu id field.  Note that most distributor fields are not
-banked, but return the same value regardless of the cpu id used to access
-the register.
+reading or writing the register on the actual hardware from the cpu whose
+index is specified with the vcpu_index field.  Note that most distributor
+fields are not banked, but return the same value regardless of the
+vcpu_index used to access the register.
   Limitations:
 - Priorities are not implemented, and registers are RAZ/WI
 - Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
   Errors:
--ENODEV: Getting or setting this register is not yet supported
+-ENXIO: Getting or setting this register is not yet supported
 -EBUSY: One or more VCPUs are running
+-EINVAL: Invalid vcpu_index supplied
 
   KVM_DEV_ARM_VGIC_GRP_CPU_REGS
   Attributes:
 The attr field of kvm_device_attr encodes two values:
 bits: | 63     40 | 39 ..  32  |  31   0 |
-values:   |reserved   |   cpu id   |  offset |
+values:   |reserved   | vcpu_index |  offset |
 
 All CPU interface regs are (rw, 32-bit)
 
@@ -91,8 +92,9 @@ Groups:
 - Priorities are not implemented, and registers are RAZ/WI
 - Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
   Errors:
--ENODEV: Getting or setting this register is not yet supported
+-ENXIO: Getting or setting this register is not yet supported
 -EBUSY: One or more VCPUs are running
+-EINVAL: Invalid vcpu_index supplied
 
   KVM_DEV_ARM_VGIC_GRP_NR_IRQS
   Attributes:
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 12/21] KVM: arm/arm64: check power_off in critical section before VCPU run

2015-11-04 Thread Christoffer Dall
From: Eric Auger 

In case a vcpu off PSCI call is called just after we executed the
vcpu_sleep check, we can enter the guest although power_off
is set. Let's check the power_off state in the critical section,
just before entering the guest.

Signed-off-by: Eric Auger 
Reported-by: Christoffer Dall 
Reviewed-by: Christoffer Dall 
Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/arm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index d04deeb..3b3384c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -560,7 +560,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
run->exit_reason = KVM_EXIT_INTR;
}
 
-   if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+   if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
+   vcpu->arch.power_off) {
local_irq_enable();
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 19/21] KVM: arm/arm64: Optimize away redundant LR tracking

2015-11-04 Thread Christoffer Dall
From: Pavel Fedin 

Currently we use vgic_irq_lr_map in order to track which LRs hold which
IRQs, and lr_used bitmap in order to track which LRs are used or free.

vgic_irq_lr_map is actually used only for piggy-back optimization, and
can be easily replaced by iteration over lr_used. This is good because in
future, when LPI support is introduced, number of IRQs will grow up to at
least 16384, while numbers from 1024 to 8192 are never going to be used.
This would be a huge memory waste.

In its turn, lr_used is also completely redundant since
ae705930fca6322600690df9dc1c7d0516145a93 ("arm/arm64: KVM: Keep elrsr/aisr
in sync with software model"), because together with lr_used we also update
elrsr. This allows to easily replace lr_used with elrsr, inverting all
conditions (because in elrsr '1' means 'free').

Signed-off-by: Pavel Fedin 
Signed-off-by: Christoffer Dall 
---
 include/kvm/arm_vgic.h |  6 --
 virt/kvm/arm/vgic-v2.c |  1 +
 virt/kvm/arm/vgic-v3.c |  1 +
 virt/kvm/arm/vgic.c| 53 ++
 4 files changed, 17 insertions(+), 44 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 8065801..3936bf8 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -295,9 +295,6 @@ struct vgic_v3_cpu_if {
 };
 
 struct vgic_cpu {
-   /* per IRQ to LR mapping */
-   u8  *vgic_irq_lr_map;
-
/* Pending/active/both interrupts on this VCPU */
DECLARE_BITMAP(pending_percpu, VGIC_NR_PRIVATE_IRQS);
DECLARE_BITMAP(active_percpu, VGIC_NR_PRIVATE_IRQS);
@@ -308,9 +305,6 @@ struct vgic_cpu {
unsigned long   *active_shared;
unsigned long   *pend_act_shared;
 
-   /* Bitmap of used/free list registers */
-   DECLARE_BITMAP(lr_used, VGIC_V2_MAX_LRS);
-
/* Number of list registers on this CPU */
int nr_lr;
 
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index 8d7b04d..c0f5d7f 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -158,6 +158,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
+   vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
/* Get the show on the road... */
vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 7dd5d62..92003cb 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -193,6 +193,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vgic_v3->vgic_vmcr = 0;
+   vgic_v3->vgic_elrsr = ~0;
 
/*
 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index d4669eb..265a410 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -108,6 +108,7 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu 
*vcpu);
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
+static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
 static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
int virt_irq);
 static int compute_pending_for_cpu(struct kvm_vcpu *vcpu);
@@ -691,9 +692,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
*mmio,
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   u64 elrsr = vgic_get_elrsr(vcpu);
+   unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
int i;
 
-   for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
+   for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
struct vgic_lr lr = vgic_get_lr(vcpu, i);
 
/*
@@ -1098,7 +1101,6 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
 
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 {
-   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 
/*
@@ -1112,8 +1114,6 @@ static void vgic_retire_lr(int lr_nr, int irq, struct 
kvm_vcpu *vcpu)
 
vlr.state = 0;
vgic_set_lr(vcpu, lr_nr, vlr);
-   clear_bit(lr_nr, vgic_cpu->lr_used);
-   vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }
 
@@ -1128,10 +1128,11 @@ static void vgic_retire_lr(int lr_nr, int irq, struct 
kvm_vcpu *vcpu)
  */
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
 {
-   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   u64 elrsr = vgic_get_elrsr(vcpu);
+   unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
int lr;
 
-   for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) {
+   for_each_clear_bit(lr, elrsr_ptr

[PULL 18/21] KVM: arm: Do not indent the arguments of DECLARE_BITMAP

2015-11-04 Thread Christoffer Dall
From: Michal Marek 

Besides being a coding style issue, it confuses make tags:

ctags: Warning: include/kvm/arm_vgic.h:307: null expansion of name pattern "\1"
ctags: Warning: include/kvm/arm_vgic.h:308: null expansion of name pattern "\1"
ctags: Warning: include/kvm/arm_vgic.h:309: null expansion of name pattern "\1"
ctags: Warning: include/kvm/arm_vgic.h:317: null expansion of name pattern "\1"

Cc: kvm...@lists.cs.columbia.edu
Signed-off-by: Michal Marek 
Signed-off-by: Christoffer Dall 
---
 include/kvm/arm_vgic.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 7bc5d02..8065801 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -299,9 +299,9 @@ struct vgic_cpu {
u8  *vgic_irq_lr_map;
 
/* Pending/active/both interrupts on this VCPU */
-   DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
-   DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
-   DECLARE_BITMAP( pend_act_percpu, VGIC_NR_PRIVATE_IRQS);
+   DECLARE_BITMAP(pending_percpu, VGIC_NR_PRIVATE_IRQS);
+   DECLARE_BITMAP(active_percpu, VGIC_NR_PRIVATE_IRQS);
+   DECLARE_BITMAP(pend_act_percpu, VGIC_NR_PRIVATE_IRQS);
 
/* Pending/active/both shared interrupts, dynamically sized */
unsigned long   *pending_shared;
@@ -309,7 +309,7 @@ struct vgic_cpu {
unsigned long   *pend_act_shared;
 
/* Bitmap of used/free list registers */
-   DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
+   DECLARE_BITMAP(lr_used, VGIC_V2_MAX_LRS);
 
/* Number of list registers on this CPU */
int nr_lr;
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 06/21] arm/arm64: KVM: Add forwarded physical interrupts documentation

2015-11-04 Thread Christoffer Dall
Forwarded physical interrupts on arm/arm64 is a tricky concept and the
way we deal with them is not apparently easy to understand by reading
various specs.

Therefore, add a proper documentation file explaining the flow and
rationale of the behavior of the vgic.

Some of this text was contributed by Marc Zyngier and edited by me.
Omissions and errors are all mine.

Signed-off-by: Christoffer Dall 
---
 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 187 +
 1 file changed, 187 insertions(+)
 create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt

diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt 
b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
new file mode 100644
index 000..38bca28
--- /dev/null
+++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
@@ -0,0 +1,187 @@
+KVM/ARM VGIC Forwarded Physical Interrupts
+==
+
+The KVM/ARM code implements software support for the ARM Generic
+Interrupt Controller's (GIC's) hardware support for virtualization by
+allowing software to inject virtual interrupts to a VM, which the guest
+OS sees as regular interrupts.  The code is famously known as the VGIC.
+
+Some of these virtual interrupts, however, correspond to physical
+interrupts from real physical devices.  One example could be the
+architected timer, which itself supports virtualization, and therefore
+lets a guest OS program the hardware device directly to raise an
+interrupt at some point in time.  When such an interrupt is raised, the
+host OS initially handles the interrupt and must somehow signal this
+event as a virtual interrupt to the guest.  Another example could be a
+passthrough device, where the physical interrupts are initially handled
+by the host, but the device driver for the device lives in the guest OS
+and KVM must therefore somehow inject a virtual interrupt on behalf of
+the physical one to the guest OS.
+
+These virtual interrupts corresponding to a physical interrupt on the
+host are called forwarded physical interrupts, but are also sometimes
+referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
+
+Forwarded physical interrupts are handled slightly differently compared
+to virtual interrupts generated purely by a software emulated device.
+
+
+The HW bit
+--
+Virtual interrupts are signalled to the guest by programming the List
+Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
+with the virtual IRQ number and the state of the interrupt (Pending,
+Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
+interrupt, the LR state moves from Pending to Active, and finally to
+inactive.
+
+The LRs include an extra bit, called the HW bit.  When this bit is set,
+KVM must also program an additional field in the LR, the physical IRQ
+number, to link the virtual with the physical IRQ.
+
+When the HW bit is set, KVM must EITHER set the Pending OR the Active
+bit, never both at the same time.
+
+Setting the HW bit causes the hardware to deactivate the physical
+interrupt on the physical distributor when the guest deactivates the
+corresponding virtual interrupt.
+
+
+Forwarded Physical Interrupts Life Cycle
+
+
+The state of forwarded physical interrupts is managed in the following way:
+
+  - The physical interrupt is acked by the host, and becomes active on
+the physical distributor (*).
+  - KVM sets the LR.Pending bit, because this is the only way the GICV
+interface is going to present it to the guest.
+  - LR.Pending will stay set as long as the guest has not acked the interrupt.
+  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
+expected.
+  - On guest EOI, the *physical distributor* active bit gets cleared,
+but the LR.Active is left untouched (set).
+  - KVM clears the LR on VM exits when the physical distributor
+active state has been cleared.
+
+(*): The host handling is slightly more complicated.  For some forwarded
+interrupts (shared), KVM directly sets the active state on the physical
+distributor before entering the guest, because the interrupt is never actually
+handled on the host (see details on the timer as an example below).  For other
+forwarded interrupts (non-shared) the host does not deactivate the interrupt
+when the host ISR completes, but leaves the interrupt active until the guest
+deactivates it.  Leaving the interrupt active is allowed, because Linux
+configures the physical GIC with EOIMode=1, which causes EOI operations to
+perform a priority drop allowing the GIC to receive other interrupts of the
+default priority.
+
+
+Forwarded Edge and Level Triggered PPIs and SPIs
+
+Forwarded physical interrupts injected should always be active on the
+physical distributor when injected to a guest.
+
+Level-triggered interrupts will keep the interrupt line to the GIC
+asserted, typ

[PULL 20/21] KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings

2015-11-04 Thread Christoffer Dall
From: Pavel Fedin 

1. Remove unnecessary 'irq' argument, because irq number can be retrieved
   from the LR.
2. Since cff9211eb1a1f58ce7f5a2d596b617928fd4be0e
   ("arm/arm64: KVM: Fix arch timer behavior for disabled interrupts ")
   LR_STATE_PENDING is queued back by vgic_retire_lr() itself. Also, it
   clears vlr.state itself. Therefore, we remove the same, now duplicated,
   check with all accompanying bit manipulations from vgic_unqueue_irqs().
3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since
   it already does more than just clearing the LR, move
   vgic_irq_clear_queued() inside of it.

Signed-off-by: Pavel Fedin 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 37 ++---
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 265a410..96e45f3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -105,7 +105,7 @@
 #include "vgic.h"
 
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
@@ -717,30 +717,14 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 * interrupt then move the active state to the
 * distributor tracking bit.
 */
-   if (lr.state & LR_STATE_ACTIVE) {
+   if (lr.state & LR_STATE_ACTIVE)
vgic_irq_set_active(vcpu, lr.irq);
-   lr.state &= ~LR_STATE_ACTIVE;
-   }
 
/*
 * Reestablish the pending state on the distributor and the
-* CPU interface.  It may have already been pending, but that
-* is fine, then we are only setting a few bits that were
-* already set.
+* CPU interface and mark the LR as free for other use.
 */
-   if (lr.state & LR_STATE_PENDING) {
-   vgic_dist_irq_set_pending(vcpu, lr.irq);
-   lr.state &= ~LR_STATE_PENDING;
-   }
-
-   vgic_set_lr(vcpu, i, lr);
-
-   /*
-* Mark the LR as free for other use.
-*/
-   BUG_ON(lr.state & LR_STATE_MASK);
-   vgic_retire_lr(i, lr.irq, vcpu);
-   vgic_irq_clear_queued(vcpu, lr.irq);
+   vgic_retire_lr(i, vcpu);
 
/* Finally update the VGIC state. */
vgic_update_state(vcpu->kvm);
@@ -1099,16 +1083,18 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
vgic_ops->enable(vcpu);
 }
 
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu)
 {
struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 
+   vgic_irq_clear_queued(vcpu, vlr.irq);
+
/*
 * We must transfer the pending state back to the distributor before
 * retiring the LR, otherwise we may loose edge-triggered interrupts.
 */
if (vlr.state & LR_STATE_PENDING) {
-   vgic_dist_irq_set_pending(vcpu, irq);
+   vgic_dist_irq_set_pending(vcpu, vlr.irq);
vlr.hwirq = 0;
}
 
@@ -1135,11 +1121,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu 
*vcpu)
for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) {
struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
 
-   if (!vgic_irq_is_enabled(vcpu, vlr.irq)) {
-   vgic_retire_lr(lr, vlr.irq, vcpu);
-   if (vgic_irq_is_queued(vcpu, vlr.irq))
-   vgic_irq_clear_queued(vcpu, vlr.irq);
-   }
+   if (!vgic_irq_is_enabled(vcpu, vlr.irq))
+   vgic_retire_lr(lr, vcpu);
}
 }
 
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 21/21] KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()

2015-11-04 Thread Christoffer Dall
From: Pavel Fedin 

Now we see that vgic_set_lr() and vgic_sync_lr_elrsr() are always used
together. Merge them into one function, saving from second vgic_ops
dereferencing every time.

Signed-off-by: Pavel Fedin 
Signed-off-by: Christoffer Dall 
---
 include/kvm/arm_vgic.h |  1 -
 virt/kvm/arm/vgic-v2.c |  5 -
 virt/kvm/arm/vgic-v3.c |  5 -
 virt/kvm/arm/vgic.c| 14 ++
 4 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 3936bf8..f62addc 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -112,7 +112,6 @@ struct vgic_vmcr {
 struct vgic_ops {
struct vgic_lr  (*get_lr)(const struct kvm_vcpu *, int);
void(*set_lr)(struct kvm_vcpu *, int, struct vgic_lr);
-   void(*sync_lr_elrsr)(struct kvm_vcpu *, int, struct vgic_lr);
u64 (*get_elrsr)(const struct kvm_vcpu *vcpu);
u64 (*get_eisr)(const struct kvm_vcpu *vcpu);
void(*clear_eisr)(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index c0f5d7f..ff02f08 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -79,11 +79,7 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
lr_val |= (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT);
 
vcpu->arch.vgic_cpu.vgic_v2.vgic_lr[lr] = lr_val;
-}
 
-static void vgic_v2_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
- struct vgic_lr lr_desc)
-{
if (!(lr_desc.state & LR_STATE_MASK))
vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr |= (1ULL << lr);
else
@@ -167,7 +163,6 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 static const struct vgic_ops vgic_v2_ops = {
.get_lr = vgic_v2_get_lr,
.set_lr = vgic_v2_set_lr,
-   .sync_lr_elrsr  = vgic_v2_sync_lr_elrsr,
.get_elrsr  = vgic_v2_get_elrsr,
.get_eisr   = vgic_v2_get_eisr,
.clear_eisr = vgic_v2_clear_eisr,
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 92003cb..487d635 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -112,11 +112,7 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
}
 
vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
-}
 
-static void vgic_v3_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
- struct vgic_lr lr_desc)
-{
if (!(lr_desc.state & LR_STATE_MASK))
vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
else
@@ -212,7 +208,6 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 static const struct vgic_ops vgic_v3_ops = {
.get_lr = vgic_v3_get_lr,
.set_lr = vgic_v3_set_lr,
-   .sync_lr_elrsr  = vgic_v3_sync_lr_elrsr,
.get_elrsr  = vgic_v3_get_elrsr,
.get_eisr   = vgic_v3_get_eisr,
.clear_eisr = vgic_v3_clear_eisr,
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 96e45f3..fe451d4 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1032,12 +1032,6 @@ static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr,
vgic_ops->set_lr(vcpu, lr, vlr);
 }
 
-static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
-  struct vgic_lr vlr)
-{
-   vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
-}
-
 static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
 {
return vgic_ops->get_elrsr(vcpu);
@@ -1100,7 +1094,6 @@ static void vgic_retire_lr(int lr_nr, struct kvm_vcpu 
*vcpu)
 
vlr.state = 0;
vgic_set_lr(vcpu, lr_nr, vlr);
-   vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }
 
 /*
@@ -1162,7 +1155,6 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, 
int irq,
}
 
vgic_set_lr(vcpu, lr_nr, vlr);
-   vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }
 
 /*
@@ -1340,8 +1332,6 @@ static int process_queued_irq(struct kvm_vcpu *vcpu,
vlr.hwirq = 0;
vgic_set_lr(vcpu, lr, vlr);
 
-   vgic_sync_lr_elrsr(vcpu, lr, vlr);
-
return pending;
 }
 
@@ -1442,8 +1432,6 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
bool level_pending;
 
level_pending = vgic_process_maintenance(vcpu);
-   elrsr = vgic_get_elrsr(vcpu);
-   elrsr_ptr = u64_to_bitmask(&elrsr);
 
/* Deal with HW interrupts, and clear mappings for empty LRs */
for (lr = 0; lr < vgic->nr_lr; lr++) {
@@ -1454,6 +1442,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
}
 
/* Check if we still have something up our sleeve... */
+   elrsr = vgic_get_elrsr(vcpu);
+   elrsr_ptr = u64_to_bitmask(&elrsr);
pending = find_first_zero_bit(elrsr_ptr, vgic->nr_lr);
if (level_pending || pending < vgic->nr_lr)
set_bit(vcpu->vcpu_id, dist->ir

[PULL 07/21] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics

2015-11-04 Thread Christoffer Dall
The arch timer currently uses edge-triggered semantics in the sense that
the line is never sampled by the vgic and lowering the line from the
timer to the vgic doesn't have any effect on the pending state of
virtual interrupts in the vgic.  This means that we do not support a
guest with the otherwise valid behavior of (1) disable interrupts (2)
enable the timer (3) disable the timer (4) enable interrupts.  Such a
guest would validly not expect to see any interrupts on real hardware,
but will see interrupts on KVM.

This patch fixes this shortcoming through the following series of
changes.

First, we change the flow of the timer/vgic sync/flush operations.  Now
the timer is always flushed/synced before the vgic, because the vgic
samples the state of the timer output.  This has the implication that we
move the timer operations in to non-preempible sections, but that is
fine after the previous commit getting rid of hrtimer schedules on every
entry/exit.

Second, we change the internal behavior of the timer, letting the timer
keep track of its previous output state, and only lower/raise the line
to the vgic when the state changes.  Note that in theory this could have
been accomplished more simply by signalling the vgic every time the
state *potentially* changed, but we don't want to be hitting the vgic
more often than necessary.

Third, we get rid of the use of the map->active field in the vgic and
instead simply set the interrupt as active on the physical distributor
whenever the input to the GIC is asserted and conversely clear the
physical active state when the input to the GIC is deasserted.

Fourth, and finally, we now initialize the timer PPIs (and all the other
unused PPIs for now), to be level-triggered, and modify the sync code to
sample the line state on HW sync and re-inject a new interrupt if it is
still pending at that time.

Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/arm.c   |  11 +++--
 include/kvm/arm_arch_timer.h |   2 +-
 include/kvm/arm_vgic.h   |   3 --
 virt/kvm/arm/arch_timer.c|  81 +++---
 virt/kvm/arm/vgic.c  | 102 +++
 5 files changed, 91 insertions(+), 108 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7ed4d47..59125f4 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
+   kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
preempt_enable();
-   kvm_timer_sync_hwstate(vcpu);
continue;
}
 
@@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
kvm_guest_exit();
trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 
+   /*
+* We must sync the timer state before the vgic state so that
+* the vgic can properly sample the updated state of the
+* interrupt line.
+*/
+   kvm_timer_sync_hwstate(vcpu);
+
kvm_vgic_sync_hwstate(vcpu);
 
preempt_enable();
 
-   kvm_timer_sync_hwstate(vcpu);
-
ret = handle_exit(vcpu, run, ret);
}
 
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index ef14cc1..1800227 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -51,7 +51,7 @@ struct arch_timer_cpu {
boolarmed;
 
/* Timer IRQ */
-   const struct kvm_irq_level  *irq;
+   struct kvm_irq_levelirq;
 
/* VGIC mapping */
struct irq_phys_map *map;
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4e14dac..7bc5d02 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -159,7 +159,6 @@ struct irq_phys_map {
u32 virt_irq;
u32 phys_irq;
u32 irq;
-   boolactive;
 };
 
 struct irq_phys_map_entry {
@@ -354,8 +353,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
   int virt_irq, int irq);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
-bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
-void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
 
 #define irqchip_in_kernel(k)   (!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)(!!((k)->arch.vgic.nr_cpus))
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 32095fb..523816d 100644
--- a/virt/kv

[PULL 11/21] KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnable

2015-11-04 Thread Christoffer Dall
From: Eric Auger 

kvm_arch_vcpu_runnable now also checks whether the power_off
flag is set.

Signed-off-by: Eric Auger 
Reviewed-by: Christoffer Dall 
Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/arm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9d2fb47..d04deeb 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -352,7 +352,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
  */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-   return !!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v);
+   return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
+   && !v->arch.power_off);
 }
 
 /* Just ensure a guest exit from a particular CPU */
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 10/21] KVM: arm/arm64: rename pause into power_off

2015-11-04 Thread Christoffer Dall
From: Eric Auger 

The kvm_vcpu_arch pause field is renamed into power_off to prepare
for the introduction of a new pause field. Also vcpu_pause is renamed
into vcpu_sleep since we will sleep until both power_off and pause are
false.

Signed-off-by: Eric Auger 
Reviewed-by: Christoffer Dall 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h   |  4 ++--
 arch/arm/kvm/arm.c| 20 ++--
 arch/arm/kvm/psci.c   | 10 +-
 arch/arm64/include/asm/kvm_host.h |  4 ++--
 4 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c4072d9..107374f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -126,8 +126,8 @@ struct kvm_vcpu_arch {
 * here.
 */
 
-   /* Don't run the guest on this vcpu */
-   bool pause;
+   /* vcpu power-off state */
+   bool power_off;
 
/* IO related fields */
struct kvm_decode mmio_decode;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 59125f4..9d2fb47 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -318,7 +318,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
 {
-   if (vcpu->arch.pause)
+   if (vcpu->arch.power_off)
mp_state->mp_state = KVM_MP_STATE_STOPPED;
else
mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
@@ -331,10 +331,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 {
switch (mp_state->mp_state) {
case KVM_MP_STATE_RUNNABLE:
-   vcpu->arch.pause = false;
+   vcpu->arch.power_off = false;
break;
case KVM_MP_STATE_STOPPED:
-   vcpu->arch.pause = true;
+   vcpu->arch.power_off = true;
break;
default:
return -EINVAL;
@@ -478,11 +478,11 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
return vgic_initialized(kvm);
 }
 
-static void vcpu_pause(struct kvm_vcpu *vcpu)
+static void vcpu_sleep(struct kvm_vcpu *vcpu)
 {
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
 
-   wait_event_interruptible(*wq, !vcpu->arch.pause);
+   wait_event_interruptible(*wq, !vcpu->arch.power_off);
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -532,8 +532,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
update_vttbr(vcpu->kvm);
 
-   if (vcpu->arch.pause)
-   vcpu_pause(vcpu);
+   if (vcpu->arch.power_off)
+   vcpu_sleep(vcpu);
 
/*
 * Disarming the background timer must be done in a
@@ -780,12 +780,12 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu 
*vcpu,
vcpu_reset_hcr(vcpu);
 
/*
-* Handle the "start in power-off" case by marking the VCPU as paused.
+* Handle the "start in power-off" case.
 */
if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
-   vcpu->arch.pause = true;
+   vcpu->arch.power_off = true;
else
-   vcpu->arch.pause = false;
+   vcpu->arch.power_off = false;
 
return 0;
 }
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index ad6f642..0b55696 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -63,7 +63,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu 
*vcpu)
 
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.pause = true;
+   vcpu->arch.power_off = true;
 }
 
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -87,7 +87,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
*source_vcpu)
 */
if (!vcpu)
return PSCI_RET_INVALID_PARAMS;
-   if (!vcpu->arch.pause) {
+   if (!vcpu->arch.power_off) {
if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
return PSCI_RET_ALREADY_ON;
else
@@ -115,7 +115,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
*source_vcpu)
 * the general puspose registers are undefined upon CPU_ON.
 */
*vcpu_reg(vcpu, 0) = context_id;
-   vcpu->arch.pause = false;
+   vcpu->arch.power_off = false;
smp_mb();   /* Make sure the above is visible */
 
wq = kvm_arch_vcpu_wq(vcpu);
@@ -153,7 +153,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct 
kvm_vcpu *vcpu)
mpidr = kvm_vcpu_get_mpidr_aff(tmp);
if ((mpidr & target_affinity_mask) == target_affinity) {
matching_cpus++;
-   if (!tmp->arch.pause)
+   if (!tmp->arch.power_off)
   

[PULL 05/21] arm/arm64: KVM: Use appropriate define in VGIC reset code

2015-11-04 Thread Christoffer Dall
We currently initialize the SGIs to be enabled in the VGIC code, but we
use the VGIC_NR_PPIS define for this purpose, instead of the the more
natural VGIC_NR_SGIS.  Change this slightly confusing use of the
defines.

Note: This should have no functional change, as both names are defined
to the number 16.

Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index f8ca2e9..a44ecf9 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2128,8 +2128,12 @@ int vgic_init(struct kvm *kvm)
break;
}
 
-   for (i = 0; i < dist->nr_irqs; i++) {
-   if (i < VGIC_NR_PPIS)
+   /*
+* Enable all SGIs and configure all private IRQs as
+* edge-triggered.
+*/
+   for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
+   if (i < VGIC_NR_SGIS)
vgic_bitmap_set_irq_val(&dist->irq_enabled,
vcpu->vcpu_id, i, 1);
if (i < VGIC_NR_PRIVATE_IRQS)
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 13/21] KVM: arm/arm64: implement kvm_arm_[halt,resume]_guest

2015-11-04 Thread Christoffer Dall
From: Eric Auger 

We introduce kvm_arm_halt_guest and resume functions. They
will be used for IRQ forward state change.

Halt is synchronous and prevents the guest from being re-entered.
We use the same mechanism put in place for PSCI former pause,
now renamed power_off. A new flag is introduced in arch vcpu state,
pause, only meant to be used by those functions.

Signed-off-by: Eric Auger 
Reviewed-by: Christoffer Dall 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h   |  3 +++
 arch/arm/kvm/arm.c| 35 +++
 arch/arm64/include/asm/kvm_host.h |  3 +++
 3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 107374f..6692982 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -129,6 +129,9 @@ struct kvm_vcpu_arch {
/* vcpu power-off state */
bool power_off;
 
+/* Don't run the guest (internal implementation need) */
+   bool pause;
+
/* IO related fields */
struct kvm_decode mmio_decode;
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3b3384c..ed15747 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -353,7 +353,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
-   && !v->arch.power_off);
+   && !v->arch.power_off && !v->arch.pause);
 }
 
 /* Just ensure a guest exit from a particular CPU */
@@ -479,11 +479,38 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
return vgic_initialized(kvm);
 }
 
+static void kvm_arm_halt_guest(struct kvm *kvm) __maybe_unused;
+static void kvm_arm_resume_guest(struct kvm *kvm) __maybe_unused;
+
+static void kvm_arm_halt_guest(struct kvm *kvm)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   vcpu->arch.pause = true;
+   force_vm_exit(cpu_all_mask);
+}
+
+static void kvm_arm_resume_guest(struct kvm *kvm)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+
+   vcpu->arch.pause = false;
+   wake_up_interruptible(wq);
+   }
+}
+
 static void vcpu_sleep(struct kvm_vcpu *vcpu)
 {
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
 
-   wait_event_interruptible(*wq, !vcpu->arch.power_off);
+   wait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
+  (!vcpu->arch.pause)));
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -533,7 +560,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
update_vttbr(vcpu->kvm);
 
-   if (vcpu->arch.power_off)
+   if (vcpu->arch.power_off || vcpu->arch.pause)
vcpu_sleep(vcpu);
 
/*
@@ -561,7 +588,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
}
 
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
-   vcpu->arch.power_off) {
+   vcpu->arch.power_off || vcpu->arch.pause) {
local_irq_enable();
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 4b4157b..a35ce72 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -152,6 +152,9 @@ struct kvm_vcpu_arch {
/* vcpu power-off state */
bool power_off;
 
+   /* Don't run the guest (internal implementation need) */
+   bool pause;
+
/* IO related fields */
struct kvm_decode mmio_decode;
 
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 04/21] arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs

2015-11-04 Thread Christoffer Dall
The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
We currently simulate this behavior by writing a hardcoded value to the
register for the SGIs and PPIs on every write of these bits to the
register (ignoring what the guest actually wrote), and by writing the
same value as the reset value to the register.

This is a bit counter-intuitive, as the register is RO for these bits,
and we can just implement it that way, allowing us to control the value
of the bits purely in the reset code.

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 367a180..f8ca2e9 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -690,10 +690,9 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
*mmio,
vgic_reg_access(mmio, &val, offset,
ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
if (mmio->is_write) {
-   if (offset < 8) {
-   *reg = ~0U; /* Force PPIs/SGIs to 1 */
+   /* Ignore writes to read-only SGI and PPI bits */
+   if (offset < 8)
return false;
-   }
 
val = vgic_cfg_compress(val);
if (offset & 4) {
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 03/21] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit

2015-11-04 Thread Christoffer Dall
Currently vgic_process_maintenance() processes dealing with a completed
level-triggered interrupt directly, but we are soon going to reuse this
logic for level-triggered mapped interrupts with the HW bit set, so
move this logic into a separate static function.

Probably the most scary part of this commit is convincing yourself that
the current flow is safe compared to the old one.  In the following I
try to list the changes and why they are harmless:

  Move vgic_irq_clear_queued after kvm_notify_acked_irq:
Harmless because the only potential effect of clearing the queued
flag wrt.  kvm_set_irq is that vgic_update_irq_pending does not set
the pending bit on the emulated CPU interface or in the
pending_on_cpu bitmask if the function is called with level=1.
However, the point of kvm_notify_acked_irq is to call kvm_set_irq
with level=0, and we set the queued flag again in
__kvm_vgic_sync_hwstate later on if the level is stil high.

  Move vgic_set_lr before kvm_notify_acked_irq:
Also, harmless because the LR are cpu-local operations and
kvm_notify_acked only affects the dist

  Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
Also harmless, because now we check the level state in the
clear_soft_pend function and lower the pending bits if the level is
low.

Reviewed-by: Eric Auger 
Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 94 +++--
 1 file changed, 56 insertions(+), 38 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 66c6616..367a180 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -107,6 +107,7 @@ static struct vgic_lr vgic_get_lr(const struct kvm_vcpu 
*vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
int virt_irq);
+static int compute_pending_for_cpu(struct kvm_vcpu *vcpu);
 
 static const struct vgic_ops *vgic_ops;
 static const struct vgic_params *vgic;
@@ -357,6 +358,11 @@ static void vgic_dist_irq_clear_soft_pend(struct kvm_vcpu 
*vcpu, int irq)
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 
vgic_bitmap_set_irq_val(&dist->irq_soft_pend, vcpu->vcpu_id, irq, 0);
+   if (!vgic_dist_irq_get_level(vcpu, irq)) {
+   vgic_dist_irq_clear_pending(vcpu, irq);
+   if (!compute_pending_for_cpu(vcpu))
+   clear_bit(vcpu->vcpu_id, dist->irq_pending_on_cpu);
+   }
 }
 
 static int vgic_dist_irq_is_pending(struct kvm_vcpu *vcpu, int irq)
@@ -1338,12 +1344,56 @@ epilog:
}
 }
 
+static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
+{
+   int level_pending = 0;
+
+   vlr.state = 0;
+   vlr.hwirq = 0;
+   vgic_set_lr(vcpu, lr, vlr);
+
+   /*
+* If the IRQ was EOIed (called from vgic_process_maintenance) or it
+* went from active to non-active (called from vgic_sync_hwirq) it was
+* also ACKed and we we therefore assume we can clear the soft pending
+* state (should it had been set) for this interrupt.
+*
+* Note: if the IRQ soft pending state was set after the IRQ was
+* acked, it actually shouldn't be cleared, but we have no way of
+* knowing that unless we start trapping ACKs when the soft-pending
+* state is set.
+*/
+   vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
+
+   /*
+* Tell the gic to start sampling the line of this interrupt again.
+*/
+   vgic_irq_clear_queued(vcpu, vlr.irq);
+
+   /* Any additional pending interrupt? */
+   if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
+   vgic_cpu_irq_set(vcpu, vlr.irq);
+   level_pending = 1;
+   } else {
+   vgic_dist_irq_clear_pending(vcpu, vlr.irq);
+   vgic_cpu_irq_clear(vcpu, vlr.irq);
+   }
+
+   /*
+* Despite being EOIed, the LR may not have
+* been marked as empty.
+*/
+   vgic_sync_lr_elrsr(vcpu, lr, vlr);
+
+   return level_pending;
+}
+
 static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 {
u32 status = vgic_get_interrupt_status(vcpu);
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-   bool level_pending = false;
struct kvm *kvm = vcpu->kvm;
+   int level_pending = 0;
 
kvm_debug("STATUS = %08x\n", status);
 
@@ -1358,54 +1408,22 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
*vcpu)
 
for_each_set_bit(lr, eisr_ptr, vgic->nr_lr) {
struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
-   WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
 
-   spin_lock(&dist->lock);
-   vgic_irq_clear_queued(vcpu, vlr.irq);
+   WARN_ON(vgi

[PULL 17/21] arm64: kvm: restore EL1N SP for panic

2015-11-04 Thread Christoffer Dall
From: Mark Rutland 

If we panic in hyp mode, we inject a call to panic() into the EL1N host
kernel. If a guest context is active, we first attempt to restore the
minimal amount of state necessary to execute the host kernel with
restore_sysregs.

However, the SP is restored as part of restore_common_regs, and so we
may return to the host's panic() function with the SP of the guest. Any
calculations based on the SP will be bogus, and any attempt to access
the stack will result in recursive data aborts.

When running Linux as a guest, the guest's EL1N SP is like to be some
valid kernel address. In this case, the host kernel may use that region
as a stack for panic(), corrupting it in the process.

Avoid the problem by restoring the host SP prior to returning to the
host. To prevent misleading backtraces in the host, the FP is zeroed at
the same time. We don't need any of the other "common" registers in
order to panic successfully.

Signed-off-by: Mark Rutland 
Acked-by: Marc Zyngier 
Cc: Christoffer Dall 
Cc: 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp.S | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index e583613..1599701 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -880,6 +880,14 @@ __kvm_hyp_panic:
 
bl __restore_sysregs
 
+   /*
+* Make sure we have a valid host stack, and don't leave junk in the
+* frame pointer that will give us a misleading host stack unwinding.
+*/
+   ldr x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)]
+   msr sp_el1, x22
+   mov x29, xzr
+
 1: adr x0, __hyp_panic_str
adr x1, 2f
ldp x2, x3, [x1]
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 16/21] arm/arm64: KVM: Add tracepoints for vgic and timer

2015-11-04 Thread Christoffer Dall
The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
or tracepoint infrastructure defined.  Rewriting some of the timer code
handling showed me how much we need this, so let's add these simple
trace points once and for all and we can easily expand with additional
trace points in these files as we go along.

Cc: Wei Huang 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/arch_timer.c |  4 +++
 virt/kvm/arm/trace.h  | 63 +++
 virt/kvm/arm/vgic.c   |  5 
 3 files changed, 72 insertions(+)
 create mode 100644 virt/kvm/arm/trace.h

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 523816d..21a0ab2 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#include "trace.h"
+
 static struct timecounter *timecounter;
 static struct workqueue_struct *wqueue;
 static unsigned int host_vtimer_irq;
@@ -129,6 +131,8 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, 
bool new_level)
BUG_ON(!vgic_initialized(vcpu->kvm));
 
timer->irq.level = new_level;
+   trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->map->virt_irq,
+  timer->irq.level);
ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
 timer->map,
 timer->irq.level);
diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h
new file mode 100644
index 000..37d8b98
--- /dev/null
+++ b/virt/kvm/arm/trace.h
@@ -0,0 +1,63 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include 
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for vgic
+ */
+TRACE_EVENT(vgic_update_irq_pending,
+   TP_PROTO(unsigned long vcpu_id, __u32 irq, bool level),
+   TP_ARGS(vcpu_id, irq, level),
+
+   TP_STRUCT__entry(
+   __field(unsigned long,  vcpu_id )
+   __field(__u32,  irq )
+   __field(bool,   level   )
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id= vcpu_id;
+   __entry->irq= irq;
+   __entry->level  = level;
+   ),
+
+   TP_printk("VCPU: %ld, IRQ %d, level: %d",
+ __entry->vcpu_id, __entry->irq, __entry->level)
+);
+
+/*
+ * Tracepoints for arch_timer
+ */
+TRACE_EVENT(kvm_timer_update_irq,
+   TP_PROTO(unsigned long vcpu_id, __u32 irq, int level),
+   TP_ARGS(vcpu_id, irq, level),
+
+   TP_STRUCT__entry(
+   __field(unsigned long,  vcpu_id )
+   __field(__u32,  irq )
+   __field(int,level   )
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id= vcpu_id;
+   __entry->irq= irq;
+   __entry->level  = level;
+   ),
+
+   TP_printk("VCPU: %ld, IRQ %d, level %d",
+ __entry->vcpu_id, __entry->irq, __entry->level)
+);
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../../virt/kvm/arm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 84abc6f..d4669eb 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -34,6 +34,9 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 /*
  * How the whole thing works (courtesy of Christoffer Dall):
  *
@@ -1574,6 +1577,8 @@ static int vgic_update_irq_pending(struct kvm *kvm, int 
cpuid,
int enabled;
bool ret = true, can_inject = true;
 
+   trace_vgic_update_irq_pending(cpuid, irq_num, level);
+
if (irq_num >= min(kvm->arch.vgic.nr_irqs, 1020))
return -EINVAL;
 
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 09/21] arm/arm64: KVM : Enable vhost device selection under KVM config menu

2015-11-04 Thread Christoffer Dall
From: Wei Huang 

vhost drivers provide guest VMs with better I/O performance and lower
CPU utilization. This patch allows users to select vhost devices under
KVM configuration menu on ARM. This makes vhost support on arm/arm64
on a par with other architectures (e.g. x86, ppc).

Signed-off-by: Wei Huang 
Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/Kconfig   | 2 ++
 arch/arm64/kvm/Kconfig | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 356970f..95a0005 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -46,4 +46,6 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
+source drivers/vhost/Kconfig
+
 endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 5c7e920..38102f5 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -41,4 +41,6 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
+source drivers/vhost/Kconfig
+
 endif # VIRTUALIZATION
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 15/21] arm/arm64: KVM: Improve kvm_exit tracepoint

2015-11-04 Thread Christoffer Dall
The ARM architecture only saves the exit class to the HSR (ESR_EL2 for
arm64) on synchronous exceptions, not on asynchronous exceptions like an
IRQ.  However, we only report the exception class on kvm_exit, which is
confusing because an IRQ looks like it exited at some PC with the same
reason as the previous exit.  Add a lookup table for the exception index
and prepend the kvm_exit tracepoint text with the exception type to
clarify this situation.

Also resolve the exception class (EC) to a human-friendly text version
so the trace output becomes immediately usable for debugging this code.

Cc: Wei Huang 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_arm.h   | 20 
 arch/arm/kvm/arm.c   |  2 +-
 arch/arm/kvm/trace.h | 10 +++---
 arch/arm64/include/asm/kvm_arm.h | 16 
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index d995821..dc641dd 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -218,4 +218,24 @@
 #define HSR_DABT_CM(1U << 8)
 #define HSR_DABT_EA(1U << 9)
 
+#define kvm_arm_exception_type \
+   {0, "RESET" },  \
+   {1, "UNDEFINED" },  \
+   {2, "SOFTWARE" },   \
+   {3, "PREF_ABORT" }, \
+   {4, "DATA_ABORT" }, \
+   {5, "IRQ" },\
+   {6, "FIQ" },\
+   {7, "HVC" }
+
+#define HSRECN(x) { HSR_EC_##x, #x }
+
+#define kvm_arm_exception_class \
+   HSRECN(UNKNOWN), HSRECN(WFI), HSRECN(CP15_32), HSRECN(CP15_64), \
+   HSRECN(CP14_MR), HSRECN(CP14_LS), HSRECN(CP_0_13), HSRECN(CP10_ID), \
+   HSRECN(JAZELLE), HSRECN(BXJ), HSRECN(CP14_64), HSRECN(SVC_HYP), \
+   HSRECN(HVC), HSRECN(SMC), HSRECN(IABT), HSRECN(IABT_HYP), \
+   HSRECN(DABT), HSRECN(DABT_HYP)
+
+
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ed15747..eab83b2 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -635,7 +635,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 * guest time.
 */
kvm_guest_exit();
-   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+   trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), 
*vcpu_pc(vcpu));
 
/*
 * We must sync the timer state before the vgic state so that
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 0ec3539..c25a885 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -25,21 +25,25 @@ TRACE_EVENT(kvm_entry,
 );
 
 TRACE_EVENT(kvm_exit,
-   TP_PROTO(unsigned int exit_reason, unsigned long vcpu_pc),
-   TP_ARGS(exit_reason, vcpu_pc),
+   TP_PROTO(int idx, unsigned int exit_reason, unsigned long vcpu_pc),
+   TP_ARGS(idx, exit_reason, vcpu_pc),
 
TP_STRUCT__entry(
+   __field(int,idx )
__field(unsigned int,   exit_reason )
__field(unsigned long,  vcpu_pc )
),
 
TP_fast_assign(
+   __entry->idx= idx;
__entry->exit_reason= exit_reason;
__entry->vcpu_pc= vcpu_pc;
),
 
-   TP_printk("HSR_EC: 0x%04x, PC: 0x%08lx",
+   TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx",
+ __print_symbolic(__entry->idx, kvm_arm_exception_type),
  __entry->exit_reason,
+ __print_symbolic(__entry->exit_reason, 
kvm_arm_exception_class),
  __entry->vcpu_pc)
 );
 
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 9694f26..5e6857b 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -200,4 +200,20 @@
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK (~UL(0xf))
 
+#define kvm_arm_exception_type \
+   {0, "IRQ" },\
+   {1, "TRAP" }
+
+#define ECN(x) { ESR_ELx_EC_##x, #x }
+
+#define kvm_arm_exception_class \
+   ECN(UNKNOWN), ECN(WFx), ECN(CP15_32), ECN(CP15_64), ECN(CP14_MR), \
+   ECN(CP14_LS), ECN(FP_ASIMD), ECN(CP10_ID), ECN(CP14_64), ECN(SVC64), \
+   ECN(HVC64), ECN(SMC64), ECN(SYS64), ECN(IMP_DEF), ECN(IABT_LOW), \
+   ECN(IABT_CUR), ECN(PC_ALIGN), ECN(DABT_LOW), ECN(DABT_CUR), \
+   ECN(SP_ALIGN), ECN(FP_EXC32), ECN(FP_EXC64), ECN(SERROR), \
+   ECN(BREAKPT_LOW), ECN(BREAKPT_CUR), ECN(SOFTSTP_LOW), \
+   ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
+   ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)
+
 #endif /* __ARM64_KVM_ARM_H__ */
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majo

Re: [PULL 00/21] KVM/ARM Changes for v4.4-rc1

2015-11-04 Thread Paolo Bonzini


On 04/11/2015 15:49, Christoffer Dall wrote:
> Hi Paolo,
> 
> Here is the set of changes for v4.4.  Some of the commits listed here were
> already merged as fixes for v4.3, but since they are not in kvm/next yet, they
> show up here.  Let me know if you want me to handle this differently somehow.
> 
> As usual, there's a lot of churn in the vgic and timer code, and then there 
> are
> number of smaller tweaks and adjustments.  Nothing major this time around.  
> For
> a detailed description, see below.
> 
> The following changes since commit 920552b213e3dc832a874b4e7ba29ecddbab31bc:
> 
>   KVM: disable halt_poll_ns as default for s390x (2015-09-25 10:31:30 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
> tags/kvm-arm-for-4.4
> 
> for you to fetch changes up to 26caea7693cb99833fe4ecc544c842289d6b3f69:
> 
>   KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr() (2015-11-04 
> 15:29:49 +0100)
> 
> 
> Thanks,
> -Christoffer
> 
> 
> KVM/ARM Changes for v4.4-rc1
> 
> Includes a number of fixes for the arch-timer, introducing proper
> level-triggered semantics for the arch-timers, a series of patches to
> synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
> improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
> getting rid of redundant state, and finally a stylistic change that gets rid 
> of
> some ctags warnings.
> 
> Christoffer Dall (10):
>   KVM: Add kvm_arch_vcpu_{un}blocking callbacks
>   arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
>   arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
>   arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs
>   arm/arm64: KVM: Use appropriate define in VGIC reset code
>   arm/arm64: KVM: Add forwarded physical interrupts documentation
>   arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
>   arm/arm64: KVM: Support edge-triggered forwarded interrupts
>   arm/arm64: KVM: Improve kvm_exit tracepoint
>   arm/arm64: KVM: Add tracepoints for vgic and timer
> 
> Eric Auger (4):
>   KVM: arm/arm64: rename pause into power_off
>   KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnable
>   KVM: arm/arm64: check power_off in critical section before VCPU run
>   KVM: arm/arm64: implement kvm_arm_[halt,resume]_guest
> 
> Mark Rutland (1):
>   arm64: kvm: restore EL1N SP for panic
> 
> Michal Marek (1):
>   KVM: arm: Do not indent the arguments of DECLARE_BITMAP
> 
> Pavel Fedin (4):
>   KVM: arm/arm64: Fix vGIC documentation
>   KVM: arm/arm64: Optimize away redundant LR tracking
>   KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
>   KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
> 
> Wei Huang (1):
>   arm/arm64: KVM : Enable vhost device selection under KVM config menu
> 
>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 187 +
>  Documentation/virtual/kvm/devices/arm-vgic.txt |  18 +-
>  arch/arm/include/asm/kvm_arm.h |  20 ++
>  arch/arm/include/asm/kvm_host.h|   5 +-
>  arch/arm/kvm/Kconfig   |   2 +
>  arch/arm/kvm/arm.c |  76 +++--
>  arch/arm/kvm/psci.c|  10 +-
>  arch/arm/kvm/trace.h   |  10 +-
>  arch/arm64/include/asm/kvm_arm.h   |  16 ++
>  arch/arm64/include/asm/kvm_host.h  |   5 +-
>  arch/arm64/kvm/Kconfig |   2 +
>  arch/arm64/kvm/hyp.S   |   8 +
>  arch/mips/include/asm/kvm_host.h   |   2 +
>  arch/powerpc/include/asm/kvm_host.h|   2 +
>  arch/s390/include/asm/kvm_host.h   |   2 +
>  arch/x86/include/asm/kvm_host.h|   3 +
>  include/kvm/arm_arch_timer.h   |   4 +-
>  include/kvm/arm_vgic.h |  16 +-
>  include/linux/kvm_host.h   |   2 +
>  virt/kvm/arm/arch_timer.c  | 173 
>  virt/kvm/arm/trace.h   |  63 +
>  virt/kvm/arm/vgic-v2.c |   6 +-
>  virt/kvm/arm/vgic-v3.c |   6 +-
>  virt/kvm/arm/vgic.c| 308 
> -
>  virt/kvm/kvm_main.c|   3 +
>  25 files changed, 646 insertions(+), 303 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>  create mode 100644 virt/kvm/arm/trace.h
> 

Pulled into kvm/next, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio: Fix bug in vfio_device_get_from_name()

2015-11-04 Thread Alex Williamson
On Wed, 2015-11-04 at 13:53 +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> The vfio_device_get_from_name() function might return a
> non-NULL pointer, when called with a device name that is not
> found in the list. This causes undefined behavior, in my
> case calling an invalid function pointer later on:
> 
>  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>  BUG: unable to handle kernel paging request at 8800cb3ddc08
> 
> [...]
> 
>  Call Trace:
>   [] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio]
>   [] do_vfs_ioctl+0x2cd/0x4c0
>   [] ? __fget+0x77/0xb0
>   [] SyS_ioctl+0x79/0x90
>   [] ? syscall_return_slowpath+0x50/0x130
>   [] entry_SYSCALL_64_fastpath+0x16/0x75
> 
> Fix the issue by returning NULL when there is no device with
> the requested name in the list.
> 
> Cc: sta...@vger.kernel.org # v4.2+
> Fixes: 4bc94d5dc95d ("vfio: Fix lockdep issue")
> Signed-off-by: Joerg Roedel 
> ---

Thanks for tracking this down, Joerg!  Looks right, I'll queue it for
next.  Thanks,

Alex

>  drivers/vfio/vfio.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 563c510..8c50ea6 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -692,11 +692,12 @@ EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
>  static struct vfio_device *vfio_device_get_from_name(struct vfio_group 
> *group,
>char *buf)
>  {
> - struct vfio_device *device;
> + struct vfio_device *it, *device = NULL;
>  
>   mutex_lock(&group->device_lock);
> - list_for_each_entry(device, &group->device_list, group_next) {
> - if (!strcmp(dev_name(device->dev), buf)) {
> + list_for_each_entry(it, &group->device_list, group_next) {
> + if (!strcmp(dev_name(it->dev), buf)) {
> + device = it;
>   vfio_device_get(device);
>   break;
>   }



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] vfio: make an array larger

2015-11-04 Thread Joe Perches
On Wed, 2015-11-04 at 16:26 +0300, Dan Carpenter wrote:
> Smatch complains about a possible out of bounds error:
> 
>   drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init()
>   error: buffer overflow 'pci_cap_length' 20 <= 20
> 
> Fix this by making the array larger.
> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> b/drivers/vfio/pci/vfio_pci_config.c
[]
> @@ -46,7 +46,7 @@
>   *   0: Removed from the user visible capability list
>   *   FF: Variable length
>   */
> -static u8 pci_cap_length[] = {
> +static u8 pci_cap_length[PCI_CAP_ID_MAX + 1] = {
>   [PCI_CAP_ID_BASIC]  = PCI_STD_HEADER_SIZEOF, /* pci config header */
>   [PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
>   [PCI_CAP_ID_AGP]= PCI_AGP_SIZEOF,

Doesn't the same thing happen with pci_ext_cap_length?
Both array declarations might be better as const.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] vfio: make an array larger

2015-11-04 Thread Alex Williamson
On Wed, 2015-11-04 at 16:26 +0300, Dan Carpenter wrote:
> Smatch complains about a possible out of bounds error:
> 
>   drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init()
>   error: buffer overflow 'pci_cap_length' 20 <= 20
> 
> Fix this by making the array larger.
> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> b/drivers/vfio/pci/vfio_pci_config.c
> index ff75ca3..001d48a 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -46,7 +46,7 @@
>   *   0: Removed from the user visible capability list
>   *   FF: Variable length
>   */
> -static u8 pci_cap_length[] = {
> +static u8 pci_cap_length[PCI_CAP_ID_MAX + 1] = {
>   [PCI_CAP_ID_BASIC]  = PCI_STD_HEADER_SIZEOF, /* pci config header */
>   [PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
>   [PCI_CAP_ID_AGP]= PCI_AGP_SIZEOF,

This doesn't make a whole lot of sense to me.  The last entry we define
is:

[PCI_CAP_ID_AF] = PCI_CAP_AF_SIZEOF,
};

and PCI_CAP_ID_MAX is defined as:

#define  PCI_CAP_ID_MAX PCI_CAP_ID_AF

So the array is implicitly sized to PCI_CAP_ID_MAX + 1 already, this
doesn't make it any larger.  I imagine this silences smatch because it's
hitting this:

if (cap <= PCI_CAP_ID_MAX) {
len = pci_cap_length[cap];

And it doesn't like that we're indexing an array that has entries up to
PCI_CAP_ID_AF and we're testing against PCI_CAP_ID_MAX.  They happen to
be the same now, but that could change and then we'd index off the end
of the array.  That's unlikely, but valid.  Is that the real
justification for this patch?  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/4] dma ops and virtio

2015-11-04 Thread Cornelia Huck
On Wed, 4 Nov 2015 15:29:23 +0100
Cornelia Huck  wrote:

> I'm currently suspecting some endianness issues, probably with the ecw
> accesses, which is why I'd be interested in your trace information (as
> I currently don't have a LE test setup at hand.)

I think I've got it. We have sense_data as a byte array, which
implicitly makes it BE already. When we copy to the ecws while building
the irb, the data ends up in 32 bit values. The conversion from host
endianness to BE now treats them as LE on your system...

Could you please give the following qemu patch a try?

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index c033612..a13494e 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -892,8 +892,16 @@ int css_do_tsch_get_irb(SubchDev *sch, IRB *target_irb, 
int *irb_len)
 /* If a unit check is pending, copy sense data. */
 if ((s->dstat & SCSW_DSTAT_UNIT_CHECK) &&
 (p->chars & PMCW_CHARS_MASK_CSENSE)) {
+uint32_t ecw[8];
+int i;
+
 irb.scsw.flags |= SCSW_FLAGS_MASK_ESWF | SCSW_FLAGS_MASK_ECTL;
-memcpy(irb.ecw, sch->sense_data, sizeof(sch->sense_data));
+/* Attention: sense_data is already BE! */
+memset(ecw, 0, sizeof(ecw));
+memcpy(ecw, sch->sense_data, sizeof(sch->sense_data));
+for (i = 0; i < ARRAY_SIZE(ecw); i++) {
+irb.ecw[i] = be32_to_cpu(ecw[i]);
+}
 irb.esw[1] = 0x0100 | (sizeof(sch->sense_data) << 8);
 }
 }

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/4] dma ops and virtio

2015-11-04 Thread Andy Lutomirski
On Wed, Nov 4, 2015 at 9:52 AM, Cornelia Huck  wrote:
> On Wed, 4 Nov 2015 15:29:23 +0100
> Cornelia Huck  wrote:
>
>> I'm currently suspecting some endianness issues, probably with the ecw
>> accesses, which is why I'd be interested in your trace information (as
>> I currently don't have a LE test setup at hand.)
>
> I think I've got it. We have sense_data as a byte array, which
> implicitly makes it BE already. When we copy to the ecws while building
> the irb, the data ends up in 32 bit values. The conversion from host
> endianness to BE now treats them as LE on your system...
>
> Could you please give the following qemu patch a try?

Tested-by: Andy Lutomirski 

Now my test script panics for the right reason (init isn't actually an
s390 binary).  Thanks!

I'll update my branch with the latest s390 DMA stuff, and now I can
give it at least basic regression testing.  Of course, I have to find
a root filesystem, too. :)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] vfio: make an array larger

2015-11-04 Thread Dan Carpenter
Sorry, I should have said that I am on linux-next at the start.

> > -static u8 pci_cap_length[] = {
> > +static u8 pci_cap_length[PCI_CAP_ID_MAX + 1] = {
> > [PCI_CAP_ID_BASIC]  = PCI_STD_HEADER_SIZEOF, /* pci config header */
> > [PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
> > [PCI_CAP_ID_AGP]= PCI_AGP_SIZEOF,
> 
> This doesn't make a whole lot of sense to me.  The last entry we define
> is:
> 
> [PCI_CAP_ID_AF] = PCI_CAP_AF_SIZEOF,

Yes.

> };
> 
> and PCI_CAP_ID_MAX is defined as:
> 
> #define  PCI_CAP_ID_MAX PCI_CAP_ID_AF

No.  I am on linux-next and we appear to have added a new element
beyond PCI_CAP_ID_AF.

#define  PCI_CAP_ID_AF  0x13/* PCI Advanced Features */
#define  PCI_CAP_ID_EA  0x14/* PCI Enhanced Allocation */
#define  PCI_CAP_ID_MAX PCI_CAP_ID_EA

> 
> So the array is implicitly sized to PCI_CAP_ID_MAX + 1 already, this
> doesn't make it any larger.

In linux-next it makes it larger.  But also explicitly using
PCI_CAP_ID_MAX + 1 is cleaner as well as fixing the bug in case we add
more elements later again.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] vfio: make an array larger

2015-11-04 Thread Dan Carpenter
On Wed, Nov 04, 2015 at 08:40:19AM -0800, Joe Perches wrote:
> Doesn't the same thing happen with pci_ext_cap_length?

pci_ext_cap_length is fine as-is but you're right that we probably
should make the size explicit as well.  I will fix and resend.

> Both array declarations might be better as const.

Sure.  I will do this as well.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] vfio: make an array larger

2015-11-04 Thread Alex Williamson
On Wed, 2015-11-04 at 21:20 +0300, Dan Carpenter wrote:
> Sorry, I should have said that I am on linux-next at the start.
> 
> > > -static u8 pci_cap_length[] = {
> > > +static u8 pci_cap_length[PCI_CAP_ID_MAX + 1] = {
> > >   [PCI_CAP_ID_BASIC]  = PCI_STD_HEADER_SIZEOF, /* pci config header */
> > >   [PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
> > >   [PCI_CAP_ID_AGP]= PCI_AGP_SIZEOF,
> > 
> > This doesn't make a whole lot of sense to me.  The last entry we define
> > is:
> > 
> > [PCI_CAP_ID_AF] = PCI_CAP_AF_SIZEOF,
> 
> Yes.
> 
> > };
> > 
> > and PCI_CAP_ID_MAX is defined as:
> > 
> > #define  PCI_CAP_ID_MAX PCI_CAP_ID_AF
> 
> No.  I am on linux-next and we appear to have added a new element
> beyond PCI_CAP_ID_AF.
> 
> #define  PCI_CAP_ID_AF  0x13/* PCI Advanced Features */
> #define  PCI_CAP_ID_EA  0x14/* PCI Enhanced Allocation */
> #define  PCI_CAP_ID_MAX PCI_CAP_ID_EA
> 
> > 
> > So the array is implicitly sized to PCI_CAP_ID_MAX + 1 already, this
> > doesn't make it any larger.
> 
> In linux-next it makes it larger.  But also explicitly using
> PCI_CAP_ID_MAX + 1 is cleaner as well as fixing the bug in case we add
> more elements later again.

Ok, all the pieces line up now.  Please add mention of that to the
commit log and I'll look for the respin including the same for
pci_ext_cap_length.  Thanks for spotting this!

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] vfio: Include No-IOMMU mode

2015-11-04 Thread Alex Williamson
There is really no way to safely give a user full access to a DMA
capable device without an IOMMU to protect the host system.  There is
also no way to provide DMA translation, for use cases such as device
assignment to virtual machines.  However, there are still those users
that want userspace drivers even under those conditions.  The UIO
driver exists for this use case, but does not provide the degree of
device access and programming that VFIO has.  In an effort to avoid
code duplication, this introduces a No-IOMMU mode for VFIO.

This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
should make it very clear that this mode is not safe.  Additionally,
CAP_SYS_RAWIO privileges are necessary to work with groups and
containers using this mode.  Groups making use of this support are
named /dev/vfio/noiommu-$GROUP and can only make use of the special
VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
binding a device without a native IOMMU group to a VFIO bus driver
will taint the kernel and should therefore not be considered
supported.  This patch includes no-iommu support for the vfio-pci bus
driver only.

Signed-off-by: Alex Williamson 
Acked-by: Michael S. Tsirkin 
---

v2: A minor change to the vfio_for_each_iommu_driver macro:

@@ -229,7 +229,7 @@ static struct vfio_iommu_driver vfio_noiommu_driver = {
for (pos = con->noiommu ? &vfio_noiommu_driver :\
 list_first_entry(&vfio.iommu_drivers_list, \
  struct vfio_iommu_driver, vfio_next); \
-(con->noiommu && pos) || (!con->noiommu && \
+(con->noiommu ? pos != NULL :  \
&pos->vfio_next != &vfio.iommu_drivers_list);   \
  pos = con->noiommu ? NULL : list_next_entry(pos, vfio_next))
 #else

The 0-day coccinelle test seems to think that driver can be null for callers
Using this for-loop, perhaps it's not seeing this as simply a modified
list_for_each_entry macro.  I don't think that's possible, but using a
terinary here makes it more readable and makes all parts of this bi-modal
loop more consistent.

 drivers/vfio/Kconfig|   15 +++
 drivers/vfio/pci/vfio_pci.c |8 +-
 drivers/vfio/vfio.c |  186 ++-
 include/linux/vfio.h|3 +
 include/uapi/linux/vfio.h   |7 ++
 5 files changed, 209 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 4540179..b6d3cdc 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -31,5 +31,20 @@ menuconfig VFIO
 
  If you don't know what to do here, say N.
 
+menuconfig VFIO_NOIOMMU
+   bool "VFIO No-IOMMU support"
+   depends on VFIO
+   help
+ VFIO is built on the ability to isolate devices using the IOMMU.
+ Only with an IOMMU can userspace access to DMA capable devices be
+ considered secure.  VFIO No-IOMMU mode enables IOMMU groups for
+ devices without IOMMU backing for the purpose of re-using the VFIO
+ infrastructure in a non-secure mode.  Use of this mode will result
+ in an unsupportable kernel and will therefore taint the kernel.
+ Device assignment to virtual machines is also not possible with
+ this mode since there is no IOMMU to provide DMA translation.
+
+ If you don't know what to do here, say N.
+
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 964ad57..32b88bd 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -940,13 +940,13 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
return -EINVAL;
 
-   group = iommu_group_get(&pdev->dev);
+   group = vfio_iommu_group_get(&pdev->dev);
if (!group)
return -EINVAL;
 
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev) {
-   iommu_group_put(group);
+   vfio_iommu_group_put(group, &pdev->dev);
return -ENOMEM;
}
 
@@ -957,7 +957,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
 
ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
if (ret) {
-   iommu_group_put(group);
+   vfio_iommu_group_put(group, &pdev->dev);
kfree(vdev);
return ret;
}
@@ -993,7 +993,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!vdev)
return;
 
-   iommu_group_put(pdev->dev.iommu_group);
+   vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
kfree(vdev);
 
if (vfio_pci_is_vga(pdev)) {
diff --git a/d

Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-04 Thread Christopher Covington
On 11/04/2015 08:31 AM, Christoffer Dall wrote:
> On Tue, Nov 03, 2015 at 01:39:44PM -0600, Rob Herring wrote:
>> On Tue, Nov 3, 2015 at 1:17 PM, Mario Smarduch  
>> wrote:
>>> On 11/3/2015 9:55 AM, Will Deacon wrote:
 On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
> On 11/3/2015 8:33 AM, Christopher Covington wrote:
>> On 11/02/2015 06:51 PM, Mario Smarduch wrote:
>>>this is a re-post from couple weeks ago, please take time to review 
>>> this
>>> simple patch which simplifies DEBUG_LL and prevents kernel crash on 
>>> virtual
>>> platforms.
>>>
>>> Before this patch DEBUG_LL for 'dummy virtual machine':
>>>
>>> ( ) Kernel low-level debugging via EmbeddedICE DCC channel
>>> ( ) Kernel low-level debug output via semihosting I/O
>>> ( ) Kernel low-level debugging via 8250 UART
>>> ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell
>>>
>>> In summary if debug uart is not emulated kernel crashes.
>>> And once you pass that hurdle, uart physical/virtual addresses are 
>>> unknown.
>>> DEBUG_LL comes in handy on many occasions and should be somewhat
>>> intuitive to use like it is for physical platforms. For virtual 
>>> platforms
>>> user may start daubting the host and get into a bigger mess.
>>>
>>> After this patch is applied user gets:
>>>
>>> (X) Kernel low-level debugging on QEMU Virtual Platform
>>> ( ) Kernel low-level debugging on Kvmtool Virtual Platform
>>>. above repeated 
>>>
>>> The virtual addresses selected follow arm reference models, high in 
>>> vmalloc
>>> section with high mem enabled and guest running with >= 1GB of memory. 
>>> The
>>> offset is leftover from arm reference models.
>>
>> Which model? It doesn't appear to match the vexpress 
>> AEM/RTSM/FVP/whatever
>> which used 0x1c09 for UART0.
>
> I recall QEMU virt model had it's own physical address map, for sure I 
> saw the
> virtio-mmio regions assigned in some ARM document. Peter would you know?
>
> As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre 
> will that
> stay fixed?

 We make absolutely no guarantees about the memory map provided by kvmtool.
>>>
>>> If that's also the case for qemu, then I guess the best you can do is find 
>>> a way
>>> to dump the device tree. Find the uart, physical address and try figure out 
>>> the
>>> virtual address.
>>>
>>> Pretty involved, hoped for something more automated since that's a handy 
>>> feature.
>>
>> You really only need LL_DEBUG now if you are debugging very early code
>> before memory is setup and/or bad memory. Use earlycon instead which
>> should already be supported both via the pl011 or semihosting. I used
>> it with QEMU semihosting support.
>>
> Then we should really document how to use that with qemu's virt platform
> and kvmtool's platform on both 32-bit and 64-bit so that users can
> easily figure out what they're doing wrong when they get no output.
> 
> In practice, the address for the pl011 is quite unlikely to change, I
> dare speculate, so that documentation shouldn't need frequent updating.

Is it not on by default since the following change?

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f022b8e95379b0433d13509706b66f38fc15dde8

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-04 Thread Ard Biesheuvel
On 4 November 2015 at 19:49, Christopher Covington  wrote:
> On 11/04/2015 08:31 AM, Christoffer Dall wrote:
>> On Tue, Nov 03, 2015 at 01:39:44PM -0600, Rob Herring wrote:
>>> On Tue, Nov 3, 2015 at 1:17 PM, Mario Smarduch  
>>> wrote:
 On 11/3/2015 9:55 AM, Will Deacon wrote:
> On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
>> On 11/3/2015 8:33 AM, Christopher Covington wrote:
>>> On 11/02/2015 06:51 PM, Mario Smarduch wrote:
this is a re-post from couple weeks ago, please take time to review 
 this
 simple patch which simplifies DEBUG_LL and prevents kernel crash on 
 virtual
 platforms.

 Before this patch DEBUG_LL for 'dummy virtual machine':

 ( ) Kernel low-level debugging via EmbeddedICE DCC channel
 ( ) Kernel low-level debug output via semihosting I/O
 ( ) Kernel low-level debugging via 8250 UART
 ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell

 In summary if debug uart is not emulated kernel crashes.
 And once you pass that hurdle, uart physical/virtual addresses are 
 unknown.
 DEBUG_LL comes in handy on many occasions and should be somewhat
 intuitive to use like it is for physical platforms. For virtual 
 platforms
 user may start daubting the host and get into a bigger mess.

 After this patch is applied user gets:

 (X) Kernel low-level debugging on QEMU Virtual Platform
 ( ) Kernel low-level debugging on Kvmtool Virtual Platform
. above repeated 

 The virtual addresses selected follow arm reference models, high in 
 vmalloc
 section with high mem enabled and guest running with >= 1GB of memory. 
 The
 offset is leftover from arm reference models.
>>>
>>> Which model? It doesn't appear to match the vexpress 
>>> AEM/RTSM/FVP/whatever
>>> which used 0x1c09 for UART0.
>>
>> I recall QEMU virt model had it's own physical address map, for sure I 
>> saw the
>> virtio-mmio regions assigned in some ARM document. Peter would you know?
>>
>> As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre 
>> will that
>> stay fixed?
>
> We make absolutely no guarantees about the memory map provided by kvmtool.

 If that's also the case for qemu, then I guess the best you can do is find 
 a way
 to dump the device tree. Find the uart, physical address and try figure 
 out the
 virtual address.

 Pretty involved, hoped for something more automated since that's a handy 
 feature.
>>>
>>> You really only need LL_DEBUG now if you are debugging very early code
>>> before memory is setup and/or bad memory. Use earlycon instead which
>>> should already be supported both via the pl011 or semihosting. I used
>>> it with QEMU semihosting support.
>>>
>> Then we should really document how to use that with qemu's virt platform
>> and kvmtool's platform on both 32-bit and 64-bit so that users can
>> easily figure out what they're doing wrong when they get no output.
>>
>> In practice, the address for the pl011 is quite unlikely to change, I
>> dare speculate, so that documentation shouldn't need frequent updating.
>
> Is it not on by default since the following change?
>
> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f022b8e95379b0433d13509706b66f38fc15dde8
>

Yes, but it still requires the plain 'earlycon' argument (i.e, without
'=pl011,...') to be passed on the kernel command line if you want
early output.

-- 
Ard.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] target-i386: enable cflushopt/clwb/pcommit instructions

2015-11-04 Thread Eduardo Habkost
On Fri, Oct 30, 2015 at 01:54:33PM -0700, Richard Henderson wrote:
> On 10/29/2015 12:31 AM, Xiao Guangrong wrote:
> > These instructions are used by NVDIMM drivers and the specification
> > locates at:
> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> > 
> > There instructions are available on Skylake Server
> > 
> > Signed-off-by: Xiao Guangrong 
> > ---
> >  target-i386/cpu.c | 8 +---
> >  target-i386/cpu.h | 3 +++
> >  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> Reviewed-by: Richard Henderson 
> 
> Although it would be nice to update the comments in translate.c to include the
> new insns, since they overlap mfence and sfence.  At present we only check for
> SSE enabled when accepting these; I suppose it's easiest to consider it 
> invalid
> to specify +clwb,-sse?

I assume you want to add the extra SSE requirement to TCG code, not to
generic x86 code, then I have no objections. Your conclusion seems to be
right for pcommit and clflushopt, if I checked the opcodes and encoding
properly. In the case of pcommit (/7, modrm == 0xf8), we check for SSE;
in the case of clflushopt (/7 with a memory operand, modrm != 0xf8), we
check for CLFLUSH.

But in the case of clwb (/6 with a memory operand, modrm != 0xc0), we
are not just requiring SSE2: we are rejecting the instruction unless
modrm == 0xc0. That means TCG is rejecting the clwb instruction, so I
believe we shouldn't add CLWB to TCG_7_0_EBX_FEATURES yet.

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fixed KVM problems with old DOS programs. Compatibility can be forced by module parameter.

2015-11-04 Thread Gerhard Wiesinger

Signed-off-by: Gerhard Wiesinger 
---
 arch/x86/kvm/svm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2f9ed1f..e0b00fc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -198,6 +198,10 @@ static bool npt_enabled;
 static int npt = true;
 module_param(npt, int, S_IRUGO);
 +/* allow backward compatibility with e.g. old DOS application */
+static int npt_task_switch_emulation = true;
+module_param(npt_task_switch_emulation, int, S_IRUGO);
+
 /* allow nested virtualization in KVM/SVM */
 static int nested = true;
 module_param(nested, int, S_IRUGO);
@@ -1177,6 +1181,9 @@ static void init_vmcb(struct vcpu_svm *svm, bool 
init_event)

if (npt_enabled) {
/* Setup VMCB for Nested Paging */
control->nested_ctl = 1;
+   if (!npt_task_switch_emulation) {
+   clr_intercept(svm, INTERCEPT_TASK_SWITCH);
+   }
clr_intercept(svm, INTERCEPT_INVLPG);
clr_exception_intercept(svm, PF_VECTOR);
clr_cr_intercept(svm, INTERCEPT_CR3_READ);
--
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[kvm-unit-test RFC] x86: Memory instructions test case

2015-11-04 Thread Eduardo Habkost
Quickly hacked test case for memory instructions (clflush, mfence,
sfence, lfence, clflushopt, clwb, pcommit), that simply checks for #UD
exceptions.

This was useful to test TCG handling of those instructions.

The "fake clwb" part will probably break once a new instruction use
those opcodes.

Signed-off-by: Eduardo Habkost 
---
 config/config-x86-common.mak |  2 +
 config/config-x86_64.mak |  2 +-
 x86/memory.c | 88 
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 x86/memory.c

diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index c2f9908..b89684d 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -108,6 +108,8 @@ $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o 
$(TEST_DIR)/vmx_tests.o
 
 $(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
 
+$(TEST_DIR)/memory.elf: $(cstart.o) $(TEST_DIR)/memory.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d lib/x86/.*.d
diff --git a/config/config-x86_64.mak b/config/config-x86_64.mak
index 7d4eb34..ec4bded 100644
--- a/config/config-x86_64.mak
+++ b/config/config-x86_64.mak
@@ -7,7 +7,7 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
  $(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat \
- $(TEST_DIR)/ioapic.flat
+ $(TEST_DIR)/ioapic.flat $(TEST_DIR)/memory.flat
 tests += $(TEST_DIR)/svm.flat
 tests += $(TEST_DIR)/vmx.flat
 tests += $(TEST_DIR)/tscdeadline_latency.flat
diff --git a/x86/memory.c b/x86/memory.c
new file mode 100644
index 000..cd1eb46
--- /dev/null
+++ b/x86/memory.c
@@ -0,0 +1,88 @@
+/*
+ * Test for x86 cache and memory instructions
+ *
+ * Copyright (c) 2015 Red Hat Inc
+ *
+ * Authors:
+ *  Eduardo Habkost 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include "libcflat.h"
+#include "desc.h"
+#include "processor.h"
+
+static long target;
+static volatile int ud;
+static volatile int isize;
+
+static void handle_ud(struct ex_regs *regs)
+{
+   ud = 1;
+   regs->rip += isize;
+}
+
+int main(int ac, char **av)
+{
+   struct cpuid cpuid7, cpuid1;
+   int xfail;
+
+   setup_idt();
+   handle_exception(UD_VECTOR, handle_ud);
+
+   cpuid1 = cpuid(1);
+   cpuid7 = cpuid_indexed(7, 0);
+
+   /* 3-byte instructions: */
+   isize = 3;
+
+   xfail = !(cpuid1.d & (1U << 19)); /* CLFLUSH */
+   ud = 0;
+   asm volatile("clflush (%0)" : : "b" (&target));
+   report_xfail("clflush", xfail, ud == 0);
+
+   xfail = !(cpuid1.d & (1U << 25)); /* SSE */
+   ud = 0;
+   asm volatile("sfence");
+   report_xfail("sfence", xfail, ud == 0);
+
+   xfail = !(cpuid1.d & (1U << 26)); /* SSE2 */
+   ud = 0;
+   asm volatile("lfence");
+   report_xfail("lfence", xfail, ud == 0);
+
+   ud = 0;
+   asm volatile("mfence");
+   report_xfail("mfence", xfail, ud == 0);
+
+   /* 4-byte instructions: */
+   isize = 4;
+
+   xfail = !(cpuid7.b & (1U << 23)); /* CLFLUSHOPT */
+   ud = 0;
+   /* clflushopt (%rbx): */
+   asm volatile(".byte 0x66, 0x0f, 0xae, 0x3b" : : "b" (&target));
+   report_xfail("clflushopt", xfail, ud == 0);
+
+   xfail = !(cpuid7.b & (1U << 24)); /* CLWB */
+   ud = 0;
+   /* clwb (%rbx): */
+   asm volatile(".byte 0x66, 0x0f, 0xae, 0x33" : : "b" (&target));
+   report_xfail("clwb", xfail, ud == 0);
+
+   ud = 0;
+   /* clwb requires a memory operand, the following is NOT a valid
+* CLWB instruction (modrm == 0xF0).
+*/
+   asm volatile(".byte 0x66, 0x0f, 0xae, 0xf0");
+   report("fake clwb", ud);
+
+   xfail = !(cpuid7.b & (1U << 22)); /* PCOMMIT */
+   ud = 0;
+   /* pcommit: */
+   asm volatile(".byte 0x66, 0x0f, 0xae, 0xf8");
+   report_xfail("pcommit", xfail, ud == 0);
+
+   return report_summary();
+}
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] target-i386: tcg: Handle clflushopt/clwb/pcommit instructions

2015-11-04 Thread Eduardo Habkost
This series makes TCG accept the clwb instruction, and make it check the right
CPUID bits when handling clflushopt and pcommit.

Tested using the kvm-unit-test patch I sent earlier today:
  Subject: [kvm-unit-test RFC] x86: Memory instructions test case
  Message-Id: <1446672079-8549-1-git-send-email-ehabk...@redhat.com>

Eduardo Habkost (2):
  target-i386: tcg: Accept clwb instruction
  target-i386: tcg: Check right CPUID bits for clflushopt/pcommit

 target-i386/translate.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] target-i386: tcg: Accept clwb instruction

2015-11-04 Thread Eduardo Habkost
Accept the clwb instruction (66 0F AE /6) if its corresponding feature
flag is enabled on CPUID[7].

Signed-off-by: Eduardo Habkost 
---
 target-i386/translate.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index b400d24..bac1685 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7716,10 +7716,21 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 }
 break;
 case 5: /* lfence */
-case 6: /* mfence */
 if ((modrm & 0xc7) != 0xc0 || !(s->cpuid_features & CPUID_SSE2))
 goto illegal_op;
 break;
+case 6: /* mfence/clwb */
+if (s->prefix & PREFIX_DATA) {
+/* clwb */
+if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_CLWB))
+goto illegal_op;
+gen_lea_modrm(env, s, modrm);
+} else {
+/* mfence */
+if ((modrm & 0xc7) != 0xc0 || !(s->cpuid_features & 
CPUID_SSE2))
+goto illegal_op;
+}
+break;
 case 7: /* sfence / clflush */
 if ((modrm & 0xc7) == 0xc0) {
 /* sfence */
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] target-i386: tcg: Check right CPUID bits for clflushopt/pcommit

2015-11-04 Thread Eduardo Habkost
Detect the clflushopt and pcommit instructions and check their
corresponding feature flags, instead of checking CPUID_SSE and
CPUID_CLFLUSH.

Signed-off-by: Eduardo Habkost 
---
 target-i386/translate.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index bac1685..a938967 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7731,16 +7731,28 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 goto illegal_op;
 }
 break;
-case 7: /* sfence / clflush */
+case 7: /* sfence / clflush / clflushopt / pcommit */
 if ((modrm & 0xc7) == 0xc0) {
-/* sfence */
-/* XXX: also check for cpuid_ext2_features & CPUID_EXT2_EMMX */
-if (!(s->cpuid_features & CPUID_SSE))
-goto illegal_op;
+if (s->prefix & PREFIX_DATA) {
+/* pcommit */
+if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_PCOMMIT))
+goto illegal_op;
+} else {
+/* sfence */
+/* XXX: also check for cpuid_ext2_features & 
CPUID_EXT2_EMMX */
+if (!(s->cpuid_features & CPUID_SSE))
+goto illegal_op;
+}
 } else {
-/* clflush */
-if (!(s->cpuid_features & CPUID_CLFLUSH))
-goto illegal_op;
+if (s->prefix & PREFIX_DATA) {
+/* clflushopt */
+if (!(s->cpuid_7_0_ebx_features & 
CPUID_7_0_EBX_CLFLUSHOPT))
+goto illegal_op;
+} else {
+/* clflush */
+if (!(s->cpuid_features & CPUID_CLFLUSH))
+goto illegal_op;
+}
 gen_lea_modrm(env, s, modrm);
 }
 break;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fixed KVM problems with old DOS programs. Compatibility can be forced by module parameter.

2015-11-04 Thread Paolo Bonzini


On 04/11/2015 20:33, Gerhard Wiesinger wrote:
> Signed-off-by: Gerhard Wiesinger 
> ---
>  arch/x86/kvm/svm.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2f9ed1f..e0b00fc 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -198,6 +198,10 @@ static bool npt_enabled;
>  static int npt = true;
>  module_param(npt, int, S_IRUGO);
>  +/* allow backward compatibility with e.g. old DOS application */
> +static int npt_task_switch_emulation = true;
> +module_param(npt_task_switch_emulation, int, S_IRUGO);
> +
>  /* allow nested virtualization in KVM/SVM */
>  static int nested = true;
>  module_param(nested, int, S_IRUGO);
> @@ -1177,6 +1181,9 @@ static void init_vmcb(struct vcpu_svm *svm, bool
> init_event)
>  if (npt_enabled) {
>  /* Setup VMCB for Nested Paging */
>  control->nested_ctl = 1;
> +if (!npt_task_switch_emulation) {
> +clr_intercept(svm, INTERCEPT_TASK_SWITCH);
> +}
>  clr_intercept(svm, INTERCEPT_INVLPG);
>  clr_exception_intercept(svm, PF_VECTOR);
>  clr_cr_intercept(svm, INTERCEPT_CR3_READ);

What is the problem you are seeing?  KVM can emulate task switches; the
intercept is set here because of a processor erratum that can mess them
up even though, in theory, AMD supports task switching from guest mode.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fixed KVM problems with old DOS programs. Compatibility can be forced by module parameter.

2015-11-04 Thread Gerhard Wiesinger

On 04.11.2015 22:27, Paolo Bonzini wrote:


On 04/11/2015 20:33, Gerhard Wiesinger wrote:

Signed-off-by: Gerhard Wiesinger 
---
  arch/x86/kvm/svm.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2f9ed1f..e0b00fc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -198,6 +198,10 @@ static bool npt_enabled;
  static int npt = true;
  module_param(npt, int, S_IRUGO);
  +/* allow backward compatibility with e.g. old DOS application */
+static int npt_task_switch_emulation = true;
+module_param(npt_task_switch_emulation, int, S_IRUGO);
+
  /* allow nested virtualization in KVM/SVM */
  static int nested = true;
  module_param(nested, int, S_IRUGO);
@@ -1177,6 +1181,9 @@ static void init_vmcb(struct vcpu_svm *svm, bool
init_event)
  if (npt_enabled) {
  /* Setup VMCB for Nested Paging */
  control->nested_ctl = 1;
+if (!npt_task_switch_emulation) {
+clr_intercept(svm, INTERCEPT_TASK_SWITCH);
+}
  clr_intercept(svm, INTERCEPT_INVLPG);
  clr_exception_intercept(svm, PF_VECTOR);
  clr_cr_intercept(svm, INTERCEPT_CR3_READ);

What is the problem you are seeing?  KVM can emulate task switches; the
intercept is set here because of a processor erratum that can mess them
up even though, in theory, AMD supports task switching from guest mode.


See old thread:
https://lists.nongnu.org/archive/html/qemu-devel/2012-04/msg01506.html

Ciao,
Gerhard

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-04 Thread Eduardo Habkost
On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
> The value of the migrated vcpu's TSC rate is determined as below.
>  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
> user-specified value will be used.
>  2. If neither a user-specified TSC rate nor a migrated TSC rate is
> present, we will use the TSC rate from KVM (returned by
> KVM_GET_TSC_KHZ).
>  3. Otherwise, we will use the migrated TSC rate.
> 
> Signed-off-by: Haozhong Zhang 
[...]
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 64046cb..aae5e58 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -3034,3 +3034,36 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>  abort();
>  }
> +
> +int kvm_arch_setup_tsc_khz(CPUState *cs)
> +{
> +X86CPU *cpu = X86_CPU(cs);
> +CPUX86State *env = &cpu->env;
> +int r;
> +
> +/*
> + * Prepare vcpu's TSC rate to be migrated.
> + *
> + * - If the user specifies the TSC rate by cpu option 'tsc-freq',
> + *   we will use the user-specified value.
> + *
> + * - If there is neither user-specified TSC rate nor migrated TSC
> + *   rate, we will ask KVM for the TSC rate by calling
> + *   KVM_GET_TSC_KHZ.
> + *
> + * - Otherwise, if there is a migrated TSC rate, we will use the
> + *   migrated value.
> + */
> +if (env->tsc_khz) {
> +env->tsc_khz_saved = env->tsc_khz;
> +} else if (!env->tsc_khz_saved) {
> +r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> +if (r < 0) {
> +fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
> +return r;
> +}

The lack of KVM_CAP_GET_TSC_KHZ should make QEMU abort, unless the user
is explicitly requesting a more strict mode where the TSC frequency will
be guaranteed to never change.

> +env->tsc_khz_saved = r;
> +}

Why do you need a separate tsc_khz_saved field, and don't simply use
tsc_khz? It would have the additional feature of letting QMP clients
query the current TSC rate by asking for the tsc-freq property on CPU
objects.


> +
> +return 0;
> +}

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] vfio: make an array larger

2015-11-04 Thread walter harms


Am 04.11.2015 14:26, schrieb Dan Carpenter:
> Smatch complains about a possible out of bounds error:
> 
>   drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init()
>   error: buffer overflow 'pci_cap_length' 20 <= 20
> 
> Fix this by making the array larger.
> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> b/drivers/vfio/pci/vfio_pci_config.c
> index ff75ca3..001d48a 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -46,7 +46,7 @@
>   *   0: Removed from the user visible capability list
>   *   FF: Variable length
>   */
> -static u8 pci_cap_length[] = {
> +static u8 pci_cap_length[PCI_CAP_ID_MAX + 1] = {
>   [PCI_CAP_ID_BASIC]  = PCI_STD_HEADER_SIZEOF, /* pci config header */
>   [PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
>   [PCI_CAP_ID_AGP]= PCI_AGP_SIZEOF,


(i am sorry Dave)

I am not sure if that is the way to go.
this define make me feel uneasy,
#define   PCI_CAP_ID_MAX PCI_CAP_ID_AF

Would it be possible to ARRAY_SIZE(pci_cap_length) instead of PCI_CAP_ID_MAX ?
Then that would grow automatically with the array. And its more clear what
is actually happening.

re,
 wh



> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fixed KVM problems with old DOS programs. Compatibility can be forced by module parameter.

2015-11-04 Thread Paolo Bonzini


On 04/11/2015 22:33, Gerhard Wiesinger wrote:
>>>
>> What is the problem you are seeing?  KVM can emulate task switches; the
>> intercept is set here because of a processor erratum that can mess them
>> up even though, in theory, AMD supports task switching from guest mode.
> 
> See old thread:
> https://lists.nongnu.org/archive/html/qemu-devel/2012-04/msg01506.html

Can you obtain the traces you were asked for at the time?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-04 Thread Mario Smarduch


On 11/4/2015 10:51 AM, Ard Biesheuvel wrote:
> On 4 November 2015 at 19:49, Christopher Covington  
> wrote:
>> On 11/04/2015 08:31 AM, Christoffer Dall wrote:
>>> On Tue, Nov 03, 2015 at 01:39:44PM -0600, Rob Herring wrote:
 On Tue, Nov 3, 2015 at 1:17 PM, Mario Smarduch  
 wrote:
> On 11/3/2015 9:55 AM, Will Deacon wrote:
>> On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
>>> On 11/3/2015 8:33 AM, Christopher Covington wrote:
 On 11/02/2015 06:51 PM, Mario Smarduch wrote:
>this is a re-post from couple weeks ago, please take time to 
> review this
> simple patch which simplifies DEBUG_LL and prevents kernel crash on 
> virtual
> platforms.
>
> Before this patch DEBUG_LL for 'dummy virtual machine':
>
> ( ) Kernel low-level debugging via EmbeddedICE DCC channel
> ( ) Kernel low-level debug output via semihosting I/O
> ( ) Kernel low-level debugging via 8250 UART
> ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell
>
> In summary if debug uart is not emulated kernel crashes.
> And once you pass that hurdle, uart physical/virtual addresses are 
> unknown.
> DEBUG_LL comes in handy on many occasions and should be somewhat
> intuitive to use like it is for physical platforms. For virtual 
> platforms
> user may start daubting the host and get into a bigger mess.
>
> After this patch is applied user gets:
>
> (X) Kernel low-level debugging on QEMU Virtual Platform
> ( ) Kernel low-level debugging on Kvmtool Virtual Platform
>. above repeated 
>
> The virtual addresses selected follow arm reference models, high in 
> vmalloc
> section with high mem enabled and guest running with >= 1GB of 
> memory. The
> offset is leftover from arm reference models.

 Which model? It doesn't appear to match the vexpress 
 AEM/RTSM/FVP/whatever
 which used 0x1c09 for UART0.
>>>
>>> I recall QEMU virt model had it's own physical address map, for sure I 
>>> saw the
>>> virtio-mmio regions assigned in some ARM document. Peter would you know?
>>>
>>> As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre 
>>> will that
>>> stay fixed?
>>
>> We make absolutely no guarantees about the memory map provided by 
>> kvmtool.
>
> If that's also the case for qemu, then I guess the best you can do is 
> find a way
> to dump the device tree. Find the uart, physical address and try figure 
> out the
> virtual address.
>
> Pretty involved, hoped for something more automated since that's a handy 
> feature.

 You really only need LL_DEBUG now if you are debugging very early code
 before memory is setup and/or bad memory. Use earlycon instead which
 should already be supported both via the pl011 or semihosting. I used
 it with QEMU semihosting support.

>>> Then we should really document how to use that with qemu's virt platform
>>> and kvmtool's platform on both 32-bit and 64-bit so that users can
>>> easily figure out what they're doing wrong when they get no output.
>>>
>>> In practice, the address for the pl011 is quite unlikely to change, I
>>> dare speculate, so that documentation shouldn't need frequent updating.
>>
>> Is it not on by default since the following change?
>>
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f022b8e95379b0433d13509706b66f38fc15dde8
>>
> 
> Yes, but it still requires the plain 'earlycon' argument (i.e, without
> '=pl011,...') to be passed on the kernel command line if you want
> early output.
> 

I spent time debugging 'earlycon' for pl011, ironically using DEBUG_LL, from the
looks of it no mmio uart should work for armv7. It appears earlycon_map()
requires a fixed mapping similar to arm64.

Comparing both options, DEBUG_LL takes you from kernel decompress code to early
FDT parsing. A lot of early_print() calls won't work if DEBUG_LL is not enabled
including dump_machine_table which ends in a endless loop. IMO it's worth
turning this option on for that and other reasons.

'earlycon' is enabled some ways up in setup_arch().

As far as the patch, providing a hint to the user with probable uart addresses
would help and in the worst case "see the latest device tree for the virtual
platform".

- Mario




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too

2015-11-04 Thread Sasha Levin
On 11/04/2015 06:51 AM, Will Deacon wrote:
> + mutex_lock(&pause_lock);
> +
> + /* The kvm->cpus array contains a null pointer in the last location */
> + for (i = 0; ; i++) {
> + if (kvm->cpus[i])
> + pthread_kill(kvm->cpus[i]->thread, SIGKVMEXIT);
> + else
> + break;
> + }
> +
> + kvm__continue(kvm);

In this scenario: if we grabbed pause_lock, signaled vcpu0 to exit, and it did
before we called kvm__continue(), we won't end up releasing pause_lock, which
might cause a lockup later, no?


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-04 Thread Peter Hurley
On 11/04/2015 06:28 PM, Mario Smarduch wrote:
> I spent time debugging 'earlycon' for pl011, ironically using DEBUG_LL, from 
> the
> looks of it no mmio uart should work for armv7. It appears earlycon_map()
> requires a fixed mapping similar to arm64.

commit a5f4c561b3b1 ("ARM: 8415/1: early fixmap support for earlycon")
added ARM fixmap support for earlycon since 4.3-rc1. That was in it's 5th
revision; various versions have been on lkml since Dec 2014 or so.

Regards,
Peter Hurley
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-04 Thread Haozhong Zhang
On 11/04/15 19:42, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
> > The value of the migrated vcpu's TSC rate is determined as below.
> >  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
> > user-specified value will be used.
> >  2. If neither a user-specified TSC rate nor a migrated TSC rate is
> > present, we will use the TSC rate from KVM (returned by
> > KVM_GET_TSC_KHZ).
> >  3. Otherwise, we will use the migrated TSC rate.
> > 
> > Signed-off-by: Haozhong Zhang 
> [...]
> > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > index 64046cb..aae5e58 100644
> > --- a/target-i386/kvm.c
> > +++ b/target-i386/kvm.c
> > @@ -3034,3 +3034,36 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >  {
> >  abort();
> >  }
> > +
> > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > +{
> > +X86CPU *cpu = X86_CPU(cs);
> > +CPUX86State *env = &cpu->env;
> > +int r;
> > +
> > +/*
> > + * Prepare vcpu's TSC rate to be migrated.
> > + *
> > + * - If the user specifies the TSC rate by cpu option 'tsc-freq',
> > + *   we will use the user-specified value.
> > + *
> > + * - If there is neither user-specified TSC rate nor migrated TSC
> > + *   rate, we will ask KVM for the TSC rate by calling
> > + *   KVM_GET_TSC_KHZ.
> > + *
> > + * - Otherwise, if there is a migrated TSC rate, we will use the
> > + *   migrated value.
> > + */
> > +if (env->tsc_khz) {
> > +env->tsc_khz_saved = env->tsc_khz;
> > +} else if (!env->tsc_khz_saved) {
> > +r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> > +if (r < 0) {
> > +fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
> > +return r;
> > +}
> 
> The lack of KVM_CAP_GET_TSC_KHZ should make QEMU abort, unless the user
> is explicitly requesting a more strict mode where the TSC frequency will
> be guaranteed to never change.
>

I agree KVM_CAP_GET_TSC_KHZ should be checked before KVM_GET_TSC_KHZ,
but I don't think the lack of it should abort QEMU. This piece of code
on the source machine is just to get the TSC frequency to be
migrated. If it fails, it will leave env->tsc_khz_saved be 0. And
according to tsc_khz_needed() in patch 1, if env->tsc_khz_saved == 0,
no TSC frequency will be migrated. So the lack of KVM_CAP_GET_TSC_KHZ
only hurts the migration and does not need to abort QEMU on the source
machine.

> > +env->tsc_khz_saved = r;
> > +}
> 
> Why do you need a separate tsc_khz_saved field, and don't simply use
> tsc_khz? It would have the additional feature of letting QMP clients
> query the current TSC rate by asking for the tsc-freq property on CPU
> objects.
>

It's to avoid overriding env->tsc_khz on the destination in the
migration. I can change this line to
 env->tsc_khz = env->tsc_khz_saved = r;

For the additional QMP feature, will the value of tsc-freq property be
env->tsc_khz? If yes, I guess the above change would be fine?

Haozhong

> 
> > +
> > +return 0;
> > +}
> 
> -- 
> Eduardo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v2] KVM: VMX: Fix commit which broke PML

2015-11-04 Thread Kai Huang

Hi Paolo,

Thanks for applying! I am really sorry that I forgot to delete the line 
that clears SECONDARY_EXEC_ENABLE_PML bit in vmx_disable_pml, which is 
renamed to vmx_destroy_pml_buffer now.
It won't impact functionality but to make the function consistent, would 
you also do below? Sorry for such negligence!


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 89f4fa2..ef4ca76 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7826,8 +7826,6 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx 
*vmx)

ASSERT(vmx->pml_pg);
__free_page(vmx->pml_pg);
vmx->pml_pg = NULL;
-
-   vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL, 
SECONDARY_EXEC_ENABLE_PML);

 }

Thanks,
-Kai

On 11/04/2015 08:00 PM, Paolo Bonzini wrote:


On 04/11/2015 06:46, Kai Huang wrote:

I found PML was broken since below commit:

commit feda805fe7c4ed9cf78158e73b1218752e3b4314
Author: Xiao Guangrong 
Date:   Wed Sep 9 14:05:55 2015 +0800

KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update

Unify the update in vmx_cpuid_update()

Signed-off-by: Xiao Guangrong 
[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
Signed-off-by: Paolo Bonzini 

The reason is in above commit vmx_cpuid_update calls vmx_secondary_exec_control,
in which currently SECONDARY_EXEC_ENABLE_PML bit is cleared unconditionally (as
PML is enabled in creating vcpu). Therefore if vcpu_cpuid_update is called after
vcpu is created, PML will be disabled unexpectedly while log-dirty code still
thinks PML is used.

Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control
only when PML is not supported or not enabled (!enable_pml). This is more
reasonable as PML is currently either always enabled or disabled. With this
explicit updating SECONDARY_EXEC_ENABLE_PML in vmx_enable{disable}_pml is not
needed so also rename vmx_enable{disable}_pml to vmx_create{destroy}_pml_buffer.

Signed-off-by: Kai Huang 

---

v1->v2: Fix this by following Paolo's suggestion. It's better to not to clear
SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control unconditionally but only
clear it when PML is not supported or enabled.

---
  arch/x86/kvm/vmx.c | 15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2ac11641..89f4fa2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4718,8 +4718,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx 
*vmx)
   a current VMCS12
*/
exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
-   /* PML is enabled/disabled in creating/destorying vcpu */
-   exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
+   if (!enable_pml)
+   exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
  
  	/* Currently, we allow L1 guest to directly run pcommit instruction. */

exec_control &= ~SECONDARY_EXEC_PCOMMIT;
@@ -7804,7 +7805,7 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 
*info1, u64 *info2)
*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
  }
  
-static int vmx_enable_pml(struct vcpu_vmx *vmx)

+static int vmx_create_pml_buffer(struct vcpu_vmx *vmx)
  {
struct page *pml_pg;
  
@@ -7817,12 +7818,10 @@ static int vmx_enable_pml(struct vcpu_vmx *vmx)

vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
  
-	vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL, SECONDARY_EXEC_ENABLE_PML);

-
return 0;
  }
  
-static void vmx_disable_pml(struct vcpu_vmx *vmx)

+static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
  {
ASSERT(vmx->pml_pg);
__free_page(vmx->pml_pg);
@@ -8706,7 +8705,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
  
  	if (enable_pml)

-   vmx_disable_pml(vmx);
+   vmx_destroy_pml_buffer(vmx);
free_vpid(vmx->vpid);
leave_guest_mode(vcpu);
vmx_load_vmcs01(vcpu);
@@ -8790,7 +8789,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
 * for the guest, etc.
 */
if (enable_pml) {
-   err = vmx_enable_pml(vmx);
+   err = vmx_create_pml_buffer(vmx);
if (err)
goto free_vmcs;
}



Applied, thanks!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [kvm-unit-test RFC] x86: Memory instructions test case

2015-11-04 Thread Xiao Guangrong



On 11/05/2015 05:21 AM, Eduardo Habkost wrote:

Quickly hacked test case for memory instructions (clflush, mfence,
sfence, lfence, clflushopt, clwb, pcommit), that simply checks for #UD
exceptions.

This was useful to test TCG handling of those instructions.

The "fake clwb" part will probably break once a new instruction use
those opcodes.


Tested it on the box on that these instruction are available, it worked.

Tested-by: Xiao Guangrong 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 0/3] KVM: arm/arm64: Clean up some obsolete code

2015-11-04 Thread Pavel Fedin
 Hello!

> Actually, I seem to have been just incredibly unlucky with my test
> cycles, because I eventually reproduced the bug without your patches.

 Or lucky, without "un" :)

> I'm going to take this version of the series because that's what I
> reviewed and tested.

 It's OK, as i wrote, v5 is no different from v4 actually, just 0001 bisected. 
And making it was useful because it helped me to make
sure once again that i haven't messed anything up.

> Sorry for the noise.

 It's OK, thank you very much for putting efforts into testing and cooperation.
 You know, since we are talking about this...  This definitely has something to 
do with the reset, and... Looks like nobody resets
vGIC/vTimer, unless the userland does it explicitly by resetting every register 
by hand.
 I know, there is no global "reset" function for the whole VM. But, at least we 
have reset ioctl for vCPU. What if we hook up
vGIC/vTimer there, and reset at least per-CPU objects (CPU interface + redist + 
timer) at this point?

 P.S. I've seen your PULL, and it is missing a little thing that could be good 
for 4.4 too. I've fixed one more bug recently, it
reproduces on CP15-timer-less boards like Exynos: 
http://www.spinics.net/lists/kvm/msg122746.html. Just to make sure that you 
don't
miss it.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] target-i386: enable cflushopt/clwb/pcommit instructions

2015-11-04 Thread Richard Henderson

On 11/04/2015 08:35 PM, Eduardo Habkost wrote:

On Fri, Oct 30, 2015 at 01:54:33PM -0700, Richard Henderson wrote:

On 10/29/2015 12:31 AM, Xiao Guangrong wrote:

These instructions are used by NVDIMM drivers and the specification
locates at:
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

There instructions are available on Skylake Server

Signed-off-by: Xiao Guangrong 
---
  target-i386/cpu.c | 8 +---
  target-i386/cpu.h | 3 +++
  2 files changed, 8 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

Although it would be nice to update the comments in translate.c to include the
new insns, since they overlap mfence and sfence.  At present we only check for
SSE enabled when accepting these; I suppose it's easiest to consider it invalid
to specify +clwb,-sse?


I assume you want to add the extra SSE requirement to TCG code, not to
generic x86 code, then I have no objections.


I don't really want to add any requirement, just point and laugh at anyone who 
reports an bug for the above condition.



But in the case of clwb (/6 with a memory operand, modrm != 0xc0), we
are not just requiring SSE2: we are rejecting the instruction unless
modrm == 0xc0. That means TCG is rejecting the clwb instruction, so I
believe we shouldn't add CLWB to TCG_7_0_EBX_FEATURES yet.


Hmm, yes.

I've cleaned up some of this code on a branch, but it didn't get enough testing 
or review this cycle, so it's going to wait for the next.  I see you've posted 
a patch for this, which should be good enough until then.



r~

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html