[Bug 107561] 4.2 breaks PCI passthrough in QEMU/KVM

2015-12-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=107561

--- Comment #14 from Jasen Borisov  ---
Can confirm. I am also hit by this bug, and my virtual machine doesn't start
either.

I tried applying the patch to both 4.3.0 and linux-next, and neither of those
worked.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Patch V2 2/2] x86, mce: Need to translate GPA to HPA to inject error in guest.

2015-12-11 Thread Chen, Gong
Hi, Ashok

Please add "original author by Huang Ying" at some place.
Thanks.
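
For readers skimming the quoted patch, the lookup it adds walks the KVM
memory slots until it finds the one covering the guest physical address,
then offsets into that slot's host mapping. A minimal standalone sketch of
that idea follows. The struct slot type and the gpa_to_hva name are
hypothetical stand-ins for illustration, not QEMU's KVMSlot or the function
in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a KVM memory slot; only the fields the
 * lookup needs. */
struct slot {
    uint64_t start_addr;   /* guest physical base of the slot */
    uint64_t memory_size;  /* slot length in bytes */
    uint64_t host_base;    /* host virtual address backing start_addr */
};

/*
 * Translate a guest physical address to a host virtual address.
 * Returns 0 and stores the result in *hva on success, or -1 if no
 * slot covers gpa.
 */
static int gpa_to_hva(const struct slot *slots, size_t nr_slots,
                      uint64_t gpa, uint64_t *hva)
{
    for (size_t i = 0; i < nr_slots; i++) {
        const struct slot *s = &slots[i];

        /* Compare offsets so start_addr + memory_size cannot wrap. */
        if (gpa >= s->start_addr && gpa - s->start_addr < s->memory_size) {
            *hva = s->host_base + (gpa - s->start_addr);
            return 0;
        }
    }
    return -1;
}
```

If an address like this is printed, a uint64_t is more portably formatted
with PRIx64 or PRIu64 from <inttypes.h> than with %ld.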

> -----Original Message-----
> From: Raj, Ashok
> Sent: Friday, December 11, 2015 3:41 AM
> To: kvm@vger.kernel.org
> Cc: Chen, Gong; Gleb Natapov; Paolo Bonzini; qemu-de...@nongnu.org;
> linux-ker...@vger.kernel.org; Boris Petkov; Luck, Tony; Raj, Ashok; Kleen,
> Andi
> Subject: [Patch V2 2/2] x86, mce: Need to translate GPA to HPA to inject
> error in guest.
> 
> From: Gong Chen 
> 
> When we need to test error injection to a specific address using EINJ,
> there needs to be a way to translate GPA to HPA. This will allow host EINJ
> to inject error to test how guest behavior is when a bad address is consumed.
> This permits guest OS to perform its own recovery.
> 
> Signed-off-by: Gong Chen 
> ---
> Sorry about the spam :-(.
> Resending with proper Commit Message. Previous had a bogus From. Fixed
> that.
> before sending.
> 
>  hmp-commands.hx   | 14 ++
>  include/exec/memory.h |  2 ++
>  kvm-all.c | 24 
>  memory.c  | 13 +
>  monitor.c | 16 
>  5 files changed, 69 insertions(+)
>  mode change 100644 => 100755 include/exec/memory.h
>  mode change 100644 => 100755 kvm-all.c
>  mode change 100644 => 100755 memory.c
>  mode change 100644 => 100755 monitor.c
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index bb52e4d..673c00e 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -444,6 +444,20 @@ Start gdbserver session (default @var{port}=1234)
>  ETEXI
> 
>  {
> +  .name = "x-gpa2hva",
> +  .args_type= "fmt:/,addr:l",
> +  .params   = "/fmt addr",
> +  .help = "translate guest physical 'addr' to host virtual 
> address,
> only for debugging",
> +  .mhandler.cmd = do_gpa2hva,
> +},
> +
> +STEXI
> +@item x-gpa2hva @var{addr}
> +@findex x-gpa2hva
> +Translate guest physical @var{addr} to host virtual address, only for
> debugging.
> +ETEXI
> +
> +{
>  .name   = "x",
>  .args_type  = "fmt:/,addr:l",
>  .params = "/fmt addr",
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> old mode 100644
> new mode 100755
> index 0f07159..57d7bf8
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -222,6 +222,7 @@ struct MemoryListener {
> hwaddr addr, hwaddr len);
>  void (*coalesced_mmio_del)(MemoryListener *listener,
> MemoryRegionSection *section,
> hwaddr addr, hwaddr len);
> +int  (*translate_gpa2hva)(MemoryListener *listener, uint64_t paddr,
> uint64_t *vaddr);
>  /* Lower = earlier (during add), later (during del) */
>  unsigned priority;
>  AddressSpace *address_space_filter;
> @@ -1123,6 +1124,7 @@ void memory_global_dirty_log_start(void);
>  void memory_global_dirty_log_stop(void);
> 
>  void mtree_info(fprintf_function mon_printf, void *f);
> +int  memory_translate_gpa2hva(hwaddr paddr, uint64_t *vaddr);
> 
>  /**
>   * memory_region_dispatch_read: perform a read directly to the specified
> diff --git a/kvm-all.c b/kvm-all.c
> old mode 100644
> new mode 100755
> index c648b81..cb029be
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -197,6 +197,29 @@ static KVMSlot
> *kvm_lookup_overlapping_slot(KVMMemoryListener *kml,
>  return found;
>  }
> 
> +
> +static int kvm_translate_gpa2hva(MemoryListener *listener, uint64_t paddr,
> uint64_t *vaddr)
> +{
> +KVMState *s = kvm_state;
> +KVMMemoryListener *kml = container_of(listener, KVMMemoryListener,
> listener);
> +KVMSlot *mem = NULL;
> +int i;
> +
> +for (i = 0; i < s->nr_slots; i++) {
> +mem = &kml->slots[i];
> +if (paddr >= mem->start_addr && paddr < mem->start_addr + mem-
> >memory_size) {
> +*vaddr = (uint64_t)mem->ram + paddr - mem->start_addr;
> +break;
> + }
> +}
> +
> +if (i == s->nr_slots) {
> +fprintf(stderr, "fail to find target physical addr(%ld) in KVM memory
> range\n", paddr);
> + return 1;
> +}
> +return 0;
> +}
> +
>  int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
> hwaddr *phys_addr)
>  {
> @@ -902,6 +925,7 @@ void kvm_memory_listener_register(KVMState *s,
> KVMMemoryListener *kml,
>  kml->listener.log_start = kvm_log_start;
>  kml->listener.log_stop = kvm_log_stop;
>  kml->listener.log_sync = kvm_log_sync;
> +kml->listener.translate_gpa2hva = kvm_translate_gpa2hva;
>  kml->listener.priority = 10;
> 
>  memory_listener_register(&kml->listener, as);
> diff --git a/memory.c b/memory.c
> old mode 100644
> new mode 100755
> index e193658..979dcf8
> --- a/memory.c
> +++ b/memory.c
> @@ -2294,6 +2294,19 @@ static const TypeInfo memory_region_info = {
>  .instance_finalize  = memory_region_finalize,
>  };
> 
> +int memory_translate_gpa2hva(hwaddr paddr, uint64_t *vaddr){
> +MemoryListener *ml = 

[Bug 107561] 4.2 breaks PCI passthrough in QEMU/KVM

2015-12-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=107561

--- Comment #13 from schefis...@gmail.com ---
Sorry for the delay. I couldn't check sooner.
Unfortunately the guest doesn't start with the patch applied. Nothing displayed
(no sync signal from passthrough card). Host dmesg doesn't show any errors.
Usually, even with slow guest I get the following on host dmesg when starting
VM:

...
VFIO - User Level meta-driver version: 0.3
kernel: vgaarb: device changed decodes:
PCI::01:00.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
kernel: vfio_ecap_init: :01:00.0 hiding ecap 0x1e@0x258
kernel: vfio_ecap_init: :01:00.0 hiding ecap 0x19@0x900
kernel: pmd_set_huge: Cannot satisfy [mem 0xe000-0xe020] with a
huge-page mapping due to MTRR override.
- then some USB resets for keyboard passthrough -
kernel: kvm: zapping shadow pages for mmio generation wraparound
...and so on, the guest screen displays UEFI startup by now

With the patch applied it simply stops at the pmd_set_huge line, and nothing
further: no USB, no guest start, no error messages.

I also did follow up on the other tests.
Disabling host MTRRs one by one did not yield any results. The guest is still
slow even with all of the host's MTRRs disabled.
I tried nuking kvm_mtrr_get_guest_memory_type as requested and started a live
distro. The guest did boot, but did not have a /proc/mtrr file. (Graphics
scrolling was slow, I guess on account of the lack of caching, but otherwise
the guest was speedy as normal.) I saved some info from the guest, like dmesg,
lspci -vvv, and some sysfs files. You can find them in the same share as before.
https://drive.google.com/folderview?id=0B8ebX_WjVHnGNlN4eTEzU2xtMEk&usp=sharing



Re: kvmclock doesn't work, help?

2015-12-11 Thread Andy Lutomirski
On Thu, Dec 10, 2015 at 1:32 PM, Marcelo Tosatti  wrote:
> On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
>> I'm trying to clean up kvmclock and I can't get it to work at all.  My
>> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
>>
>> If I boot an SMP (2 vcpus) guest, tracing says:
>>
>>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 0
>>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
>> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
>> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
>> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
>> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
>> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
>> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>
>>
>> If I boot a UP guest, tracing says:
>>
>>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>
>> I suspect, but I haven't verified, that this is fallout from:
>>
>> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
>> Author: Marcelo Tosatti 
>> Date:   Wed May 14 12:43:24 2014 -0300
>>
>> KVM: x86: disable master clock if TSC is reset during suspend
>>
>> Updating system_time from the kernel clock once master clock
>> has been enabled can result in time backwards event, in case
>> kernel clock frequency is lower than TSC frequency.
>>
>> Disable master clock in case it is necessary to update it
>> from the resume path.
>>
>> Signed-off-by: Marcelo Tosatti 
>> Signed-off-by: Paolo Bonzini 
>>
>>
>> Can we please stop making kvmclock more complex?  It's a beast right
>> now, and not in a good way.  It's far too tangled with the vclock
>> machinery on both the host and guest sides, the pvclock stuff is not
>> well thought out (even in principle in an ABI sense), and it's never
>> been clear to me what problem exactly the kvmclock stuff is supposed
>> to solve.
>>
>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>> start over.  A correctly functioning KVM guest using TSC (i.e.
>> ignoring kvmclock entirely) seems to work rather more reliably and
>> considerably faster than a kvmclock guest.
>>
>> --Andy
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC
>
> Andy,
>
> I am all for solving practical problems rather than pleasing aesthetic
> pleasure.
>
>> Updating system_time from the kernel clock once master clock
>> has been enabled can result in time backwards event, in case
>> kernel clock frequency is lower than TSC frequency.
>>
>> Disable master clock in case it is necessary to update it
>> from the resume path.
>
>> once master clock
>> has been enabled can result in time backwards event, in case
>> kernel clock frequency is lower than TSC frequency.
>
> guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads.
>
> If the effective frequency of the kernel clock is lower (for example
> due to NTP correcting the TSC frequency of the system), and you resume
> and update the system, the following happens:
>
> guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
> reads=LARGE VALUE.
> suspend/resume event.
> guest visible clock = tsc_timestamp (updated at time N) + scaled tsc reads=0.
>

I'm still not seeing the issue.

The formula is:

(((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
pvti->tsc_shift) + pvti->system_time

Obviously, if you reset pvti->tsc_timestamp to the current tsc value
after suspend/resume, you would also need to update system_time.
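
To make that concrete, here is a minimal sketch of the formula exactly as
written above, plus a rebase step that updates both fields together. The
struct pvti layout and the guest_clock and pvti_rebase names are
hypothetical simplifications for illustration. The kernel's real pvclock
scaling differs in detail, and 64-bit overflow of the multiply is ignored
here.

```c
#include <stdint.h>

/* Hypothetical, simplified mirror of the fields named in the formula. */
struct pvti {
    uint64_t tsc_timestamp;      /* TSC value when system_time was sampled */
    uint64_t system_time;        /* guest time at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* scale multiplier */
    uint8_t  tsc_shift;          /* right shift applied after the multiply */
};

/* Guest-visible clock for a given TSC reading, per the formula above. */
static uint64_t guest_clock(const struct pvti *p, uint64_t tsc)
{
    return (((tsc - p->tsc_timestamp) * p->tsc_to_system_mul)
            >> p->tsc_shift) + p->system_time;
}

/*
 * Rebase after the TSC jumps: tsc_timestamp and system_time are updated
 * together, so the clock continues from time_now instead of going back.
 */
static void pvti_rebase(struct pvti *p, uint64_t new_tsc, uint64_t time_now)
{
    p->tsc_timestamp = new_tsc;
    p->system_time = time_now;
}
```

With tsc_timestamp = 1000, system_time = 500, mul = 3 and shift = 1, the
clock reads 2000 at TSC 2000. If the TSC then restarts at 0 and only
tsc_timestamp is reset, the clock reads 500 and time has gone backwards.
Calling pvti_rebase(&p, 0, 2000) keeps it at 2000.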

I don't see what this has to do with suspend/resume or with whether
the effective scale factor is greater than or less than one.  The only
suspend/resume interaction I can see is that, if the host allows the
guest-observed TSC value to jump (which is arguably a bug, but that's
not important here), it needs to update pvti before resuming the
guest.

Can you clarify concretely what goes wrong here?

(I'm also at a bit of a loss as to why this needs both system_time and
tsc_ti

Re: [PATCH v3 11/22] arm64: KVM: Add patchable function selector

2015-12-11 Thread Christoffer Dall
On Mon, Dec 07, 2015 at 10:53:27AM +, Marc Zyngier wrote:
> KVM so far relies on code patching, and is likely to use it more
> in the future. The main issue is that our alternative system works
> at the instruction level, while we'd like to have alternatives at
> the function level.
> 
> In order to cope with this, add the "hyp_alternate_select" macro that
> outputs a brief sequence of code that in turn can be patched, allowing
> an alternative function to be selected.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Christoffer Dall 


Re: kvmclock doesn't work, help?

2015-12-11 Thread Marcelo Tosatti
On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
> I'm trying to clean up kvmclock and I can't get it to work at all.  My
> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
> 
> If I boot an SMP (2 vcpus) guest, tracing says:
> 
>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> 
> If I boot a UP guest, tracing says:
> 
>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> I suspect, but I haven't verified, that this is fallout from:
> 
> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
> Author: Marcelo Tosatti 
> Date:   Wed May 14 12:43:24 2014 -0300
> 
> KVM: x86: disable master clock if TSC is reset during suspend
> 
> Updating system_time from the kernel clock once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.
> 
> Disable master clock in case it is necessary to update it
> from the resume path.
> 
> Signed-off-by: Marcelo Tosatti 
> Signed-off-by: Paolo Bonzini 
> 
> 
> Can we please stop making kvmclock more complex?  It's a beast right
> now, and not in a good way.  It's far too tangled with the vclock
> machinery on both the host and guest sides, the pvclock stuff is not
> well thought out (even in principle in an ABI sense), and it's never
> been clear to me what problem exactly the kvmclock stuff is supposed
> to solve.
> 
> 
> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> start over.  A correctly functioning KVM guest using TSC (i.e.
> ignoring kvmclock entirely) seems to work rather more reliably and
> considerably faster than a kvmclock guest.
> 
> --Andy

Users can do that, if they want. "clocksource=tsc" kernel option.
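
To check which clocksource a Linux guest is actually using after booting
with that option, the active one can be read from sysfs. A minimal sketch,
assuming the usual sysfs layout; the read_clocksource helper name is made
up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line from f into buf and drop the trailing newline.
 * Returns 0 on success, -1 on failure.
 */
static int read_clocksource(FILE *f, char *buf, size_t len)
{
    if (f == NULL || len == 0 || fgets(buf, (int)len, f) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Typical use is to fopen()
/sys/devices/system/clocksource/clocksource0/current_clocksource for
reading and pass the stream in. A guest on kvmclock normally reports
"kvm-clock", and one booted with clocksource=tsc should report "tsc".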




Re: kvmclock doesn't work, help?

2015-12-11 Thread Marcelo Tosatti
On Wed, Dec 09, 2015 at 02:27:36PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini  wrote:
> >
> >
> > On 09/12/2015 22:49, Andy Lutomirski wrote:
> >> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini  wrote:
> >>>
> >>>
> >>> On 09/12/2015 22:10, Andy Lutomirski wrote:
> >>>> Can we please stop making kvmclock more complex?  It's a beast right
> >>>> now, and not in a good way.  It's far too tangled with the vclock
> >>>> machinery on both the host and guest sides, the pvclock stuff is not
> >>>> well thought out (even in principle in an ABI sense), and it's never
> >>>> been clear to me what problem exactly the kvmclock stuff is supposed
> >>>> to solve.
> >>>
> >>> It's supposed to solve the problem that:
> >>>
> >>> - not all hosts have a working TSC
> >>
> >> Fine, but we don't need any vdso integration for that.
> >
> > Well, you still want a fast time source.  That was a given. :)
> 
> If the host can't figure out how to give *itself* a fast time source,
> I'd be surprised if KVM can manage to give the guest a fast, reliable
> time source.
> 
> >
> >>> - even if they all do, virtual machines can be migrated (or
> >>> saved/restored) to a host with a different TSC frequency
> >>>
> >>> - any MMIO- or PIO-based mechanism to access the current time is orders
> >>> of magnitude slower than the TSC and less precise too.
> >>
> >> Yup.  But TSC by itself gets that benefit, too.
> >
> > Yes, the problem is if you want to solve all three of them.  The first
> > two are solved by the ACPI PM timer with a decent resolution (70
> > ns---much faster anyway than an I/O port access).  The third is solved
> > by TSC.  To solve all three, you need kvmclock.
> 
> Still confused.  Is kvmclock really used in cases where even the host
> can't pull off a working TSC?
> 
> >
> >>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> >>>> start over.  A correctly functioning KVM guest using TSC (i.e.
> >>>> ignoring kvmclock entirely) seems to work rather more reliably and
> >>>> considerably faster than a kvmclock guest.
> >>>
> >>> If all your hosts have a working TSC and you don't do migration or
> >>> save/restore, that's a valid configuration.  It's not a good default,
> >>> however.
> >>
> >> Er?
> >>
> >> kvmclock is still really quite slow and buggy.
> >
> > Unless it takes 3-4000 clock cycles for a gettimeofday, which it
> > shouldn't even with vdso disabled, it's definitely not slower than PIO.
> >
> >> And the patch I identified is definitely a problem here:
> >>
> >> [  136.131241] KVM: disabling fast timing permanently due to inability
> >> to recover from suspend
> >>
> >> I got that on the host with this whitespace-damaged patch:
> >>
> >> if (backwards_tsc) {
> >> u64 delta_cyc = max_tsc - local_tsc;
> >> +   if (!backwards_tsc_observed)
> >> +   pr_warn("KVM: disabling fast timing
> >> permanently due to inability to recover from suspend\n");
> >>
> >> when I suspended and resumed.
> >>
> >> Can anyone explain what problem
> >> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
> >> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
> >> normal TSC logic handle that case right?  After all, all vcpus should
> >> be paused when we resume from suspend.  At worst, we should just need
> >> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
> >> shouldn't we do that regardless of which way the TSC jumped on
> >> suspend/resume?  After all, the TSC-to-wall-clock offset is quite
> >> likely to change except on the very small handful of CPUs (if any)
> >> that keep the TSC running in S3 and hibernate.)
> >
> > I don't recall the details of that patch, so Marcelo will have to answer
> > this, or Alex too since he chimed in the original thread.  At least it
> > should be made conditional on the existence of a VM at suspend time (and
> > the master clock stuff should be made per VM, as I suggested at
> > https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html).
> >
> > It would indeed be great if the master clock could be dropped.  But I'm
> > definitely missing some of the subtle details. :(
> 
> Me, too.
> 
> Anyway, see the attached untested patch.  Marcelo?
> 
> --Andy

Read the last email, about the problem.



Re: kvmclock doesn't work, help?

2015-12-11 Thread Marcelo Tosatti
On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
> I'm trying to clean up kvmclock and I can't get it to work at all.  My
> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
> 
> If I boot an SMP (2 vcpus) guest, tracing says:
> 
>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> 
> If I boot a UP guest, tracing says:
> 
>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> I suspect, but I haven't verified, that this is fallout from:
> 
> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
> Author: Marcelo Tosatti 
> Date:   Wed May 14 12:43:24 2014 -0300
> 
> KVM: x86: disable master clock if TSC is reset during suspend
> 
> Updating system_time from the kernel clock once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.
> 
> Disable master clock in case it is necessary to update it
> from the resume path.
> 
> Signed-off-by: Marcelo Tosatti 
> Signed-off-by: Paolo Bonzini 
> 
> 
> Can we please stop making kvmclock more complex?  It's a beast right
> now, and not in a good way.  It's far too tangled with the vclock
> machinery on both the host and guest sides, the pvclock stuff is not
> well thought out (even in principle in an ABI sense), and it's never
> been clear to me what problem exactly the kvmclock stuff is supposed
> to solve.
>
> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> start over.  A correctly functioning KVM guest using TSC (i.e.
> ignoring kvmclock entirely) seems to work rather more reliably and
> considerably faster than a kvmclock guest.
> 
> --Andy
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC

Andy,

I am all for solving practical problems rather than pleasing aesthetic
pleasure. 

> Updating system_time from the kernel clock once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.
> 
> Disable master clock in case it is necessary to update it
> from the resume path.

> once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.

guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads.

If the effective frequency of the kernel clock is lower (for example
due to NTP correcting the TSC frequency of the system), and you resume 
and update the system, the following happens:

guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
reads=LARGE VALUE.
suspend/resume event.
guest visible clock = tsc_timestamp (updated at time N) + scaled tsc reads=0.



Re: [PATCH v3 06/22] arm64: KVM: Implement timer save/restore

2015-12-11 Thread Christoffer Dall
On Mon, Dec 07, 2015 at 10:53:22AM +, Marc Zyngier wrote:
> Implement the timer save restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 

Reviewed-by: Christoffer Dall 


Re: [PATCH v3 02/22] arm64: KVM: Add a HYP-specific header file

2015-12-11 Thread Christoffer Dall
On Mon, Dec 07, 2015 at 10:53:18AM +, Marc Zyngier wrote:
> In order to expose the various EL2 services that are private to
> the hypervisor, add a new hyp.h file.
> 
> So far, it only contains mundane things such as section annotation
> and VA manipulation.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Christoffer Dall 


Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-11 Thread Christoffer Dall
On Mon, Dec 07, 2015 at 10:53:21AM +, Marc Zyngier wrote:
> Implement the vgic-v3 save restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |   1 +
>  arch/arm64/kvm/hyp/hyp.h|   3 +
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
> 
>  3 files changed, 230 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index d8d5968..d1e38ce 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -3,3 +3,4 @@
>  #
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index ac63553..5759f9f 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -32,5 +32,8 @@
>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> new file mode 100644
> index 000..78d05f3
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -0,0 +1,226 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +#define vtr_to_max_lr_idx(v) ((v) & 0xf)
> +#define vtr_to_nr_pri_bits(v)(((u32)(v) >> 29) + 1)
> +
> +#define read_gicreg(r)   
> \
> + ({  \
> + u64 reg;\
> + asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> + reg;\
> + })
> +
> +#define write_gicreg(v,r)\
> + do {\
> + u64 __val = (v);\
> + asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
> + } while (0)
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> + u64 val;
> + u32 max_lr_idx, nr_pri_bits;
> +
> + /*
> +  * Make sure stores to the GIC via the memory mapped interface
> +  * are now visible to the system register interface.
> +  */
> + dsb(st);
> +
> + cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> + cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> + cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> + cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
> +
> + write_gicreg(0, ICH_HCR_EL2);
> + val = read_gicreg(ICH_VTR_EL2);
> + max_lr_idx = vtr_to_max_lr_idx(val);
> + nr_pri_bits = vtr_to_nr_pri_bits(val);
> +
> + switch (max_lr_idx) {
> + case 15:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = 
> read_gicreg(ICH_LR15_EL2);
> + case 14:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = 
> read_gicreg(ICH_LR14_EL2);
> + case 13:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = 
> read_gicreg(ICH_LR13_EL2);
> + case 12:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = 
> read_gicreg(ICH_LR12_EL2);
> + case 11:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = 
> read_gicreg(ICH_LR11_EL2);
> + case 10:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = 
> read_gicreg(ICH_LR10_EL2);
> + case 9:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2);
> + case 8:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2);
> + case 7:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)] = read_gicreg(ICH_LR7_EL2);
> + case 6:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)] = read_gicreg(ICH_LR6_EL2);
> + case 5:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)] = read_gicreg(ICH_LR5_EL2

Re: [PATCH v3 04/22] KVM: arm/arm64: vgic-v3: Make the LR indexing macro public

2015-12-11 Thread Christoffer Dall
On Mon, Dec 07, 2015 at 10:53:20AM +, Marc Zyngier wrote:
> We store GICv3 LRs in reverse order so that the CPU can save/restore
> them in rever order as well (don't ask why, the design is crazy),

s/rever/reverse/

> and yet generate memory traffic that doesn't completely suck.
> 
> We need this macro to be available to the C version of save/restore.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  include/kvm/arm_vgic.h |  6 ++
>  virt/kvm/arm/vgic-v3.c | 10 ++
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index d2f4147..13a3d53 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -279,6 +279,12 @@ struct vgic_v2_cpu_if {
>   u32 vgic_lr[VGIC_V2_MAX_LRS];
>  };
>  
> +/*
> + * LRs are stored in reverse order in memory. make sure we index them

s/make/Make/  or s/\./,

> + * correctly.
> + */
> +#define VGIC_V3_LR_INDEX(lr) (VGIC_V3_MAX_LRS - 1 - lr)
> +
>  struct vgic_v3_cpu_if {
>  #ifdef CONFIG_KVM_ARM_VGIC_V3
>   u32 vgic_hcr;
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index 487d635..3813d23 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -36,18 +36,12 @@
>  #define GICH_LR_PHYSID_CPUID (7UL << GICH_LR_PHYSID_CPUID_SHIFT)
>  #define ICH_LR_VIRTUALID_MASK (BIT_ULL(32) - 1)
>  
> -/*
> - * LRs are stored in reverse order in memory. make sure we index them
> - * correctly.
> - */
> -#define LR_INDEX(lr) (VGIC_V3_MAX_LRS - 1 - lr)
> -
>  static u32 ich_vtr_el2;
>  
>  static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  {
>   struct vgic_lr lr_desc;
> - u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)];
> + u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)];
>  
>   if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>   lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
> @@ -111,7 +105,7 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>   lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
>   }
>  
> - vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
> + vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)] = lr_val;
>  
>   if (!(lr_desc.state & LR_STATE_MASK))
>   vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
> -- 
> 2.1.4
> 

Otherwise:
Reviewed-by: Christoffer Dall 
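
For readers following the indexing discussion, here is a minimal sketch of
the reverse-order LR indexing that the macro implements. The array size
EXAMPLE_MAX_LRS and the example_* helpers are hypothetical names used only
for illustration; they are not part of the patch. Unlike the macro in the
patch, this version parenthesizes its argument.

```c
#include <stdint.h>

/* Assumed size for illustration; the kernel defines its own VGIC_V3_MAX_LRS. */
#define EXAMPLE_MAX_LRS 16

/* Same arithmetic as VGIC_V3_LR_INDEX(), with the argument parenthesized. */
#define EXAMPLE_LR_INDEX(lr) (EXAMPLE_MAX_LRS - 1 - (lr))

/* Store the value of logical list register `lr` in the reverse-ordered array. */
static void example_set_lr(uint64_t *lrs, int lr, uint64_t val)
{
	lrs[EXAMPLE_LR_INDEX(lr)] = val;
}

/* Read back logical list register `lr` from the reverse-ordered array. */
static uint64_t example_get_lr(const uint64_t *lrs, int lr)
{
	return lrs[EXAMPLE_LR_INDEX(lr)];
}
```

With these assumptions, logical LR 0 lands in the last array slot and the
highest-numbered LR lands in slot 0, which is why every accessor has to go
through the same index macro.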
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 03/22] arm64: KVM: Implement vgic-v2 save/restore

2015-12-11 Thread Christoffer Dall
On Mon, Dec 07, 2015 at 10:53:19AM +, Marc Zyngier wrote:
> Implement the vgic-v2 save restore (mostly) as a direct translation
> of the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/Makefile |  1 +
>  arch/arm64/kvm/hyp/Makefile |  5 +++
>  arch/arm64/kvm/hyp/hyp.h|  3 ++
>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 84 
> +
>  4 files changed, 93 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/Makefile
>  create mode 100644 arch/arm64/kvm/hyp/vgic-v2-sr.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 1949fe5..d31e4e5 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -10,6 +10,7 @@ KVM=../../../virt/kvm
>  ARM=../../../arch/arm/kvm
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
> +obj-$(CONFIG_KVM_ARM_HOST) += hyp/
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> new file mode 100644
> index 000..d8d5968
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for Kernel-based Virtual Machine module, HYP part
> +#
> +
> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 057f483..ac63553 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -29,5 +29,8 @@
>  #define hyp_kern_va(v) (typeof(v))((unsigned long)(v) - HYP_PAGE_OFFSET \
> + PAGE_OFFSET)
>  
> +void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> +void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> new file mode 100644
> index 000..e717612
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> @@ -0,0 +1,84 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
> + struct vgic_dist *vgic = &kvm->arch.vgic;
> + void __iomem *base = kern_hyp_va(vgic->vctrl_base);
> + u32 eisr0, eisr1, elrsr0, elrsr1;
> + int i, nr_lr;
> +
> + if (!base)
> + return;
> +
> + nr_lr = vcpu->arch.vgic_cpu.nr_lr;
> + cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
> + cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> + eisr0  = readl_relaxed(base + GICH_EISR0);
> + elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> + if (unlikely(nr_lr > 32)) {
> + eisr1  = readl_relaxed(base + GICH_EISR1);
> + elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> + } else {
> + eisr1 = elrsr1 = 0;
> + }
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> + cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> + cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
> +#else
> + cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> + cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
> +#endif
> + cpu_if->vgic_apr= readl_relaxed(base + GICH_APR);
> +
> + writel_relaxed(0, base + GICH_HCR);
> +
> + for (i = 0; i < nr_lr; i++)
> + cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
> +}
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
> + struct vgic_dist *vgic = &kvm->arch.vgic;
> + void __iomem *base = kern_hyp_va(vgic->vctrl_base);
> + int i, nr_lr;
> +
> + if (!base)
> + return;
> +
> + writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> + writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> + writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> +
> + nr_lr = vcpu->arch.vgic_cpu.nr_lr;
> + for (i = 0; i < nr_lr; i
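
The save path quoted above packs two 32-bit status words into one 64-bit
field. A minimal sketch of the little-endian branch, assuming the first
word covers LRs 0-31 and the second covers LRs 32-63; the example_* names
are hypothetical and not taken from the patch:

```c
#include <stdint.h>

/*
 * Pack two 32-bit status words into one 64-bit value, mirroring the
 * little-endian branch of the patch: the low word stays in bits 0-31,
 * the high word goes to bits 32-63.
 */
static uint64_t example_pack_status(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Return 1 if the bit for list register `lr` is set in the packed value. */
static int example_lr_bit_set(uint64_t packed, unsigned int lr)
{
	return (int)((packed >> lr) & 1);
}
```

The cast to a 64-bit type before the shift matters: shifting a 32-bit
value by 32 would be undefined behaviour in C.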

Re: [PATCH v3 07/22] arm64: KVM: Implement system register save/restore

2015-12-11 Thread Marc Zyngier
Hi Mario,

On 11/12/15 03:24, Mario Smarduch wrote:
> Hi Marc,
> 
> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>> Implement the system register save/restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> Reviewed-by: Christoffer Dall 
>> ---
>>  arch/arm64/kvm/hyp/Makefile|  1 +
>>  arch/arm64/kvm/hyp/hyp.h   |  3 ++
>>  arch/arm64/kvm/hyp/sysreg-sr.c | 90 
>> ++
>>  3 files changed, 94 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/sysreg-sr.c
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index 455dc0a..ec94200 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -5,3 +5,4 @@
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index f213e46..778d56d 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -38,5 +38,8 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>  void __timer_save_state(struct kvm_vcpu *vcpu);
>>  void __timer_restore_state(struct kvm_vcpu *vcpu);
>>  
>> +void __sysreg_save_state(struct kvm_cpu_context *ctxt);
>> +void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
>> +
>>  #endif /* __ARM64_KVM_HYP_H__ */
>>  
>> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
>> new file mode 100644
>> index 000..add8fcb
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
>> @@ -0,0 +1,90 @@
>> +/*
>> + * Copyright (C) 2012-2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
> 
> I looked closer on some other ways to get better performance out of
> the compiler. This code sequence performs about 35% faster for 
> __sysreg_save_state(..) for 5000 exits you save about 500mS or 100nS
> per exit. This is on Juno.

35% faster? Really? That's pretty crazy. Was that on the A57 or the A53?

> 
> register int volatile count asm("r2") = 0;

Does this even work on arm64? We don't have an "r2" register...

> 
> do {
> 
> } while(count);
> 
> I didn't test the restore function (ran out of time) but I suspect it should
> be the same. The assembler pretty much uses all the GPRs, (a little too many,
> using stp to push 4 pairs on the stack and restore) looking at the assembler
> it all should execute out of order.

Are you talking about the original implementation here, or the code
generated by the compiler? The original implementation didn't push
anything on the stack (apart from the prologue, but we have the same
thing in the C implementation).

Looking at the compiler output, we have a bunch of mrs/str, one after
the other - pretty basic. Maybe that gives the CPU some "breathing"
time, but I have no idea if that's more or less efficient.

But the main thing is that we can now rely on the compiler to generate
something that is more or less optimized for a given platform if there
is such a requirement. We go from something that was cast in stone to
something that has some degree of flexibility.

> 
> FWIW I gave this a try since compilers like to optimize loops. I used
> 'cntpct_el0' counter register to measure the intervals.

It'd be nice to have a measurement in terms of cycles, but that's a good
first approximation.
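
As a rough sketch of how a counter delta relates to time and to CPU
cycles: the conversion below assumes the counter frequency and the CPU
clock are known. On real hardware the counter frequency would be read
from cntfrq_el0; the 50 MHz and 1 GHz figures in the note that follows
are hypothetical, as are the example_* names.

```c
#include <stdint.h>

/*
 * Convert a counter delta to nanoseconds for a counter ticking at freq_hz.
 * The multiplication can overflow for very large deltas; that is fine for
 * a sketch that only handles short intervals.
 */
static uint64_t example_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
	return ticks * 1000000000ULL / freq_hz;
}

/* Estimate CPU cycles from a counter delta, given the CPU clock in Hz. */
static uint64_t example_ticks_to_cycles(uint64_t ticks, uint64_t freq_hz,
					uint64_t cpu_hz)
{
	return ticks * cpu_hz / freq_hz;
}
```

With an assumed 50 MHz counter, one tick is 20 ns, so a delta of 5 ticks
is 100 ns, or roughly 100 cycles on an assumed 1 GHz core. The tick
granularity is the reason a cycle counter would give a finer picture.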

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH kvm-unit-tests] x86: always inline functions called after set_exception_return

2015-12-11 Thread David Matlack
On Wed, Dec 9, 2015 at 7:02 AM, Paolo Bonzini  wrote:
> On 07/12/2015 21:36, David Matlack wrote:
>> set_exception_return forces exception handlers to return to a specific
>> address instead of returning to the instruction address pushed by the
>> CPU at the time of the exception. The unit tests apic.c and vmx.c use
>> this functionality to recover from expected exceptions.
>>
>> When using set_exception_return we have to be careful not to modify the
>> stack (such as by doing a function call) as triggering the exception will
>> likely jump us past the instructions which undo the stack manipulation
>> (such as a ret). To accomplish this, declare all functions called after
>> set_exception_return as __always_inline, so that the compiler always
>> inlines them.
>
> set_exception_return is generally not a great idea IMHO---thanks for
> looking at it.

Yup. This is a band-aid just to fix the current implementation.

>
> A couple years ago we discussed adding setjmp/longjmp to libcflat
> (http://www.spinics.net/lists/kvm/msg94159.html which is however missing
> a 32-bit version).  Making the exceptions do a longjmp would be a much
> safer option.

Good idea! I might give this a try, but don't hold your breath :)
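
For illustration, a minimal user-space sketch of the setjmp/longjmp
approach, assuming the standard C library versions of both calls (libcflat
would need its own implementation, as noted above). All example_* names
are hypothetical; in a real unit test the longjmp would be issued from the
exception handler rather than from an ordinary function.

```c
#include <setjmp.h>

static jmp_buf example_recovery_point;
static int example_handler_runs;

/*
 * Stand-in for an exception handler: instead of patching the return
 * address, jump back to the context saved by setjmp().
 */
static void example_raise(void)
{
	example_handler_runs++;
	longjmp(example_recovery_point, 1);
}

/* A nested call that "faults"; longjmp() discards its stack frame. */
static void example_faulting_call(void)
{
	example_raise();
}

/* Returns 1 if the expected "exception" was caught, 0 otherwise. */
static int example_expect_exception(void)
{
	if (setjmp(example_recovery_point) == 0) {
		example_faulting_call();
		return 0;	/* not reached when the call faults */
	}
	return 1;
}
```

Because longjmp restores the stack pointer saved by setjmp, calls made
between the two no longer need to be inlined to keep the stack consistent.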

>
> Paolo


Re: [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-12-11 Thread Andy Lutomirski
On Fri, Dec 11, 2015 at 12:42 AM, Paolo Bonzini  wrote:
>
>
> On 11/12/2015 08:52, Ingo Molnar wrote:
>>
>> * Paolo Bonzini  wrote:

>>>
>>> Reviewed-by: Paolo Bonzini 
>>
>> Thanks. I've added your Reviewed-by to the 1/5 patch as well - to be able to
>> put the whole series into the tip:x86/entry tree. Let me know if you'd like
>> it to be done differently.
>
> The 1/5 patch is entirely in KVM and is not necessary for the rest of
> the series to work.  I would like it to be separate, because Marcelo has
> not yet chimed in to say why it was necessary.
>
> Can you just apply patches 2-5?

Yes, please.  I don't grok the clock update mechanism in the KVM host
well enough to be sure that patch 1 is actually correct.  All I know
is that it works better on my laptop with the patch than without the
patch and that it seems at least conceptually correct.

In any event, patch 1 is a host patch and 2-5 are guest patches, and
they only interact to the extent that it's hard for me to test 2-5 on
the guest without patch 1 on the host because without patch 1 my
laptop's host kernel tends to disable stable kvmclock, thus disabling
the entire mechanism in the guest.

--Andy


Re: [PATCH] x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()

2015-12-11 Thread Andy Lutomirski
On Fri, Dec 11, 2015 at 12:06 AM, Ingo Molnar  wrote:
>
> * Andy Lutomirski  wrote:
>
>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>> index f80d70009ff8..6d7d0e52ed5a 100644
>> --- a/arch/x86/include/asm/fixmap.h
>> +++ b/arch/x86/include/asm/fixmap.h
>> @@ -19,7 +19,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #ifdef CONFIG_X86_32
>>  #include 
>>  #include 
>
> So this change triggered a build failure on 64-bit allmodconfig - fixed via
> the patch below. Your change unearthed a latent bug, a missing header inclusion.
>
> Thanks,
>
> Ingo
>
> >
> From d51953b0873358d13b189996e6976dfa12a9b59d Mon Sep 17 00:00:00 2001
> From: Ingo Molnar 
> Date: Fri, 11 Dec 2015 09:01:30 +0100
> Subject: [PATCH] x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()
>
> This build failure triggers on 64-bit allmodconfig:
>
>   arch/x86/platform/uv/uv_nmi.c:493:2: error: implicit declaration of function ‘clocksource_touch_watchdog’ [-Werror=implicit-function-declaration]
>
> which is caused by recent changes exposing a missing clocksource.h include
> in uv_nmi.c:
>
>   cc1e24fdb064 x86/vdso: Remove pvclock fixmap machinery
>
> this file got clocksource.h indirectly via fixmap.h - that stealth route
> of header inclusion is now gone.
>
> Cc: Borislav Petkov 
> Cc: H. Peter Anvin 
> Cc: Linus Torvalds 
> Cc: Thomas Gleixner 
> Signed-off-by: Ingo Molnar 

LGTM.

--Andy


Re: [PATCH v3 4/4] VSOCK: Add Makefile and Kconfig

2015-12-11 Thread Alex Bennée

Stefan Hajnoczi  writes:

> From: Asias He 
>
> Enable virtio-vsock and vhost-vsock.
>
> Signed-off-by: Asias He 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v3:
>  * Don't put vhost vsock driver into staging
>  * Add missing Kconfig dependencies (Arnd Bergmann )
> ---
>  drivers/vhost/Kconfig  | 10 ++
>  drivers/vhost/Makefile |  4 
>  net/vmw_vsock/Kconfig  | 18 ++
>  net/vmw_vsock/Makefile |  2 ++
>  4 files changed, 34 insertions(+)
>
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index 533eaf0..a1bb4c2 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -21,6 +21,16 @@ config VHOST_SCSI
>   Say M here to enable the vhost_scsi TCM fabric module
>   for use with virtio-scsi guests
>
> +config VHOST_VSOCK
> + tristate "vhost virtio-vsock driver"
> + depends on VSOCKETS && EVENTFD
> + select VIRTIO_VSOCKETS_COMMON
> + select VHOST
> + select VHOST_RING
> + default n
> + ---help---
> + Say M here to enable the vhost-vsock for virtio-vsock guests

I think checkpatch prefers a few more words for the feature but I'm
happy with it.

> +
>  config VHOST_RING
>   tristate
>   ---help---
> diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> index e0441c3..6b012b9 100644
> --- a/drivers/vhost/Makefile
> +++ b/drivers/vhost/Makefile
> @@ -4,5 +4,9 @@ vhost_net-y := net.o
>  obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
>  vhost_scsi-y := scsi.o
>
> +obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
> +vhost_vsock-y := vsock.o
> +
>  obj-$(CONFIG_VHOST_RING) += vringh.o
> +
>  obj-$(CONFIG_VHOST)  += vhost.o
> diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
> index 14810ab..74e0bc8 100644
> --- a/net/vmw_vsock/Kconfig
> +++ b/net/vmw_vsock/Kconfig
> @@ -26,3 +26,21 @@ config VMWARE_VMCI_VSOCKETS
>
> To compile this driver as a module, choose M here: the module
> will be called vmw_vsock_vmci_transport. If unsure, say N.
> +
> +config VIRTIO_VSOCKETS
> + tristate "virtio transport for Virtual Sockets"
> + depends on VSOCKETS && VIRTIO
> + select VIRTIO_VSOCKETS_COMMON
> + help
> +   This module implements a virtio transport for Virtual Sockets.
> +
> +   Enable this transport if your Virtual Machine runs on Qemu/KVM.

Is this better worded as:

"Enable this transport if your Virtual Machine host supports vsockets
over virtio."

?

Otherwise:

Reviewed-by: Alex Bennée 

> +
> +   To compile this driver as a module, choose M here: the module
> +   will be called virtio_vsock_transport. If unsure, say N.
> +
> +config VIRTIO_VSOCKETS_COMMON
> +   tristate
> +   ---help---
> + This option is selected by any driver which needs to access
> + the virtio_vsock.
> diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
> index 2ce52d7..cf4c294 100644
> --- a/net/vmw_vsock/Makefile
> +++ b/net/vmw_vsock/Makefile
> @@ -1,5 +1,7 @@
>  obj-$(CONFIG_VSOCKETS) += vsock.o
>  obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
> +obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
> +obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
>
>  vsock-y += af_vsock.o vsock_addr.o
>
> --
> 2.5.0


--
Alex Bennée


[PATCH backport v3.12..v3.14 2/4] MIPS: KVM: Fix ASID restoration logic

2015-12-11 Thread James Hogan
commit 002374f371bd02df864cce1fe85d90dc5b292837 upstream.

ASID restoration on guest resume should determine the guest execution
mode based on the guest Status register rather than bit 30 of the guest
PC.

Fix the two places in locore.S that do this, loading the guest status
from the cop0 area. Note, this assembly is specific to the trap &
emulate implementation of KVM, so it doesn't need to check the
supervisor bit as that mode is not implemented in the guest.

Fixes: b680f70fc111 ("KVM/MIPS32: Entry point for trampolining to...")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: James Hogan 
---
 arch/mips/kvm/kvm_locore.S | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/kvm/kvm_locore.S b/arch/mips/kvm/kvm_locore.S
index 03a2db58b22d..ba5ce99c021d 100644
--- a/arch/mips/kvm/kvm_locore.S
+++ b/arch/mips/kvm/kvm_locore.S
@@ -159,9 +159,11 @@ FEXPORT(__kvm_mips_vcpu_run)
 
 FEXPORT(__kvm_mips_load_asid)
/* Set the ASID for the Guest Kernel */
-   INT_SLL t0, t0, 1   /* with kseg0 @ 0x40000000, kernel */
-   /* addresses shift to 0x80000000 */
-   bltz    t0, 1f  /* If kernel */
+   PTR_L   t0, VCPU_COP0(k1)
+   LONG_L  t0, COP0_STATUS(t0)
+   andi    t0, KSU_USER | ST0_ERL | ST0_EXL
+   xori    t0, KSU_USER
+   bnez    t0, 1f  /* If kernel */
 INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
    INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID    /* else user */
 1:
@@ -438,9 +440,11 @@ __kvm_mips_return_to_guest:
    mtc0    t0, CP0_EPC
 
    /* Set the ASID for the Guest Kernel */
-   INT_SLL t0, t0, 1   /* with kseg0 @ 0x40000000, kernel */
-   /* addresses shift to 0x80000000 */
-   bltz    t0, 1f  /* If kernel */
+   PTR_L   t0, VCPU_COP0(k1)
+   LONG_L  t0, COP0_STATUS(t0)
+   andi    t0, KSU_USER | ST0_ERL | ST0_EXL
+   xori    t0, KSU_USER
+   bnez    t0, 1f  /* If kernel */
 INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
    INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID    /* else user */
 1:
-- 
2.4.10
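
The mode test performed by the new andi/xori/bnez sequence can be
sketched in C as follows. The bit values are reproduced here as
assumptions about the MIPS Status register layout (EXL = bit 1, ERL =
bit 2, KSU user mode = 0x10), and the EX_* and example_* names are
hypothetical stand-ins, not kernel symbols.

```c
#include <stdint.h>

/* Assumed Status register bits, mirroring ST0_EXL, ST0_ERL and KSU_USER. */
#define EX_ST0_EXL  0x00000002u
#define EX_ST0_ERL  0x00000004u
#define EX_KSU_USER 0x00000010u

/*
 * Returns non-zero when the guest should use the kernel ASID: either the
 * KSU field is not "user", or EXL/ERL is set (exception or error level
 * forces kernel mode regardless of KSU).
 */
static int example_guest_in_kernel_mode(uint32_t status)
{
	uint32_t t0 = status & (EX_KSU_USER | EX_ST0_ERL | EX_ST0_EXL);

	t0 ^= EX_KSU_USER;	/* zero only for "user mode, no EXL/ERL" */
	return t0 != 0;
}
```

The XOR leaves zero only for the single combination that means user mode,
so one branch covers every kernel-mode case.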



[PATCH backport v3.10..v3.14 3/4] MIPS: KVM: Fix CACHE immediate offset sign extension

2015-12-11 Thread James Hogan
commit c5c2a3b998f1ff5a586f9d37e154070b8d550d17 upstream.

The immediate field of the CACHE instruction is signed, so ensure that
it gets sign extended by casting it to an int16_t rather than just
masking the low 16 bits.

Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: James Hogan 
---
 arch/mips/kvm/kvm_mips_emul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
index c76f297b7149..33085819cd89 100644
--- a/arch/mips/kvm/kvm_mips_emul.c
+++ b/arch/mips/kvm/kvm_mips_emul.c
@@ -935,7 +935,7 @@ kvm_mips_emulate_cache(uint32_t inst, uint32_t *opc, uint32_t cause,
 
base = (inst >> 21) & 0x1f;
op_inst = (inst >> 16) & 0x1f;
-   offset = inst & 0xffff;
+   offset = (int16_t)inst;
cache = (inst >> 16) & 0x3;
op = (inst >> 18) & 0x7;
 
-- 
2.4.10
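
To make the difference concrete, here is a small sketch comparing the
masked and the sign-extended offset. The example_* helpers are
hypothetical, and the instruction words used with them are arbitrary
values chosen only for their low 16 bits.

```c
#include <stdint.h>

/* Old behaviour: masking keeps the low 16 bits but never goes negative. */
static int32_t example_offset_masked(uint32_t inst)
{
	return (int32_t)(inst & 0xffff);
}

/*
 * New behaviour: converting to int16_t first makes the widening to
 * int32_t sign-extend. The narrowing conversion is implementation-defined
 * in C; common compilers such as GCC and Clang wrap modulo 2^16.
 */
static int32_t example_offset_signed(uint32_t inst)
{
	return (int16_t)inst;
}
```

For an immediate field of 0xfff0 the masked version yields 65520 while
the sign-extended version yields -16, so the emulated CACHE address would
otherwise be off by 64 KiB.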



[PATCH backport v3.10..v3.14 4/4] MIPS: KVM: Uninit VCPU in vcpu_create error path

2015-12-11 Thread James Hogan
commit 585bb8f9a5e592f2ce7abbe5ed3112d5438d2754 upstream.

If either of the memory allocations in kvm_arch_vcpu_create() fail, the
vcpu which has been allocated and kvm_vcpu_init'd doesn't get uninit'd
in the error handling path. Add a call to kvm_vcpu_uninit() to fix this.

Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: James Hogan 
---
 arch/mips/kvm/kvm_mips.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index 2cb24788a8a6..7e7de1f2b8ed 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -312,7 +312,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 
if (!gebase) {
err = -ENOMEM;
-   goto out_free_cpu;
+   goto out_uninit_cpu;
}
kvm_info("Allocated %d bytes for KVM Exception Handlers @ %p\n",
 ALIGN(size, PAGE_SIZE), gebase);
@@ -372,6 +372,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 out_free_gebase:
kfree(gebase);
 
+out_uninit_cpu:
+   kvm_vcpu_uninit(vcpu);
+
 out_free_cpu:
kfree(vcpu);
 
-- 
2.4.10
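
The unwind ordering this patch restores can be sketched in plain C. This
is a user-space illustration under stated assumptions, not the kernel
function: struct example_vcpu, the example_* helpers and the fail_at
parameter are all hypothetical, and malloc/free stand in for the kernel
allocators.

```c
#include <stdlib.h>

struct example_vcpu {
	int initialized;
	void *gebase;
	void *commpage;
};

static int example_uninit_calls;

static int example_vcpu_init(struct example_vcpu *v)
{
	v->initialized = 1;
	return 0;
}

static void example_vcpu_uninit(struct example_vcpu *v)
{
	v->initialized = 0;
	example_uninit_calls++;
}

/* fail_at: 0 = no failure, 1 = first allocation fails, 2 = second fails. */
static struct example_vcpu *example_vcpu_create(int fail_at)
{
	struct example_vcpu *v = calloc(1, sizeof(*v));

	if (!v)
		goto out;
	if (example_vcpu_init(v))
		goto out_free_cpu;

	v->gebase = (fail_at == 1) ? NULL : malloc(64);
	if (!v->gebase)
		goto out_uninit_cpu;	/* init succeeded, so uninit is needed */

	v->commpage = (fail_at == 2) ? NULL : malloc(64);
	if (!v->commpage)
		goto out_free_gebase;

	return v;

	/* Labels unwind in reverse order of setup and fall through. */
out_free_gebase:
	free(v->gebase);
out_uninit_cpu:
	example_vcpu_uninit(v);
out_free_cpu:
	free(v);
out:
	return NULL;
}
```

Both allocation failures now pass through the uninit label, which is the
property the patch adds to kvm_arch_vcpu_create().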



[PATCH backport v3.10 1/4] MIPS: KVM: Fix ASID restoration logic

2015-12-11 Thread James Hogan
commit 002374f371bd02df864cce1fe85d90dc5b292837 upstream.

ASID restoration on guest resume should determine the guest execution
mode based on the guest Status register rather than bit 30 of the guest
PC.

Fix the two places in locore.S that do this, loading the guest status
from the cop0 area. Note, this assembly is specific to the trap &
emulate implementation of KVM, so it doesn't need to check the
supervisor bit as that mode is not implemented in the guest.

Fixes: b680f70fc111 ("KVM/MIPS32: Entry point for trampolining to...")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: James Hogan 
---
 arch/mips/kvm/kvm_locore.S | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/kvm/kvm_locore.S b/arch/mips/kvm/kvm_locore.S
index 920b63210806..34c35f0e3290 100644
--- a/arch/mips/kvm/kvm_locore.S
+++ b/arch/mips/kvm/kvm_locore.S
@@ -156,9 +156,11 @@ FEXPORT(__kvm_mips_vcpu_run)
 
 FEXPORT(__kvm_mips_load_asid)
 /* Set the ASID for the Guest Kernel */
-sll t0, t0, 1   /* with kseg0 @ 0x40000000, kernel */
-/* addresses shift to 0x80000000 */
-bltz    t0, 1f  /* If kernel */
+PTR_L  t0, VCPU_COP0(k1)
+LONG_L t0, COP0_STATUS(t0)
+andi   t0, KSU_USER | ST0_ERL | ST0_EXL
+xori   t0, KSU_USER
+bnez   t0, 1f  /* If kernel */
addiu   t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
 addiu   t1, k1, VCPU_GUEST_USER_ASID    /* else user */
 1:
@@ -442,9 +444,11 @@ __kvm_mips_return_to_guest:
    mtc0    t0, CP0_EPC
 
 /* Set the ASID for the Guest Kernel */
-sll t0, t0, 1   /* with kseg0 @ 0x40000000, kernel */
-/* addresses shift to 0x80000000 */
-bltz    t0, 1f  /* If kernel */
+PTR_L  t0, VCPU_COP0(k1)
+LONG_L t0, COP0_STATUS(t0)
+andi   t0, KSU_USER | ST0_ERL | ST0_EXL
+xori   t0, KSU_USER
+bnez   t0, 1f  /* If kernel */
addiu   t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
 addiu   t1, k1, VCPU_GUEST_USER_ASID    /* else user */
 1:
-- 
2.4.10



Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-11 Thread Radim Krcmár
2015-12-10 01:52+, Wu, Feng:
>> From: Radim Krčmář [mailto:rkrc...@redhat.com]
>> (Physical xAPIC+x2APIC mode is still somewhat reasonable and xAPIC CPUs
>>  start with LDR=0, which means that operating system doesn't need to
>>  utilize mixed mode, as defined by KVM, when switching to x2APIC.)
> 
> I think you mean Physical xAPIC+Physical x2APIC mode, right? For physical
> mode, we don't use LDR in any case, do we? So in physical mode, we only
> use the APIC ID, that is why they can be mixed, is my understanding correct?

Yes.  (Technically, physical and logical addressing is always active in
APIC, but xAPIC must have nonzero LDR to accept logical interrupts[1].)
If all xAPIC LDRs are zero, KVM doesn't enter a "mixed mode" even if
some are xAPIC and some x2APIC [2].

1: Real LAPICs probably do not accept broadcasts on APICs where LDR=0,
   KVM LAPICs do, but lowest priority broadcast is not allowed anyway,
   so PI doesn't care.

2: KVM allows OS-writeable APIC ID, which complicates things and real
   hardware probably doesn't allow it because of that ... we'd be saner
   with RO APIC ID, but it's not that bad.  (And no major OS does it :])

>>  the system uses cluster xAPIC, OS should set DFR before LDR, which
>>  doesn't trigger mixed mode either.)
> 
> Just curious, if the APIC is software disabled and it is in xAPIC mode. OS
> sets different value for DFR for different APICs, then when OS sets LDR, KVM
> can trigger mixed flat and cluster mode, right?

Exactly.
APICs with zeroed LDR are ignored, so KVM will use the slow-path for
delivery (= trigger mixed mode) at the moment the first APIC with
different DFR is configured.


Re: [PATCH v3 3/4] VSOCK: Introduce vhost-vsock.ko

2015-12-11 Thread Alex Bennée

Stefan Hajnoczi  writes:

> From: Asias He 
>
> VM sockets vhost transport implementation. This module runs in host
> kernel.

As per previous checkpatch comments.

>
> Signed-off-by: Asias He 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v3:
>  * Remove unneeded variable used to store return value
>(Fengguang Wu  and Julia Lawall
>)
> v2:
>  * Add missing total_tx_buf decrement
>  * Support flexible rx/tx descriptor layout
>  * Refuse to assign reserved CIDs
>  * Refuse guest CID if already in use
>  * Only accept correctly addressed packets
> ---
>  drivers/vhost/vsock.c | 628 
> ++
>  drivers/vhost/vsock.h |   4 +
>  2 files changed, 632 insertions(+)
>  create mode 100644 drivers/vhost/vsock.c
>  create mode 100644 drivers/vhost/vsock.h
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> new file mode 100644
> index 000..3c0034a
> --- /dev/null
> +++ b/drivers/vhost/vsock.c
> @@ -0,0 +1,628 @@
> +/*
> + * vhost transport for vsock
> + *
> + * Copyright (C) 2013-2015 Red Hat, Inc.
> + * Author: Asias He 
> + * Stefan Hajnoczi 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "vhost.h"
> +#include "vsock.h"
> +
> +#define VHOST_VSOCK_DEFAULT_HOST_CID 2
> +
> +static int vhost_transport_socket_init(struct vsock_sock *vsk,
> +struct vsock_sock *psk);
> +
> +enum {
> + VHOST_VSOCK_FEATURES = VHOST_FEATURES,
> +};
> +
> +/* Used to track all the vhost_vsock instances on the system. */
> +static LIST_HEAD(vhost_vsock_list);
> +static DEFINE_MUTEX(vhost_vsock_mutex);
> +
> +struct vhost_vsock_virtqueue {
> + struct vhost_virtqueue vq;
> +};
> +
> +struct vhost_vsock {
> + /* Vhost device */
> + struct vhost_dev dev;
> + /* Vhost vsock virtqueue*/
> + struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
> + /* Link to global vhost_vsock_list*/
> + struct list_head list;
> + /* Head for pkt from host to guest */
> + struct list_head send_pkt_list;
> + /* Work item to send pkt */
> + struct vhost_work send_pkt_work;
> + /* Wait queue for send pkt */
> + wait_queue_head_t queue_wait;
> + /* Used for global tx buf limitation */
> + u32 total_tx_buf;
> + /* Guest contex id this vhost_vsock instance handles */
> + u32 guest_cid;
> +};

As with 2/4 there is a fair bit of redundancy in the comments but I
don't see any obvious grouping here that could streamline it.

> +
> +static u32 vhost_transport_get_local_cid(void)
> +{
> + return VHOST_VSOCK_DEFAULT_HOST_CID;
> +}
> +
> +static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> +{
> + struct vhost_vsock *vsock;
> +
> + mutex_lock(&vhost_vsock_mutex);
> + list_for_each_entry(vsock, &vhost_vsock_list, list) {
> + if (vsock->guest_cid == guest_cid) {
> + mutex_unlock(&vhost_vsock_mutex);
> + return vsock;
> + }
> + }
> + mutex_unlock(&vhost_vsock_mutex);
> +
> + return NULL;
> +}
> +
> +static void
> +vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> + struct vhost_virtqueue *vq)
> +{
> + bool added = false;
> +
> + mutex_lock(&vq->mutex);
> + vhost_disable_notify(&vsock->dev, vq);
> + for (;;) {
> + struct virtio_vsock_pkt *pkt;
> + struct iov_iter iov_iter;
> + unsigned out, in;
> + struct sock *sk;
> + size_t nbytes;
> + size_t len;
> + int head;
> +
> + if (list_empty(&vsock->send_pkt_list)) {
> + vhost_enable_notify(&vsock->dev, vq);
> + break;
> + }
> +
> + head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> +  &out, &in, NULL, NULL);
> + pr_debug("%s: head = %d\n", __func__, head);
> + if (head < 0)
> + break;
> +
> + if (head == vq->num) {
> + if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> + vhost_disable_notify(&vsock->dev, vq);
> + continue;

Why are we doing this? If we enable something we then disable it? A
comment as to what is going on here would be useful.
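
For reference, this looks like the usual vhost re-check idiom: enabling
notifications also reports whether buffers showed up in the meantime, and
if so the code turns notifications back off and polls again rather than
sleeping. That reading is an assumption based on other vhost drivers, not
something the patch states. A toy model of the idea, with every name
below hypothetical and unrelated to the real vhost API:

```c
#include <stdbool.h>

/* Toy stand-in for a virtqueue; none of these names are real vhost API. */
struct toy_vq {
	unsigned avail_idx;   /* buffers published by the guest so far */
	unsigned last_avail;  /* buffers the host has consumed so far */
	unsigned racing;      /* buffers the guest adds just after we see "empty" */
	bool notify_on;       /* guest will kick the host when it adds buffers */
};

/* Returns true if a buffer was taken, false if the ring looked empty. */
static bool toy_get_buf(struct toy_vq *vq)
{
	if (vq->last_avail == vq->avail_idx) {
		/* Simulate the guest publishing buffers right after our check. */
		vq->avail_idx += vq->racing;
		vq->racing = 0;
		return false;
	}
	vq->last_avail++;
	return true;
}

/* Turn notifications on and report whether buffers slipped in meanwhile. */
static bool toy_enable_notify(struct toy_vq *vq)
{
	vq->notify_on = true;
	return vq->last_avail != vq->avail_idx;
}

static void toy_disable_notify(struct toy_vq *vq)
{
	vq->notify_on = false;
}

/* Consume everything available; returns how many buffers were handled. */
static unsigned toy_drain(struct toy_vq *vq)
{
	unsigned handled = 0;

	toy_disable_notify(vq);
	for (;;) {
		if (!toy_get_buf(vq)) {
			if (toy_enable_notify(vq)) {
				/* Raced with the guest: poll again instead of sleeping. */
				toy_disable_notify(vq);
				continue;
			}
			break;
		}
		handled++;
	}
	return handled;
}
```

Without the re-check, the buffer added in the window between the "empty"
check and the enable would sit there with no kick ever arriving for it.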

> + }
> + break;
> + }
> +
> + pkt = list_first_entry(&vsock->send_pkt_list,
> +struct virtio_vsock_pkt, list);
> + list_del_init(&pkt->list);
> +
> + if (out) {
> + virtio_transport_free_pkt(pkt);
> + vq_err(vq, "Expected 0 output buffers, got %u\n", out);
> + break;
> + }
> +
> + len = iov_length(&

Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-11 Thread Borislav Petkov
On Fri, Dec 11, 2015 at 11:41:30AM +0100, Paolo Bonzini wrote:
> You can disable it (well, make it take a few days to appear) with this:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 484079efea5b..a9070e260c72 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -496,7 +496,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
>* Init kvm generation close to the maximum to easily test the
>* code of handling generation number wrap-around.
>*/
> - slots->generation = -150;
> + slots->generation = 0;
>   for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
>   slots->id_to_index[i] = slots->memslots[i].id = i;
> 
> but it would not be AMD-specific.

Yeah, that didn't help. This time the splat is a bit more interesting:

qemu process segfaulted at a kernel address - 816e2db1 - which
is the last insn of entry_SYSCALL_64_fastpath:

816e2d45 :
816e2d45:   25 ff ff ff bf  and$0xbfff,%eax
816e2d4a:   3d 21 02 00 00  cmp$0x221,%eax

...

816e2d9e:   4c 8b 9c 24 90 00 00mov0x90(%rsp),%r11
816e2da5:   00 
816e2da6:   48 8b a4 24 98 00 00mov0x98(%rsp),%rsp
816e2dad:   00 
816e2dae:   0f 01 f8swapgs
816e2db1:   48 0f 07sysretq

Yap, at SYSRET.

Andy might find this a little amusing :-)

[  459.130565] qemu-system-x86[3724]: segfault at 816e2db1 ip 
816e2db1 sp 7fd593ffe970 error 15
[  512.578297] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[  512.586189] IP: [<  (null)>]   (null)
[  512.591266] PGD 0 
[  512.593303] Oops: 0010 [#1] PREEMPT SMP 
[  512.597283] Modules linked in: tun sha256_ssse3 sha256_generic drbg 
binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm irqbypass 
crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 
amd64_edac_mod k10temp edac_mce_amd fam15h_power amdkfd amd_iommu_v2 radeon 
acpi_cpufreq
[  512.601698] CPU: 5 PID: 3787 Comm: qemu-system-x86 Not tainted 4.4.0-rc4+ #8
[  512.601699] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[  512.601700] task: 8800b5d04680 ti: 88041104c000 task.ti: 
88041104c000
[  512.601701] RIP: 0010:[<>] 
[  512.601701]  [<  (null)>]   (null)
[  512.601702] RSP: 0018:88041104fcc0  EFLAGS: 00010212
[  512.601703] RAX: 0040 RBX: 8804110b4000 RCX: 
[  512.601703] RDX:  RSI: 8804110b4000 RDI: 88041104fc20
[  512.601704] RBP: 88041104fcc8 R08: 0001 R09: 
[  512.601704] R10: 0001 R11: 0001 R12: 
[  512.601705] R13:  R14: 0001 R15: 0001
[  512.601706] FS:  7fb0deaf4700() GS:88042d00() 
knlGS:
[  512.601706] CS:  0010 DS:  ES:  CR0: 8005003b
[  512.601707] CR2:  CR3: 000428118000 CR4: 000406e0
[  512.601707] Stack:
[  512.601708]  a02b2d3c
[  512.601708]  88041104fdd8
[  512.601709]  a02b2d0e
[  512.601709]  a02b2c9a

[  512.601710]  8804110b4430
[  512.601710]  
[  512.601710]  00010004
[  512.601711]  a02e9bb0

[  512.601711]  00040002
[  512.601711]  
[  512.601712]  
[  512.601712]  

[  512.601713] Call Trace:
[  512.601729]  [] ? kvm_set_irq+0x13c/0x250 [kvm]
[  512.601736]  [] kvm_set_irq+0x10e/0x250 [kvm]
[  512.601744]  [] ? kvm_set_irq+0x9a/0x250 [kvm]
[  512.601756]  [] ? kvm_set_msi_irq+0x1b0/0x1b0 [kvm]
[  512.601767]  [] ? kvm_set_ioapic_irq+0x20/0x20 [kvm]
[  512.601776]  [] kvm_vm_ioctl_irq_line+0x32/0x40 [kvm]
[  512.601783]  [] kvm_vm_ioctl+0x5eb/0x820 [kvm]
[  512.601786]  [] ? rcu_read_lock_held+0x45/0x60
[  512.601788]  [] do_vfs_ioctl+0x2e0/0x540
[  512.601790]  [] ? __fget_light+0x29/0x90
[  512.601791]  [] SyS_ioctl+0x4c/0x90
[  512.601794]  [] entry_SYSCALL_64_fastpath+0x16/0x6f
[  512.601797] Code:  Bad RIP value.
[  512.601798] RIP  [<  (null)>]   (null)
[  512.601798]  RSP 
[  512.601799] CR2: 
[  512.609862] ---[ end trace ae4f00b514141891 ]---

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-11 Thread Paolo Bonzini


On 11/12/2015 12:41, Borislav Petkov wrote:
> On Fri, Dec 11, 2015 at 11:41:30AM +0100, Paolo Bonzini wrote:
>> It would be a kvm hypervisor page, not a kvm guest page, hence unrelated
>> to the zapping thing.
> 
> Ah right, guest pages should be userspace addresses, come to think of
> it.
> 
>> Can you grab the kallsyms before making it crash?
> 
> Attached. It was a different corruption this time, see below. This time
> we don't even have a page table, PGD is 0, rIP is 1. (Fun :-))

Hmm, you had:

- RIP=0 in the original report (start_this_handle)
- RIP=0 in the second (mutex_lock_nested in ext4)
- RIP=1 now

The more interesting one is the other one which doesn't have a small RIP,
because it has RIP that is slightly larger than the stack pointer, meaning
it's likely a frame pointer.  And this means in turn that the call trace
is correct, and the bug might have happened closer to the actual corruption.
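
As a rough sanity check of that reading, here is a sketch only. The
helper name and the 16KB window (meant to approximate the kernel stack
size) are my assumptions, and the addresses are used exactly as they are
printed in the log:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if rip lies at or just above rsp, within
 * 'window' bytes, i.e. it looks like an address on the current stack
 * rather than an address in kernel text.
 */
static bool rip_looks_like_stack_address(uint64_t rip, uint64_t rsp,
					 uint64_t window)
{
	return rip >= rsp && rip - rsp < window;
}
```

With RIP 8800b9f9bdf0 and RSP 8800b9f9bde0 the distance is 0x10, so the
check passes; the RIP=0 and RIP=1 reports do not.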

[  959.548625] RIP: 0010:[]  [] 
0x8800b9f9bdf0
[  959.556338] RSP: 0018:8800b9f9bde0  EFLAGS: 00010206
[  959.618579] Stack:
[  959.620607]  a02d5e17 8800b7d48000 8800b9f9be08 
a02bdb1f
[  959.628104]   8800b9f9be98 a02bdc7b 
8804242a4400
[  959.635601]  0070 4000 81a3c1e0 
8800b7ca5e00
[  959.643114] Call Trace:
[  959.645599]  [] ? kvm_arch_vcpu_put+0x17/0x40 [kvm]
[  959.652081]  [] ? vcpu_put+0x1f/0x60 [kvm]
[  959.657782]  [] ? kvm_vcpu_ioctl+0x11b/0x6f0 [kvm]
[  959.664169]  [] ? do_vfs_ioctl+0x2e0/0x540
[  959.669855]  [] ? __fget_light+0x29/0x90
[  959.675364]  [] ? SyS_ioctl+0x4c/0x90
[  959.680618]  [] ? entry_SYSCALL_64_fastpath+0x16/0x6f

My wild guess is that RSP is getting corrupted, but I guess I'll have to try
to reproduce to figure out what happens.

The last thing I need from you (hopefully) is a Kconfig.  If you have some
time, it would be great to check if you can reproduce it with an older kernel
version---trying 4.4-rc1 and 4.3 would be great.

Paolo


Re: [GIT PULL] Please pull my kvm-ppc-fixes branch

2015-12-11 Thread Paolo Bonzini


On 10/12/2015 04:12, Paul Mackerras wrote:
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git 
> kvm-ppc-fixes

Pulled, thanks.

Paolo


Re: [PATCH 01/11] qapi: Rename qjson.h to qobject-json.h

2015-12-11 Thread Paolo Bonzini


On 11/12/2015 00:53, Eric Blake wrote:
> We have two different JSON visitors in the tree; and having both
> named 'qjson.h' can cause include confusion.  Rename the qapi
> version.
> 
> Kill trailing whitespace in the renamed tests/check-qobject-json.c
> to keep checkpatch.pl happy.
> 
> Signed-off-by: Eric Blake 
> ---
>  MAINTAINERS   |  2 +-
>  balloon.c |  2 +-
>  block.c   |  2 +-
>  block/archipelago.c   |  2 +-
>  block/nbd.c   |  2 +-
>  block/quorum.c|  2 +-
>  blockjob.c|  2 +-
>  hw/core/qdev.c|  2 +-
>  hw/misc/pvpanic.c |  2 +-
>  hw/net/virtio-net.c   |  2 +-
>  include/qapi/qmp/{qjson.h => qobject-json.h}  |  0
>  include/qapi/qmp/types.h  |  2 +-
>  monitor.c |  2 +-
>  qapi/qmp-event.c  |  2 +-
>  qemu-img.c|  2 +-
>  qga/main.c|  2 +-
>  qobject/Makefile.objs |  3 ++-
>  qobject/{qjson.c => qobject-json.c}   |  2 +-
>  target-s390x/kvm.c|  2 +-
>  tests/.gitignore  |  2 +-
>  tests/Makefile|  8 
>  tests/{check-qjson.c => check-qobject-json.c} | 14 +++---
>  tests/libqtest.c  |  2 +-
>  ui/spice-core.c   |  2 +-
>  vl.c  |  2 +-
>  25 files changed, 34 insertions(+), 33 deletions(-)
>  rename include/qapi/qmp/{qjson.h => qobject-json.h} (100%)
>  rename qobject/{qjson.c => qobject-json.c} (99%)
>  rename tests/{check-qjson.c => check-qobject-json.c} (99%)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e8cee1e..c943ff4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1155,7 +1155,7 @@ X: include/qapi/qmp/dispatch.h
>  F: tests/check-qdict.c
>  F: tests/check-qfloat.c
>  F: tests/check-qint.c
> -F: tests/check-qjson.c
> +F: tests/check-qobject-json.c
>  F: tests/check-qlist.c
>  F: tests/check-qstring.c
>  T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
> diff --git a/balloon.c b/balloon.c
> index 0f45d1b..5983b4f 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -31,7 +31,7 @@
>  #include "trace.h"
>  #include "qmp-commands.h"
>  #include "qapi/qmp/qerror.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
> 
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> diff --git a/block.c b/block.c
> index 9971976..e611002 100644
> --- a/block.c
> +++ b/block.c
> @@ -29,7 +29,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/module.h"
>  #include "qapi/qmp/qerror.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/sysemu.h"
>  #include "qemu/notify.h"
> diff --git a/block/archipelago.c b/block/archipelago.c
> index 855655c..80a1bb5 100644
> --- a/block/archipelago.c
> +++ b/block/archipelago.c
> @@ -56,7 +56,7 @@
>  #include "qemu/thread.h"
>  #include "qapi/qmp/qint.h"
>  #include "qapi/qmp/qstring.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "qemu/atomic.h"
> 
>  #include 
> diff --git a/block/nbd.c b/block/nbd.c
> index cd6a587..9009f4f 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -32,7 +32,7 @@
>  #include "qemu/module.h"
>  #include "qemu/sockets.h"
>  #include "qapi/qmp/qdict.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "qapi/qmp/qint.h"
>  #include "qapi/qmp/qstring.h"
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index d162459..1e3a278 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -18,7 +18,7 @@
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qapi/qmp/qint.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "qapi/qmp/qlist.h"
>  #include "qapi/qmp/qstring.h"
>  #include "qapi-event.h"
> diff --git a/blockjob.c b/blockjob.c
> index 80adb9d..84361f7 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -31,7 +31,7 @@
>  #include "block/block_int.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/qmp/qerror.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "qemu/coroutine.h"
>  #include "qmp-commands.h"
>  #include "qemu/timer.h"
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index b3ad467..a98304d 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -31,7 +31,7 @@
>  #include "qapi/error.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qapi/visitor.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "qemu/error-

Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-11 Thread Paolo Bonzini


On 11/12/2015 11:22, Borislav Petkov wrote:
> On Thu, Dec 10, 2015 at 07:15:19PM +0100, Paolo Bonzini wrote:
>> Yeah, wait_lapic_expire also has to be moved before __kvm_guest_enter.
> 
> Yeah, v2 doesn't splat on the Intel box anymore but the AMD box still
> does, and it is a different problem. With v2 applied, it still
> explodes, see below.

Yes, I didn't expect it to fix anything.  I just wanted to pinpoint it
to kvm-amd.

> And I'm willing to bet good money on that shadow pages fun.

You can disable it (well, make it take a few days to appear) with this:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 484079efea5b..a9070e260c72 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -496,7 +496,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 * Init kvm generation close to the maximum to easily test the
 * code of handling generation number wrap-around.
 */
-   slots->generation = -150;
+   slots->generation = 0;
for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
slots->id_to_index[i] = slots->memslots[i].id = i;

but it would not be AMD-specific.
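
To put rough numbers on "a few days", a sketch of how many generation
bumps it takes before the low bits wrap. It assumes the MMIO generation
is the memslot generation masked to some number of low bits; the 20-bit
width used in the note below and the helper name are my assumptions, not
taken from the code above:

```c
#include <stdint.h>

/*
 * Number of generation increments until the low 'bits' bits wrap to
 * zero. 'bits' must be smaller than 64.
 */
static uint64_t bumps_until_wrap(uint64_t generation, unsigned bits)
{
	uint64_t mask = (UINT64_C(1) << bits) - 1;

	return (mask + 1) - (generation & mask);
}
```

Starting from -150 with 20 bits that is 150 bumps; starting from 0 it is
1048576, which is why the wraparound path gets much harder to hit.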

Anyway if this theory is true:

> [  959.466549] kernel tried to execute NX-protected page - exploit attempt? 
> (uid: 1000)
> 
> line basically says that we're pagefaulting when trying to fetch
> instructions, i.e., we're trying to execute something from a page, rIP
> points to 0x8800b9f9bdf0 and that is most likely a page belonging to
> kvm, which, however, is for some reason not executable (anymore?).

It would be a kvm hypervisor page, not a kvm guest page, hence unrelated
to the zapping thing.

Can you grab the kallsyms before making it crash?  I will get to it next
week.

Paolo


Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-11 Thread Borislav Petkov
On Thu, Dec 10, 2015 at 07:15:19PM +0100, Paolo Bonzini wrote:
> Yeah, wait_lapic_expire also has to be moved before __kvm_guest_enter.

Yeah, v2 doesn't splat on the Intel box anymore but the AMD box still
does, and it is a different problem. With v2 applied, it still
explodes, see below.

And I'm willing to bet good money on that shadow pages fun. The

[  959.466549] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 1000)

line basically says that we're pagefaulting when trying to fetch
instructions, i.e., we're trying to execute something from a page, rIP
points to 0x8800b9f9bdf0 and that is most likely a page belonging to
kvm, which, however, is for some reason not executable (anymore?).
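
The "Oops: 0011" in the splat fits that: the number is the page-fault
error code in hex. A small decoding sketch; the bit positions are the
architectural x86 ones, while the macro and function names are made up
for illustration:

```c
#include <stdbool.h>

/* x86 page-fault error code bits; names are illustrative. */
#define PF_PRESENT (1u << 0)  /* set: page was present, so a protection fault */
#define PF_WRITE   (1u << 1)  /* set: access was a write */
#define PF_USER    (1u << 2)  /* set: fault happened in user mode */
#define PF_RSVD    (1u << 3)  /* set: reserved bit set in a page table entry */
#define PF_INSTR   (1u << 4)  /* set: fault was an instruction fetch */

/* Instruction fetch from a present page: an NX (no-execute) violation. */
static bool is_nx_violation(unsigned code)
{
	return (code & (PF_PRESENT | PF_INSTR)) == (PF_PRESENT | PF_INSTR);
}

static bool is_kernel_mode_fault(unsigned code)
{
	return !(code & PF_USER);
}
```

0x11 is PF_PRESENT | PF_INSTR with PF_USER clear, i.e. the kernel tried
to fetch instructions from a page that is mapped but not executable.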

Could it have anything to do with that zapping of shadow pages, by any
chance?

Can I disable the zapping and see if it still triggers? Or should I try
modprobing kvm with "npt=0" or so?

/me goes and tries it...

Nope, that doesn't help - it still splats.

Hmmm...

[  849.272337] kvm: zapping shadow pages for mmio generation wraparound
[  933.813871] kvm: zapping shadow pages for mmio generation wraparound
[  959.466549] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 1000)
[  959.474369] BUG: unable to handle kernel paging request at 8800b9f9bdf0
[  959.481407] IP: [] 0x8800b9f9bdf0
[  959.486677] PGD 2d7e067 PUD 43efff067 PMD 8000b9e001e3 
[  959.492338] Oops: 0011 [#1] PREEMPT SMP 
[  959.496340] Modules linked in: tun sha256_ssse3 sha256_generic drbg 
binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm irqbypass 
crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 
amd64_edac_mod fam15h_power k10temp edac_mce_amd amdkfd amd_iommu_v2 radeon 
acpi_cpufreq
[  959.524023] CPU: 3 PID: 3798 Comm: qemu-system-x86 Not tainted 4.4.0-rc4+ #8
[  959.531127] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[  959.541113] task: 8800b7ca5e00 ti: 8800b9f98000 task.ti: 
8800b9f98000
[  959.548625] RIP: 0010:[]  [] 
0x8800b9f9bdf0
[  959.556338] RSP: 0018:8800b9f9bde0  EFLAGS: 00010206
[  959.561676] RAX: 03993d0f82ee RBX: 8800b7d48000 RCX: 0001
[  959.568844] RDX: 0399 RSI: a02bdc7b RDI: 8800b7d48000
[  959.576010] RBP:  R08: 0001 R09: 
[  959.583177] R10:  R11: 0001 R12: 
[  959.590346] R13: 8800b7d48000 R14:  R15: 
[  959.597513] FS:  7f7fae580700() GS:88042cc0() 
knlGS:
[  959.605643] CS:  0010 DS:  ES:  CR0: 8005003b
[  959.611414] CR2: 8800b9f9bdf0 CR3: 00041b5fe000 CR4: 000406e0
[  959.618579] Stack:
[  959.620607]  a02d5e17 8800b7d48000 8800b9f9be08 
a02bdb1f
[  959.628104]   8800b9f9be98 a02bdc7b 
8804242a4400
[  959.635601]  0070 4000 81a3c1e0 
8800b7ca5e00
[  959.643114] Call Trace:
[  959.645599]  [] ? kvm_arch_vcpu_put+0x17/0x40 [kvm]
[  959.652081]  [] ? vcpu_put+0x1f/0x60 [kvm]
[  959.657782]  [] ? kvm_vcpu_ioctl+0x11b/0x6f0 [kvm]
[  959.664169]  [] ? do_vfs_ioctl+0x2e0/0x540
[  959.669855]  [] ? __fget_light+0x29/0x90
[  959.675364]  [] ? SyS_ioctl+0x4c/0x90
[  959.680618]  [] ? entry_SYSCALL_64_fastpath+0x16/0x6f
[  959.687263] Code: 00 00 00 06 02 01 00 00 00 00 00 e0 bd f9 b9 00 88 ff ff 
18 00 00 00 00 00 00 00 17 5e 2d a0 ff ff ff ff 00 80 d4 b7 00 88 ff ff <08> be 
f9 b9 00 88 ff ff 1f db 2b a0 ff ff ff ff 00 00 00 00 00 
[  959.707506] RIP  [] 0x8800b9f9bdf0
[  959.712862]  RSP 
[  959.716373] CR2: 8800b9f9bdf0
[  959.735764] ---[ end trace 6826bd13f6e235cd ]---
[  959.740465] note: qemu-system-x86[3798] exited with preempt_count 1
[  979.163010] kvm: zapping shadow pages for mmio generation wraparound

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH] KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check

2015-12-11 Thread Marc Zyngier
On 10/12/15 21:46, Christoffer Dall wrote:
> External inputs to the vgic from time to time need to poke into the
> state of a virtual interrupt, the prime example is the architected timer
> code.
> 
> Since the IRQ's active state can be represented in two places, the LR or
> the distributor, we first loop over the LRs but if not active in the LRs
> we just return whether *any* IRQ is active on the VCPU in question.
> 
> This is of course bogus, as we should check if the specific IRQ in
> question is active on the distributor instead.
> 
> Reported-by: Eric Auger 
> Signed-off-by: Christoffer Dall 
> ---
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 65461f8..7a2f449 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1114,7 +1114,7 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, 
> struct irq_phys_map *map)
>   return true;
>   }
>  
> - return dist_active_irq(vcpu);
> + return vgic_irq_is_active(vcpu, map->virt_irq);
>  }
>  
>  /*
> 
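
The difference between the two checks, as a toy model. The struct and
function names are invented for illustration and do not reflect the real
vgic data structures:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy distributor state: one bit per virtual IRQ (hypothetical layout). */
struct toy_dist {
	uint32_t active;
};

/* Old behaviour as described above: true if *any* IRQ is active. */
static bool toy_any_irq_active(const struct toy_dist *d)
{
	return d->active != 0;
}

/* Fixed behaviour: true only if this particular IRQ is active. */
static bool toy_irq_is_active(const struct toy_dist *d, unsigned irq)
{
	return irq < 32 && ((d->active >> irq) & 1u);
}
```

With only IRQ 5 active, the old check claims "active" no matter which
IRQ the caller asked about, while the per-IRQ check says no for, say,
IRQ 27.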

Damn!

Acked-by: Marc Zyngier 

M.
-- 
Jazz is not dead. It just smells funny...


[RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-11 Thread Yongji Xie
The current vfio-pci implementation disallows mmapping the MSI-X table
so that users cannot touch it directly.

However, the EEH mechanism can ensure that a given PCI device can only
send the MSIs assigned to its PE, and the guest kernel does not write
to the MSI-X table in pci_enable_msix() because of para-virtualization
on the PPC64 platform. So the MSI-X table is safe to access directly
from the guest when the EEH mechanism is enabled.

This patch adds support for this case and allows mmapping the MSI-X
table if EEH is supported on the PPC64 platform.

We also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify userspace
that it is safe to mmap the MSI-X table.

Signed-off-by: Yongji Xie 
---
 drivers/vfio/pci/vfio_pci.c |5 -
 drivers/vfio/pci/vfio_pci_private.h |5 +
 include/uapi/linux/vfio.h   |2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index dbcad99..85d9980 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data,
if (vfio_pci_bar_page_aligned())
info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
 
+   if (vfio_msix_table_mmap_enabled())
+   info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP;
+
info.num_regions = VFIO_PCI_NUM_REGIONS;
info.num_irqs = VFIO_PCI_NUM_IRQS;
 
@@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct 
vm_area_struct *vma)
if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
return -EINVAL;
 
-   if (index == vdev->msix_bar) {
+   if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) {
/*
 * Disallow mmaps overlapping the MSI-X table; users don't
 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 319352a..835619e 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void)
return IS_ENABLED(CONFIG_PPC64);
 }
 
+static inline bool vfio_msix_table_mmap_enabled(void)
+{
+   return IS_ENABLED(CONFIG_EEH);
+}
+
 extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
 extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1fc8066..289e662 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -173,6 +173,8 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)   /* vfio-amba device */
 /* Platform support all PCI MMIO BARs to be page aligned */
 #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED (1 << 4)
+/* Platform support mmapping PCI MSI-X vector table */
+#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP(1 << 5)
__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
 };
-- 
1.7.9.5



[RFC PATCH 2/3] vfio-pci: Allow to mmap sub-page MMIO BARs if all MMIO BARs are page aligned

2015-12-11 Thread Yongji Xie
The current vfio-pci implementation disallows mmapping sub-page
(size < PAGE_SIZE) MMIO BARs because the MMIO page of such a BAR may be
shared with other BARs.

But we should allow mmapping these sub-page MMIO BARs if all MMIO BARs
are page aligned, which guarantees that a BAR's MMIO page is not shared
with other BARs.

This patch adds support for this case and also adds a
VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED flag to notify userspace that the
platform page-aligns all MMIO BARs.

Signed-off-by: Yongji Xie 
---
 drivers/vfio/pci/vfio_pci.c |   10 +-
 drivers/vfio/pci/vfio_pci_private.h |5 +
 include/uapi/linux/vfio.h   |2 ++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 32b88bd..dbcad99 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -443,6 +443,9 @@ static long vfio_pci_ioctl(void *device_data,
if (vdev->reset_works)
info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
+   if (vfio_pci_bar_page_aligned())
+   info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
+
info.num_regions = VFIO_PCI_NUM_REGIONS;
info.num_irqs = VFIO_PCI_NUM_IRQS;
 
@@ -479,7 +482,8 @@ static long vfio_pci_ioctl(void *device_data,
 VFIO_REGION_INFO_FLAG_WRITE;
if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
pci_resource_flags(pdev, info.index) &
-   IORESOURCE_MEM && info.size >= PAGE_SIZE)
+   IORESOURCE_MEM && (info.size >= PAGE_SIZE ||
+   vfio_pci_bar_page_aligned()))
info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
break;
case VFIO_PCI_ROM_REGION_INDEX:
@@ -855,6 +859,10 @@ static int vfio_pci_mmap(void *device_data, struct 
vm_area_struct *vma)
return -EINVAL;
 
phys_len = pci_resource_len(pdev, index);
+
+   if (vfio_pci_bar_page_aligned())
+   phys_len = PAGE_ALIGN(phys_len);
+
req_len = vma->vm_end - vma->vm_start;
pgoff = vma->vm_pgoff &
((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 0e7394f..319352a 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -69,6 +69,11 @@ struct vfio_pci_device {
 #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
 #define irq_is(vdev, type) (vdev->irq_type == type)
 
+static inline bool vfio_pci_bar_page_aligned(void)
+{
+   return IS_ENABLED(CONFIG_PPC64);
+}
+
 extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
 extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 751b69f..1fc8066 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -171,6 +171,8 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_PCI  (1 << 1)/* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)/* vfio-platform device */
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)   /* vfio-amba device */
+/* Platform support all PCI MMIO BARs to be page aligned */
+#define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED (1 << 4)
__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
 };
-- 
1.7.9.5



[RFC PATCH 1/3] powerpc/pci: Enforce all MMIO BARs to be page aligned

2015-12-11 Thread Yongji Xie
PAGE_SIZE is 64KB by default on the PPC64 platform. When vfio passes
through a PCI device whose MMIO BARs are smaller than 64KB (PAGE_SIZE),
the guest cannot handle MMIO accesses to those BARs directly, which
leads to MMIO emulation in the host.

This is because vfio does not allow passing through a BAR's MMIO page
that may be shared with other BARs.

To solve this performance issue, this patch enforces that all MMIO BAR
allocations are aligned to at least PAGE_SIZE on the PPC64 platform,
where we have enough address space, so that one BAR's MMIO page is not
shared with other BARs.

Signed-off-by: Yongji Xie 
---
 arch/powerpc/kernel/pci-common.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..6989e0f 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1074,6 +1074,11 @@ static int skip_isa_ioresource_align(struct pci_dev *dev)
  * bits, so it's ok to allocate at, say, 0x2800-0x28ff,
  * but we want to try to avoid allocating at 0x2900-0x2bff
  * which might have be mirrored at 0x0100-0x03ff..
+ *
+ * And for PPC64, we enforce the alignment of all MMIO BARs
+ * allocations to be at least PAGE_SIZE(64KB). This would be
+ * helpful to improve performance when we passthrough
+ * a PCI device of which BARs are smaller than PAGE_SIZE
  */
 resource_size_t pcibios_align_resource(void *data, const struct resource *res,
resource_size_t size, resource_size_t align)
@@ -1087,7 +1092,10 @@ resource_size_t pcibios_align_resource(void *data, const 
struct resource *res,
if (start & 0x300)
start = (start + 0x3ff) & ~0x3ff;
}
-
+#ifdef CONFIG_PPC64
+   if (res->flags & IORESOURCE_MEM)
+   start = PAGE_ALIGN(start);
+#endif
return start;
 }
 EXPORT_SYMBOL(pcibios_align_resource);
-- 
1.7.9.5



[RFC PATCH 0/3] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform

2015-12-11 Thread Yongji Xie
The current vfio-pci implementation disallows mmapping sub-page
(size < PAGE_SIZE) MMIO BARs and the MSI-X table. This is because a
sub-page BAR's MMIO page may be shared with other BARs, and the MSI-X
table should not be accessed directly from the guest for security reasons.

But this causes performance issues for MMIO accesses in the guest when
vfio passes through sub-page BARs or BARs containing the MSI-X table on
the PPC64 platform. PAGE_SIZE is 64KB by default on PPC64, so a single
large page easily covers a sub-page MMIO BAR that cannot be mmapped, or
the MMIO page in which the MSI-X table is located, and accesses to that
page then have to be emulated in the host.

For sub-page MMIO BARs, this patch set enforces that all MMIO BARs are
page aligned on the PPC64 platform so that a sub-page BAR's MMIO page
is not shared with other BARs. Then we can mmap sub-page MMIO BARs in
the vfio-pci driver if all MMIO BARs are page aligned.
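
To illustrate the sharing problem, here is a toy allocator sketch. It is
not the kernel's resource allocator; the names, the sequential placement
and the example addresses are all assumptions made for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE UINT64_C(0x10000)  /* 64KB, the PPC64 default */

/* Round v up to a multiple of pow2, which must be a power of two. */
static uint64_t toy_round_up(uint64_t v, uint64_t pow2)
{
	return (v + pow2 - 1) & ~(pow2 - 1);
}

/*
 * Place a BAR of 'size' bytes at or after *cursor and advance the cursor.
 * BARs are naturally aligned to their size; with page_align set, the
 * start is additionally rounded up to a page boundary.
 */
static uint64_t toy_alloc_bar(uint64_t *cursor, uint64_t size, bool page_align)
{
	uint64_t start = toy_round_up(*cursor, size);

	if (page_align)
		start = toy_round_up(start, TOY_PAGE_SIZE);
	*cursor = start + size;
	return start;
}

/* True if both addresses fall in the same 64KB page. */
static bool toy_same_page(uint64_t a, uint64_t b)
{
	return a / TOY_PAGE_SIZE == b / TOY_PAGE_SIZE;
}
```

Two 4KB BARs allocated back to back from 0x80000000 land at 0x80000000
and 0x80001000, inside one 64KB page. With page alignment the second one
moves to 0x80010000, so each BAR gets a page of its own.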

For the MSI-X table, we think it is safe to access directly from the
guest when the EEH mechanism is enabled, since EEH can ensure that a
given PCI device can only send the MSIs assigned to its PE. So we add
support for mmapping the MSI-X table in the vfio-pci driver if EEH is
supported.

With this patch set applied, we see almost a 100% performance
improvement for MMIO accesses when we pass through sub-page BARs in
our test.

The last two patches in the patch set can be used by qemu to:
- Add support for a VFIO-PCI ioctl to indicate that the platform
  page-aligns all PCI BARs.
- Add support for a VFIO-PCI ioctl to indicate that the platform
  supports mmapping the MSI-X table.

Yongji Xie (3):
  powerpc/pci: Enforce all MMIO BARs to be page aligned
  vfio-pci: Allow to mmap sub-page MMIO BARs if all MMIO BARs are page aligned
  vfio-pci: Allow to mmap MSI-X table if EEH is supported

 arch/powerpc/kernel/pci-common.c|   10 +-
 drivers/vfio/pci/vfio_pci.c |   15 +--
 drivers/vfio/pci/vfio_pci_private.h |   10 ++
 include/uapi/linux/vfio.h   |4 
 4 files changed, 36 insertions(+), 3 deletions(-)

-- 
1.7.9.5



Re: [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-12-11 Thread Paolo Bonzini


On 11/12/2015 08:52, Ingo Molnar wrote:
> 
> * Paolo Bonzini  wrote:
> 
>>
>>
>> On 10/12/2015 00:12, Andy Lutomirski wrote:
>>> From: Andy Lutomirski 
>>>
>>> The pvclock vdso code was too abstracted to understand easily and
>>> excessively paranoid.  Simplify it for a huge speedup.
>>>
>>> This opens the door for additional simplifications, as the vdso no
>>> longer accesses the pvti for any vcpu other than vcpu 0.
>>>
>>> Before, vclock_gettime using kvm-clock took about 45ns on my machine.
>>> With this change, it takes 29ns, which is almost as fast as the pure TSC
>>> implementation.
>>>
>>> Signed-off-by: Andy Lutomirski 
>>> ---
>>>  arch/x86/entry/vdso/vclock_gettime.c | 81 
>>> 
>>>  1 file changed, 46 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
>>> b/arch/x86/entry/vdso/vclock_gettime.c
>>> index ca94fa649251..c325ba1bdddf 100644
>>> --- a/arch/x86/entry/vdso/vclock_gettime.c
>>> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>>> @@ -78,47 +78,58 @@ static notrace const struct pvclock_vsyscall_time_info 
>>> *get_pvti(int cpu)
>>>  
>>>  static notrace cycle_t vread_pvclock(int *mode)
>>>  {
>>> -   const struct pvclock_vsyscall_time_info *pvti;
>>> +   const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;
>>> cycle_t ret;
>>> -   u64 last;
>>> -   u32 version;
>>> -   u8 flags;
>>> -   unsigned cpu, cpu1;
>>> -
>>> +   u64 tsc, pvti_tsc;
>>> +   u64 last, delta, pvti_system_time;
>>> +   u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
>>>  
>>> /*
>>> -* Note: hypervisor must guarantee that:
>>> -* 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
>>> -* 2. that per-CPU pvclock time info is updated if the
>>> -*underlying CPU changes.
>>> -* 3. that version is increased whenever underlying CPU
>>> -*changes.
>>> +* Note: The kernel and hypervisor must guarantee that cpu ID
>>> +* number maps 1:1 to per-CPU pvclock time info.
>>> +*
>>> +* Because the hypervisor is entirely unaware of guest userspace
>>> +* preemption, it cannot guarantee that per-CPU pvclock time
>>> +* info is updated if the underlying CPU changes or that that
>>> +* version is increased whenever underlying CPU changes.
>>>  *
>>> +* On KVM, we are guaranteed that pvti updates for any vCPU are
>>> +* atomic as seen by *all* vCPUs.  This is an even stronger
>>> +* guarantee than we get with a normal seqlock.
>>> +*
>>> +* On Xen, we don't appear to have that guarantee, but Xen still
>>> +* supplies a valid seqlock using the version field.
>>> +
>>> +* We only do pvclock vdso timing at all if
>>> +* PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
>>> +* mean that all vCPUs have matching pvti and that the TSC is
>>> +* synced, so we can just look at vCPU 0's pvti.
>>>  */
>>> -   do {
>>> -   cpu = __getcpu() & VGETCPU_CPU_MASK;
>>> -   /* TODO: We can put vcpu id into higher bits of pvti.version.
>>> -* This will save a couple of cycles by getting rid of
>>> -* __getcpu() calls (Gleb).
>>> -*/
>>> -
>>> -   pvti = get_pvti(cpu);
>>> -
>>> -   version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
>>> -
>>> -   /*
>>> -* Test we're still on the cpu as well as the version.
>>> -* We could have been migrated just after the first
>>> -* vgetcpu but before fetching the version, so we
>>> -* wouldn't notice a version change.
>>> -*/
>>> -   cpu1 = __getcpu() & VGETCPU_CPU_MASK;
>>> -   } while (unlikely(cpu != cpu1 ||
>>> - (pvti->pvti.version & 1) ||
>>> - pvti->pvti.version != version));
>>> -
>>> -   if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
>>> +
>>> +   if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>>> *mode = VCLOCK_NONE;
>>> +   return 0;
>>> +   }
>>> +
>>> +   do {
>>> +   version = pvti->version;
>>> +
>>> +   /* This is also a read barrier, so we'll read version first. */
>>> +   tsc = rdtsc_ordered();
>>> +
>>> +   pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
>>> +   pvti_tsc_shift = pvti->tsc_shift;
>>> +   pvti_system_time = pvti->system_time;
>>> +   pvti_tsc = pvti->tsc_timestamp;
>>> +
>>> +   /* Make sure that the version double-check is last. */
>>> +   smp_rmb();
>>> +   } while (unlikely((version & 1) || version != pvti->version));
>>> +
>>> +   delta = tsc - pvti_tsc;
>>> +   ret = pvti_system_time +
>>> +   pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
>>> +   pvti_tsc_shift);
>>>  
>>> /* refer to tsc.c read_tsc() comment for rationale */
>>> last = gtod->cycle_last;
>>>
>>
>> Reviewed-by: Paolo Bonzini 
> 
> Thanks. I've added your Reviewed
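
For anyone following the new loop in vread_pvclock(), here is a rough
userspace sketch of the same versioned-read pattern. It is not the kernel
code: everything prefixed demo_ (the struct, the fixed TSC source, the
helpers) is made up for illustration, and C11 atomics stand in for
rdtsc_ordered() and smp_rmb(). The scaling helper is meant to mirror the
arithmetic of pvclock_scale_delta(): shift the delta, then multiply by a
32-bit binary fraction.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pvclock_vcpu_time_info (demo only). */
struct demo_pvti {
    _Atomic uint32_t version;   /* odd while the writer is mid-update */
    uint64_t tsc_timestamp;     /* TSC value at the last update */
    uint64_t system_time;       /* nanoseconds at the last update */
    uint32_t tsc_to_system_mul; /* 32-bit binary fraction */
    int8_t tsc_shift;           /* pre-shift applied to the TSC delta */
};

/* Hypothetical fixed TSC source; the patch uses rdtsc_ordered() here. */
static uint64_t demo_fake_tsc(void)
{
    return 1000;
}

/*
 * Shift delta, then compute (delta * mul_frac) >> 32 without a 128-bit
 * type by splitting delta into 32-bit halves. The result wraps modulo
 * 2^64, like the low 64 bits of the full product shifted right by 32.
 */
static uint64_t demo_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    return (delta >> 32) * mul_frac +
           (((delta & 0xffffffffu) * mul_frac) >> 32);
}

/*
 * Seqlock-style read: sample version, read the TSC and the fields, then
 * re-check version. Retry if an update was in progress (odd version) or
 * completed in between (version changed).
 *
 * The payload fields are plain loads, which is fine for this
 * single-threaded demo; code racing with a real writer would need
 * READ_ONCE()-style accesses.
 */
static uint64_t demo_read_ns(struct demo_pvti *p, uint64_t (*read_tsc)(void))
{
    uint32_t version, mul;
    uint64_t tsc, base_tsc, base_ns;
    int shift;

    do {
        /* Acquire load: the field reads below cannot move before it. */
        version = atomic_load_explicit(&p->version, memory_order_acquire);

        tsc = read_tsc();
        mul = p->tsc_to_system_mul;
        shift = p->tsc_shift;
        base_ns = p->system_time;
        base_tsc = p->tsc_timestamp;

        /* Keep the field reads ahead of the version re-check. */
        atomic_thread_fence(memory_order_acquire);
    } while ((version & 1) ||
             version != atomic_load_explicit(&p->version,
                                             memory_order_relaxed));

    return base_ns + demo_scale_delta(tsc - base_tsc, mul, shift);
}
```

With version = 2, tsc_timestamp = 400, system_time = 5000, a multiplier of
0x80000000 (one half) and a shift of 1, the fixed TSC value of 1000 gives a
delta of 600, which scales to 600 ns, so demo_read_ns() returns 5600.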

[PATCH] x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()

2015-12-11 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index f80d70009ff8..6d7d0e52ed5a 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -19,7 +19,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #ifdef CONFIG_X86_32
>  #include 
>  #include 

So this change triggered a build failure on 64-bit allmodconfig - fixed via the 
patch below. Your change unearthed a latent bug, a missing header inclusion.

Thanks,

Ingo

From d51953b0873358d13b189996e6976dfa12a9b59d Mon Sep 17 00:00:00 2001
From: Ingo Molnar 
Date: Fri, 11 Dec 2015 09:01:30 +0100
Subject: [PATCH] x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()

This build failure triggers on 64-bit allmodconfig:

  arch/x86/platform/uv/uv_nmi.c:493:2: error: implicit declaration of function ‘clocksource_touch_watchdog’ [-Werror=implicit-function-declaration]

which is caused by recent changes exposing a missing clocksource.h include
in uv_nmi.c:

  cc1e24fdb064 x86/vdso: Remove pvclock fixmap machinery

uv_nmi.c used to get clocksource.h indirectly via fixmap.h; that stealth
route of header inclusion is now gone.

Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Thomas Gleixner 
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/uv/uv_nmi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 327f21c3bde1..8dd80050d705 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
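
The failure mode is easy to reproduce outside the kernel. The sketch below
is a userspace analogy, not kernel code: <string.h> plays the role of
clocksource.h, and demo_name_len() is a made-up helper. The file names every
header it needs directly instead of relying on some other header to drag
them in.

```c
#include <stddef.h> /* size_t */
#include <string.h> /* strlen(); included directly, on purpose */

/* Hypothetical helper: length of a name string, for illustration only. */
static size_t demo_name_len(const char *name)
{
    return strlen(name);
}
```

Drop the <string.h> line and build with
gcc -Werror=implicit-function-declaration, and the strlen() call is rejected
in the same way the clocksource_touch_watchdog() call was in uv_nmi.c.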
