Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-31 Thread Jan Kiszka
On 2012-10-29 09:31, Xiao Guangrong wrote:
 On 10/29/2012 03:44 PM, Jan Kiszka wrote:
 On 2012-10-29 08:09, Xiao Guangrong wrote:
 Jan,

 On 10/26/2012 06:35 PM, Jan Kiszka wrote:

 This has two problems: We know it breaks at least Win 95 that overwrites
 its F-segment during boot. And it applies changes to the shadowed area
 (below 1 MB) also to the ROM area - I don't think that is the original
 behaviour on real hardware.

 So what is the problem? It can break Win95's running?

 I tried to install win95 guest but it failed to boot regardless my patchset
 was applied or not. I found the information that win 95 is not supported at
 http://www.linux-kvm.org/page/Guest_Support_Status

 Note: before my patchset, Win 95 could still happily write something into the
 ROM area, because readonly memory is actually writable on KVM. And Win95 cannot
 run on isapc with --no-kvm since there is no way to enable shadow ROM.

 Your patches cause regressions in TCG mode, which has been perfectly fine
 with booting Win95 so far.
 
 Aha, i tried accel=tcg, before my patchset, it works for -machine pc but
 failed for -machine isapc (known issue for seabios). After my patchset,
 it works fine for both -machine pc and isapc. :)

Indeed, looks like I'm on the wrong track regarding what breaks Win95 in
KVM mode.

However: This patch inappropriately allows the guest to change the BIOS
content during runtime. And that not only in the lower ISA range, not
only for our stepchild isapc but for the high ROM range as well, even
with PCI chipset enabled. So this is nothing more than a hack.

 



 What we need is paravirtual shadow write control for the ISA PC. It's on
 my todo list, maybe I will be able to look into this during the next week.


 Your idea is to modify the SeaBIOS code and use a special (PV) mechanism to
 notify Qemu to make the bios writable?

 Yes.


 Actually, I am confused why the guest (including bios) persistently uses
 shadow ROM even if it is not supported (on ISA PC), i think the right way
 is move itself to RAM under this case, no?

 I've been told that Seabios has been built around that assumption and
 the PV shadow control would be simpler to realize.
 
 Sounds like PV is more complex than directly making the bios area writable
 (if it works).

But it is the only correct solution. In fact, shadowing means mapping
RAM above the ROM, not enabling writability, and then copying necessary
bits from the high ROM part to that RAM. Seabios does this when PAM is
available, we just need to pull in those bits for PV shadow control.
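
To make the distinction concrete, here is a minimal sketch of shadowing as described above, assuming a toy model with independent read and write routing bits (loosely PAM-like). The struct, the function names and the 16-byte window are hypothetical and are not taken from Seabios or Qemu; the sketch also copies from the window's own ROM rather than from the high ROM mapping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BIOS_SIZE 16u  /* toy size; a real F-segment is 64 KiB */

/* Toy model of one BIOS window. Reads and writes are routed to ROM or
 * to shadow RAM by two independent enable bits. */
struct bios_window {
    uint8_t rom[BIOS_SIZE];  /* firmware image, never modified */
    uint8_t ram[BIOS_SIZE];  /* RAM mapped over the same addresses */
    int read_from_ram;
    int write_to_ram;
};

static uint8_t window_read(const struct bios_window *w, unsigned off)
{
    assert(off < BIOS_SIZE);
    return w->read_from_ram ? w->ram[off] : w->rom[off];
}

static void window_write(struct bios_window *w, unsigned off, uint8_t val)
{
    assert(off < BIOS_SIZE);
    if (w->write_to_ram) {
        w->ram[off] = val;
    }
    /* otherwise the write is dropped: the ROM itself stays read-only */
}

/* Shadowing: copy the ROM content into the overlaying RAM, then route
 * both reads and writes to that RAM. */
static void shadow_bios(struct bios_window *w)
{
    memcpy(w->ram, w->rom, BIOS_SIZE);
    w->write_to_ram = 1;
    w->read_from_ram = 1;
}
```

The point of the model is that the guest ends up writing to RAM that sits above the ROM, while the ROM content never changes.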

 


 BTW, your patch series should allow to drop the KVM special case from
 pc_system_firmware_init. That version, btw, treats high and low BIOS
 areas separately - but only reloads the upper area. Hmm...


 You mean that also allow Qemu to use pflash to load bios if kvm is enabled?

 Yes.

 We cannot do that because pflash is a RD device which cannot be written
 directly; kvm cannot emulate instructions that implicitly write the memory
 (e.g. using this area as a stack).

 Isn't enabling ROMD support for KVM the whole point of your patches? I
 
 It can generate an MMIO exit if ROMD is written, which means the instruction
 needs kvm's help to complete if it explicitly or implicitly writes the memory.

I was assuming that this is what you already do. If you trap write
accesses, why not allowing user space to handle them?

 
 do not see yet what prevents this still, but it should be fixed first.
 
 For explicit memory writes, this is easy to fix: we just need to fetch the
 instruction at EIP and emulate it. But for implicit memory accesses, fixing
 the emulation is really hard work. Is it really worth doing?

Aren't the read-only regions also marked read-only on the host side to
avoid that the guest writes to it? Or how is this implemented?

Support for flash emulation in KVM mode is increasingly important, for
embedded platform virtualization but also for classic x86 server-like
targets. The pflash-backed system firmware device was added for a reason...

Jan





Re: [PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly

2012-10-31 Thread Raghavendra K T

On 10/30/2012 01:44 PM, Peter Zijlstra wrote:

On Tue, 2012-10-30 at 11:27 +0530, Raghavendra K T wrote:

Okay, now IIUC, usage of *any* global measure is bad?


Yep, people like to carve up their machines, esp. now that they're
somewhat bigger than they used to be. This can result in very asymmetric
loads, no global measure can ever deal with that.


Thanks for explaining the concerns. Very true, and the load can also be very
asymmetric due to constraints such as power optimization, which may affect this.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-31 Thread Xiao Guangrong
On 10/31/2012 02:03 PM, Jan Kiszka wrote:
 On 2012-10-29 09:31, Xiao Guangrong wrote:
 On 10/29/2012 03:44 PM, Jan Kiszka wrote:
 On 2012-10-29 08:09, Xiao Guangrong wrote:
 Jan,

 On 10/26/2012 06:35 PM, Jan Kiszka wrote:

 This has two problems: We know it breaks at least Win 95 that overwrites
 its F-segment during boot. And it applies changes to the shadowed area
 (below 1 MB) also to the ROM area - I don't think that is the original
 behaviour on real hardware.

 So what is the problem? It can break Win95's running?

 I tried to install win95 guest but it failed to boot regardless my patchset
 was applied or not. I found the information that win 95 is not supported at
 http://www.linux-kvm.org/page/Guest_Support_Status

 Note: before my patchset, Win 95 could still happily write something into the
 ROM area, because readonly memory is actually writable on KVM. And Win95 cannot
 run on isapc with --no-kvm since there is no way to enable shadow ROM.

 Your patches cause regressions in TCG mode, which has been perfectly fine
 with booting Win95 so far.

 Aha, i tried accel=tcg, before my patchset, it works for -machine pc but
 failed for -machine isapc (known issue for seabios). After my patchset,
 it works fine for both -machine pc and isapc. :)
 
 Indeed, looks like I'm on the wrong track regarding what breaks Win95 in
 KVM mode.
 
 However: This patch inappropriately allows the guest to change the BIOS
 content during runtime. And that not only in the lower ISA range, not
 only for our stepchild isapc but for the high ROM range as well, even
 with PCI chipset enabled. So this is nothing more than a hack.

Okay.

 




 What we need is paravirtual shadow write control for the ISA PC. It's on
 my todo list, maybe I will be able to look into this during the next week.


 You idea is that modify the code of seabios and use a special way (PV) to
 notify Qemu to make the bios writable?

 Yes.


 Actually, I am confused why the guest (including bios) persistently uses
 shadow ROM even if it is not supported (on ISA PC), i think the right way
 is move itself to RAM under this case, no?

 I've been told that Seabios has been built around that assumption and
 the PV shadow control would be simpler to realize.

 Sounds the PV is complexer that directly making the bios area writable
 (if it works).
 
 But it is the only correct solution. In fact, shadowing means mapping
 RAM above the ROM, not enabling writability, and then copying necessary
 bits from the high ROM part to that RAM. Seabios does this when PAM is
 available, we just need to pull in those bits for PV shadow control.

Okay. I will continue my work after your PV works. :)

 



 BTW, your patch series should allow to drop the KVM special case from
 pc_system_firmware_init. That version, btw, treats high and low BIOS
 areas separately - but only reloads the upper area. Hmm...


 You mean that also allow Qemu to use pflash to load bios if kvm is enabled?

 Yes.

 We can not do that for pflash is a RD device which can not be directly 
 written,
 kvm can not emulate the instruction which implicitly write the memory. 
 (e.g:
 using this area as stack).

 Isn't enabling ROMD support for KVM that whole point of your patches? I

 It can generate MMIO exit if ROMD be written, that means the instruction
 needs kvm's help to be finished if it explicitly/implicitly write the memory.
 
 I was assuming that this is what you already do. If you trap write
 accesses, why not allowing user space to handle them?

Already done: guest write to ROMD -> vmexit -> return to userspace with
MMIO Exit.

 

 do not see yet what prevents this still, but it should be fixed first.

 For the explicitly write memory access, it is easy to be fixed - we just need
 to fetch the instruction from EIP and emulate it. But for the implicitly 
 memory
 access, fixing its emulation is really hard work. Really worth doing it?
 
 Aren't the read-only regions also marked read-only on the host side to
 avoid that the guest writes to it? Or how is this implemented?
 
 Support for flash emulation in KVM mode is increasingly important, for
 embedded platform virtualization but also for classic x86 server-like
 targets. The pflash-backed system firmware device was added for a reason...

Please allow me to clarify.

The flash being a ROMD device means the guest cannot write to it directly: any
kind of guest write access to this device causes a vmexit to kvm and a return
to userspace.

We should pay more attention if we execute code in ROMD, since we cannot use
ROMD as a stack, page table or IDT, or for any other implicit write access.
Of course, if you do not use ROM for those purposes, it is okay. :)








Re: [PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios

2012-10-31 Thread Raghavendra K T

On 10/30/2012 05:47 PM, Andrew Theurer wrote:

On Mon, 2012-10-29 at 19:36 +0530, Raghavendra K T wrote:

In some special scenarios like #vcpu = #pcpu, PLE handler may
prove very costly, because there is no need to iterate over vcpus
and do unsuccessful yield_to burning CPU.

  Similarly, when we have large number of small guests, it is
possible that a spinning vcpu fails to yield_to any vcpu of same
VM and go back and spin. This is also not effective when we are
over-committed. Instead, we do a yield() so that we give chance
to other VMs to run.

This patch tries to optimize above scenarios.

  The first patch optimizes all the yield_to by bailing out when there
  is no need to continue yield_to (i.e., when there is only one task
  in source and target rq).

  Second patch uses that in PLE handler.

  Third patch uses overall system load knowledge to take a decision on
  continuing in the yield_to handler, and also yielding in overcommits.
  To be precise,
  * loadavg is converted to a scale of 2048 per CPU
  * a load value of less than 1024 is considered as undercommit and we
  return from PLE handler in those cases
  * a load value of greater than 3586 (1.75 * 2048) is considered as overcommit
   and we yield to other VMs in such cases.

(let threshold = 2048)
Rationale for using threshold/2 for undercommit limit:
  Having a load below (0.5 * threshold) is used to avoid (the concern raised by
Rik) scenarios where we still have lock holder preempted vcpu waiting to be
scheduled. (scenario arises when rq length is > 1 even when we are under
committed)

Rationale for using (1.75 * threshold) for overcommit scenario:
This is a heuristic where we should probably see rq length > 1
and a vcpu of a different VM is waiting to be scheduled.
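
The threshold logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the enum, and the way loadavg is scaled (load in hundredths, divided across CPUs) are hypothetical and are not taken from the patch. Note that 1.75 * 2048 evaluates to 3584, which is the value the sketch uses; the cover letter quotes 3586.

```c
#include <assert.h>

#define LOAD_THRESHOLD   2048u                     /* per-CPU scale */
#define LOAD_UNDERCOMMIT (LOAD_THRESHOLD / 2)      /* 0.5 * threshold = 1024 */
#define LOAD_OVERCOMMIT  (LOAD_THRESHOLD * 7 / 4)  /* 1.75 * threshold = 3584 */

enum ple_action {
    PLE_RETURN,            /* undercommit: leave the PLE handler early */
    PLE_YIELD_TO_VCPU,     /* in between: normal directed yield_to */
    PLE_YIELD_TO_OTHER_VM  /* overcommit: plain yield() to other VMs */
};

/* Assumed scaling: loadavg_x100 is the system load average in hundredths
 * (3200 means 32.00); the result is in LOAD_THRESHOLD units per CPU. */
static unsigned scale_load(unsigned loadavg_x100, unsigned ncpus)
{
    assert(ncpus > 0);
    return (loadavg_x100 * LOAD_THRESHOLD) / (100u * ncpus);
}

static enum ple_action ple_classify(unsigned scaled_load)
{
    if (scaled_load < LOAD_UNDERCOMMIT) {
        return PLE_RETURN;
    }
    if (scaled_load > LOAD_OVERCOMMIT) {
        return PLE_YIELD_TO_OTHER_VM;
    }
    return PLE_YIELD_TO_VCPU;
}
```

With this scaling, a load average of 32.00 on a 32 CPU host maps to exactly 2048, which falls between the two limits.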

  Related future work (independent of this series):

  - Dynamically changing PLE window depending on system load.

  Result on 3.7.0-rc1 kernel shows around 146% improvement for ebizzy 1x
  with 32 core PLE machine with 32 vcpu guest.
  I believe we should get very good improvements for overcommit (especially > 2)
  on large machines with small vcpu guests. (Could not test this as I do not
  have access to a bigger machine)

base = 3.7.0-rc1
machine: 32 core mx3850 x5 PLE mc

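----+------------+------------+------------+------------+------------+
               ebizzy (rec/sec, higher is better)
----+------------+------------+------------+------------+------------+
            base        stdev      patched        stdev     %improve
----+------------+------------+------------+------------+------------+
1x     2543.3750      20.2903    6279.3750      82.5226    146.89143
2x     2410.8750      96.4327    2450.7500     207.8136      1.65396
3x     2184.9167     205.5226    2178.3333      97.2034     -0.30131
----+------------+------------+------------+------------+------------+

----+------------+------------+------------+------------+------------+
               dbench (throughput in MB/sec, higher is better)
----+------------+------------+------------+------------+------------+
            base        stdev      patched        stdev     %improve
----+------------+------------+------------+------------+------------+
1x     5545.4330     596.4344    7042.8510    1012.0924     27.00272
2x     1993.0970      43.6548    1990.6200      75.7837     -0.12428
3x     1295.3867      22.3997    1315.5208      36.0075      1.55429
----+------------+------------+------------+------------+------------+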


Could you include a PLE-off result for 1x over-commit, so we know what
the best possible result is?


Yes,

base no PLE

ebizzy_1x  7651.3000 rec/sec
ebizzy_2x    51.5000 rec/sec

ebizzy we are closer.

dbench_1x 12631.4210 MB/sec
dbench_2x    45.0842 MB/sec

(strangely dbench 1x result is not consistent sometime despite 10 runs
of 3min + 30 sec warmup runs on a 3G tmpfs. But surely it tells the trend)



Looks like skipping the yield_to() for rq = 1 helps, but I'd like to
know if the performance is the same as PLE off for 1x.  I am concerned
the vcpu to task lookup is still expensive.



Yes. I still see that.


Based on Peter's comments I would say the 3rd patch and the 2x,3x
results are not conclusive at this time.


Avi, IMO patch 1 and 2 seem to be good to go. Please let me know.



I think we should also discuss what we think a good target is.  We
should know what our high-water mark is, and IMO, if we cannot get
close, then I do not feel we are heading down the right path.  For
example, if dbench aggregate throughput for 1x with PLE off is 1
MB/sec, then the best possible 2x,3x result, should be a little lower
than that due to task switching the vcpus and sharing caches.  This
should be quite evident with current PLE handler and smaller VMs (like
10 vcpus or less).


Very much agree here. If we look at the 2x and 3x results (all or any of them),
the aggregate is not near 1x. Maybe even 70% is a good target.



Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-31 Thread Jan Kiszka
On 2012-10-31 07:35, Xiao Guangrong wrote:
 We can not do that for pflash is a RD device which can not be directly 
 written,
 kvm can not emulate the instruction which implicitly write the memory. 
 (e.g:
 using this area as stack).

 Isn't enabling ROMD support for KVM that whole point of your patches? I

 It can generate MMIO exit if ROMD be written, that means the instruction
 needs kvm's help to be finished if it explicitly/implicitly write the 
 memory.

 I was assuming that this is what you already do. If you trap write
 accesses, why not allowing user space to handle them?
 
 Already have done that, guest write ROMD - vmexit - return to userspace with
 MMIO Exit.

Great.

 


 do not see yet what prevents this still, but it should be fixed first.

 For the explicitly write memory access, it is easy to be fixed - we just 
 need
 to fetch the instruction from EIP and emulate it. But for the implicitly 
 memory
 access, fixing its emulation is really hard work. Really worth doing it?

 Aren't the read-only regions also marked read-only on the host side to
 avoid that the guest writes to it? Or how is this implemented?

 Support for flash emulation in KVM mode is increasingly important, for
 embedded platform virtualization but also for classic x86 server-like
 targets. The pflash-backed system firmware device was added for a reason...
 
 Please allow me to clarify it more clearly.
 
 The flash is ROMD device means guest can not write it, any kinds of guest 
 write
 access on this device can cause vmexit to kvm and return to userspace.
 
 We should pay more attention on it if we execute the code in ROMD since we
 can not use ROMD as stack/page table/IDT table and all other implicitly write 
 access.
 Of course, if you do not use ROM as those purposes, it is okay. :)

So the problem is that there is KVM code that still blindly writes to
guest memory and does not take the memory regions' protection flag into
account? And we cannot mark those regions read only in the host's page
table?

Jan






Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-31 Thread Xiao Guangrong
On 10/31/2012 02:46 PM, Jan Kiszka wrote:

 Please allow me to clarify it more clearly.

 The flash is ROMD device means guest can not write it, any kinds of guest 
 write
 access on this device can cause vmexit to kvm and return to userspace.

 We should pay more attention on it if we execute the code in ROMD since we
 can not use ROMD as stack/page table/IDT table and all other implicitly 
 write access.
 Of course, if you do not use ROM as those purposes, it is okay. :)
 
 So the problem is that there is KVM code that still blindly writes to
 guest memory and does not take the memory regions' protection flag into
 account? And we cannot mark those regions read only in the host's page
 table?

KVM has the ability to catch this kind of write access on ROMD; it is just hard
to emulate the implicit memory accesses.






Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-31 Thread Jan Kiszka
On 2012-10-31 08:01, Xiao Guangrong wrote:
 On 10/31/2012 02:46 PM, Jan Kiszka wrote:
 
 Please allow me to clarify it more clearly.

 The flash is ROMD device means guest can not write it, any kinds of guest 
 write
 access on this device can cause vmexit to kvm and return to userspace.

 We should pay more attention on it if we execute the code in ROMD since we
 can not use ROMD as stack/page table/IDT table and all other implicitly 
 write access.
 Of course, if you do not use ROM as those purposes, it is okay. :)

 So the problem is that there is KVM code that still blindly writes to
 guest memory and does not take the memory regions' protection flag into
 account? And we cannot mark those regions read only in the host's page
 table?
 
 KVM has the ability to catch this kind of write access on ROMD, it is just 
 hard to
 emulate the implicitly memory access.

Drop them? It is highly unlikely that they trigger the magic
write-enable patterns at the right spot in a ROMD device.
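
As a rough illustration of that idea, here is a minimal user-space sketch of a ROMD-style device: reads are served directly, every trapped write reaches a handler, and any write that does not fit the unlock sequence is simply dropped. The struct, the function names and the two-step 0x555/0x2AA sequence are assumptions made for the example; this is not the pflash model in Qemu.

```c
#include <assert.h>
#include <stdint.h>

#define ROMD_SIZE 0x1000u

struct romd_dev {
    uint8_t  data[ROMD_SIZE];
    int      unlock_step;     /* progress through the unlock sequence */
    unsigned dropped_writes;  /* writes ignored by the device */
};

/* Reads are served straight from memory; no exit is needed for them. */
static uint8_t romd_read(const struct romd_dev *d, uint32_t off)
{
    assert(off < ROMD_SIZE);
    return d->data[off];
}

/* Called for every trapped guest write (the MMIO exit path). Only the
 * expected two-step pattern advances the state machine; anything else,
 * e.g. a stray stack push, resets it and is dropped without touching
 * the stored data. */
static void romd_write(struct romd_dev *d, uint32_t off, uint8_t val)
{
    assert(off < ROMD_SIZE);
    if (d->unlock_step == 0 && off == 0x555 && val == 0xAA) {
        d->unlock_step = 1;
    } else if (d->unlock_step == 1 && off == 0x2AA && val == 0x55) {
        d->unlock_step = 2;  /* a real device would now accept a command */
    } else {
        d->unlock_step = 0;
        d->dropped_writes++;
    }
}
```

A stray write leaves the content unchanged, and the chance that it hits the right address with the right value at the right step is small.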

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [BUG] lkvm crash on crashkernel boot

2012-10-31 Thread Pekka Enberg
On Thu, 25 Oct 2012, Kirill A. Shutemov wrote:
 vec is 0x in virtio_pci__specific_io_out() on crash.
 
 Let's add proper bounds checking there. It doesn't solve the issue
 with booting crashkernel, but it fixes the lkvm crash.

There's no sign-off for the patch. Kirill, care to update the patch 
description and send it my way?
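
For readers without the patch at hand, a bounds check of this kind could look roughly like the sketch below. All names here (the struct, the table size, set_config_vector) are hypothetical and are not the lkvm code; the out-of-range value in the usage note is just an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MSIX_VECTORS 33u  /* hypothetical table size */

struct msix_entry {
    uint32_t gsi;
};

struct vdev {
    struct msix_entry table[MAX_MSIX_VECTORS];
    uint16_t config_vector;
};

/* Store a guest-supplied vector index. Returns 0 on success and -1 if
 * the index would fall outside the MSI-X table, so that later code
 * never indexes the table with an unchecked value. */
static int set_config_vector(struct vdev *dev, uint16_t vec)
{
    assert(dev != NULL);
    if (vec >= MAX_MSIX_VECTORS) {
        return -1;
    }
    dev->config_vector = vec;
    return 0;
}
```

A caller would reject the guest request when the function returns -1, for instance for an index of 1000, instead of dereferencing the table entry.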


Re: [PATCH] acpi_piix4: fix migration of gpe fields

2012-10-31 Thread Paolo Bonzini
Il 31/10/2012 00:14, Marcelo Tosatti ha scritto:
 
 Migrate 16 bytes for en/sts fields (which is the correct size), 
 increase version to 3, and document how to support incoming
 migration from qemu-kvm 1.2.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
 index 15275cf..519269a 100644
 --- a/hw/acpi_piix4.c
 +++ b/hw/acpi_piix4.c
 @@ -235,10 +235,9 @@ static int vmstate_acpi_post_load(void *opaque, int 
 version_id)
   {   \
   .name   = (stringify(_field)),  \
   .version_id = 0,\
 - .num= GPE_LEN,  \
   .info   = vmstate_info_uint16, \
   .size   = sizeof(uint16_t), \
 - .flags  = VMS_ARRAY | VMS_POINTER,  \
 + .flags  = VMS_SINGLE | VMS_POINTER, \
   .offset = vmstate_offset_pointer(_state, _field, uint8_t),  \
   }
  
 @@ -267,11 +266,54 @@ static const VMStateDescription vmstate_pci_status = {
  }
  };
  
 +static int acpi_load_old(QEMUFile *f, void *opaque, int version_id)
 +{
 +    PIIX4PMState *s = opaque;
 +    int ret, i;
 +    uint16_t temp;
 +
 +    ret = pci_device_load(&s->dev, f);
 +    if (ret < 0) {
 +        return ret;
 +    }
 +    qemu_get_be16s(f, &s->ar.pm1.evt.sts);
 +    qemu_get_be16s(f, &s->ar.pm1.evt.en);
 +    qemu_get_be16s(f, &s->ar.pm1.cnt.cnt);
 +
 +    ret = vmstate_load_state(f, &vmstate_apm, opaque, 1);
 +    if (ret) {
 +        return ret;
 +    }
 +
 +    qemu_get_timer(f, s->ar.tmr.timer);
 +    qemu_get_sbe64s(f, &s->ar.tmr.overflow_time);
 +
 +    qemu_get_be16s(f, (uint16_t *)s->ar.gpe.sts);
 +    for (i = 0; i < 3; i++) {
 +        qemu_get_be16s(f, &temp);
 +    }
 +
 +    qemu_get_be16s(f, (uint16_t *)s->ar.gpe.en);
 +    for (i = 0; i < 3; i++) {
 +        qemu_get_be16s(f, &temp);
 +    }
 +
 +    ret = vmstate_load_state(f, &vmstate_pci_status, opaque, 1);
 +    return ret;
 +}
 +
 +/* qemu-kvm 1.2 uses version 3 but advertised as 2
 + * To support incoming qemu-kvm 1.2 migration, change version_id
 + * and minimum_version_id to 2 below (which breaks migration from
 + * qemu 1.2).
 + *
 + */
  static const VMStateDescription vmstate_acpi = {
      .name = "piix4_pm",
 -    .version_id = 2,
 -    .minimum_version_id = 1,
 +    .version_id = 3,
 +    .minimum_version_id = 3,
      .minimum_version_id_old = 1,
 +    .load_state_old = acpi_load_old,
      .post_load = vmstate_acpi_post_load,
      .fields      = (VMStateField []) {
          VMSTATE_PCI_DEVICE(dev, PIIX4PMState),
 

ACK

Paolo


RE: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when loading kvm_intel module

2012-10-31 Thread Hatayama, Daisuke


 -Original Message-
 From: kexec-boun...@lists.infradead.org
 [mailto:kexec-boun...@lists.infradead.org] On Behalf Of zhangyanfei
 Sent: Wednesday, October 31, 2012 12:34 PM
 To: x...@kernel.org; ke...@lists.infradead.org; Avi Kivity; Marcelo
 Tosatti
 Cc: linux-ker...@vger.kernel.org; kvm@vger.kernel.org
 Subject: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when
 loading kvm_intel module
 
 Signed-off-by: Zhang Yanfei zhangyan...@cn.fujitsu.com

[...]

 @@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
   if (r)
   goto out3;
 
 +#ifdef CONFIG_KEXEC
 + crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
 +#endif
 +

The assignment here does not cover the case where an NMI is raised after VMX is
turned on in kvm_init and before vmclear_local_loaded_vmcss is assigned. This
is rare, but it can happen.

What happens if vmclear_local_loaded_vmcss is called before kvm_init? I think
there is no problem, since the list is initially empty.

   vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
   vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
   vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
 @@ -7265,6 +7270,10 @@ static void __exit vmx_exit(void)
   free_page((unsigned long)vmx_io_bitmap_b);
   free_page((unsigned long)vmx_io_bitmap_a);
 
 +#ifdef CONFIG_KEXEC
 + crash_clear_loaded_vmcss = NULL;
 +#endif
 +
   kvm_exit();
  }

Also, this is converse to the above.

Thanks.
HATAYAMA, Daisuke



[PATCH 26/28] target-i386: cpu: make -cpu host/check/enforce code KVM-specific

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

Rationale:
 * -cpu host is available only when using KVM
 * The current implementation of -cpu check/enforce
   (check_features_against_host()) makes sense only when using KVM.

So this makes the functions check_features_against_host() and
cpu_x86_fill_host() KVM-specific, document them as such, and rename them
to kvm_check_features_against_host() and kvm_cpu_fill_host().

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/cpu.c |   25 ++---
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6575834..390ed47 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -766,10 +766,17 @@ static int cpu_x86_fill_model_id(char *str)
     return 0;
 }
 
-static void cpu_x86_fill_host(x86_def_t *x86_cpu_def)
+/* Fill a x86_def_t struct with information about the host CPU, and
+ * the CPU features supported by the host hardware + host kernel
+ *
+ * This function may be called only if KVM is enabled.
+ */
+static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
 {
     uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
 
+    assert(kvm_enabled());
+
     x86_cpu_def->name = "host";
     host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
     x86_cpu_def->level = eax;
@@ -784,7 +791,7 @@ static void cpu_x86_fill_host(x86_def_t *x86_cpu_def)
     x86_cpu_def->ext_features = ecx;
     x86_cpu_def->features = edx;
 
-    if (kvm_enabled() && x86_cpu_def->level >= 7) {
+    if (x86_cpu_def->level >= 7) {
         x86_cpu_def->cpuid_7_0_ebx_features = kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
     } else {
         x86_cpu_def->cpuid_7_0_ebx_features = 0;
@@ -839,8 +846,10 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
 /* best effort attempt to inform user requested cpu flags aren't making
  * their way to the guest.  Note: ft[].check_feat ideally should be
  * specified via a guest_def field to suppress report of extraneous flags.
+ *
+ * This function may be called only if KVM is enabled.
  */
-static int check_features_against_host(x86_def_t *guest_def)
+static int kvm_check_features_against_host(x86_def_t *guest_def)
 {
     x86_def_t host_def;
     uint32_t mask;
@@ -855,7 +864,9 @@ static int check_features_against_host(x86_def_t *guest_def)
         {&guest_def->ext3_features, &host_def.ext3_features,
             ~CPUID_EXT3_SVM, ext3_feature_name, 0x80000001}};
 
-    cpu_x86_fill_host(&host_def);
+    assert(kvm_enabled());
+
+    kvm_cpu_fill_host(&host_def);
     for (rv = 0, i = 0; i < ARRAY_SIZE(ft); ++i)
         for (mask = 1; mask; mask <<= 1)
             if (ft[i].check_feat & mask && *ft[i].guest_feat & mask &&
@@ -1142,7 +1153,7 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
         if (name && !strcmp(name, def->name))
             break;
     if (kvm_enabled() && name && strcmp(name, "host") == 0) {
-        cpu_x86_fill_host(x86_cpu_def);
+        kvm_cpu_fill_host(x86_cpu_def);
     } else if (!def) {
         goto error;
     } else {
@@ -1280,8 +1291,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
     x86_cpu_def->kvm_features &= ~minus_kvm_features;
     x86_cpu_def->svm_features &= ~minus_svm_features;
     x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features;
-    if (check_cpuid) {
-        if (check_features_against_host(x86_cpu_def) && enforce_cpuid)
+    if (check_cpuid && kvm_enabled()) {
+        if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid)
             goto error;
     }
     if (x86_cpu_def->cpuid_7_0_ebx_features && x86_cpu_def->level < 7) {
-- 
1.7.6.4



[PATCH 12/28] i386: kvm: mask cpuid_ext4_features bits earlier

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

This way all the filtering by GET_SUPPORTED_CPUID is being done at the
same place in the code.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8eb61a0..be37a1f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -431,6 +431,9 @@ int kvm_arch_init_vcpu(CPUX86State *env)
     env->cpuid_kvm_features &=
         kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
 
+    env->cpuid_ext4_features &= kvm_arch_get_supported_cpuid(s, 0xC0000001,
+                                                             0, R_EDX);
+
     cpuid_i = 0;
 
     /* Paravirtualization CPUIDs */
@@ -572,8 +575,6 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
     /* Call Centaur's CPUID instructions they are supported. */
     if (env->cpuid_xlevel2 > 0) {
-        env->cpuid_ext4_features &=
-            kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);
         cpu_x86_cpuid(env, 0xC0000000, 0, &limit, &unused, &unused, &unused);
 
         for (i = 0xC0000000; i <= limit; i++) {
-- 
1.7.6.4



[PATCH 21/28] Use global properties to emulate -no-kvm-pit-reinjection

2012-10-31 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Releases of qemu-kvm will be interrupted at qemu 1.3.0.
Users should switch to plain qemu releases.
To avoid breaking scenarios which are setup with command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.

Port -no-kvm-pit-reinjection.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 qemu-options.hx |4 
 vl.c|   15 +++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 42e642c..4403c9c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2888,6 +2888,10 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI
 
+HXCOMM Deprecated by kvm-pit driver properties
+DEF("no-kvm-pit-reinjection", HAS_ARG, QEMU_OPTION_no_kvm_pit_reinjection,
+    "", QEMU_ARCH_I386)
+
 HXCOMM Deprecated (ignored)
 DEF("no-kvm-pit", HAS_ARG, QEMU_OPTION_no_kvm_pit, "", QEMU_ARCH_I386)
 
diff --git a/vl.c b/vl.c
index 37717b4..3068b4b 100644
--- a/vl.c
+++ b/vl.c
@@ -3176,6 +3176,21 @@ int main(int argc, char **argv, char **envp)
                         "separately.\n");
                 break;
             }
+            case QEMU_OPTION_no_kvm_pit_reinjection: {
+                static GlobalProperty kvm_pit_lost_tick_policy[] = {
+                    {
+                        .driver   = "kvm-pit",
+                        .property = "lost_tick_policy",
+                        .value    = "discard",
+                    },
+                    { /* end of list */ }
+                };
+
+                fprintf(stderr, "Warning: option deprecated, use "
+                        "lost_tick_policy property of kvm-pit instead.\n");
+                qdev_prop_register_global_list(kvm_pit_lost_tick_policy);
+                break;
+            }
             case QEMU_OPTION_usb:
                 machine_opts = qemu_opts_find(qemu_find_opts("machine"), 0);
                 if (machine_opts) {
-- 
1.7.6.4



[PATCH 24/28] Emulate qemu-kvms -no-kvm option

2012-10-31 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Releases of qemu-kvm will be interrupted at qemu 1.3.0.
Users should switch to plain qemu releases.
To avoid breaking scenarios which are setup with command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.

Port -no-kvm option.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 qemu-options.hx |3 +++
 vl.c|4 
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 628bd44..fe8f15c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2888,6 +2888,9 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI
 
+HXCOMM Deprecated by -machine accel=tcg property
+DEF("no-kvm", HAS_ARG, QEMU_OPTION_no_kvm, "", QEMU_ARCH_I386)
+
 HXCOMM Deprecated by kvm-pit driver properties
 DEF("no-kvm-pit-reinjection", HAS_ARG, QEMU_OPTION_no_kvm_pit_reinjection,
     "", QEMU_ARCH_I386)
diff --git a/vl.c b/vl.c
index cd7c0fb..b39f22e 100644
--- a/vl.c
+++ b/vl.c
@@ -3171,6 +3171,10 @@ int main(int argc, char **argv, char **envp)
 machine = machine_parse(optarg);
 }
 break;
+case QEMU_OPTION_no_kvm:
+olist = qemu_find_opts("machine");
+qemu_opts_parse(olist, "accel=tcg", 0);
+break;
 case QEMU_OPTION_no_kvm_pit: {
 fprintf(stderr, "Warning: KVM PIT can no longer be disabled "
 "separately.\n");
-- 
1.7.6.4



[PATCH 01/28] i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of for loop

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

The for loop will become a separate function, so clean it up so it can
become independent from the bit hacking for R_EDX.

No behavior change[1], just code movement.

[1] Well, only if the kernel returned CPUID leafs 1 or 0x80000001 as
unsupported, but there's no kernel version that does that.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |   31 ++-
 1 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3aa62b2..b7490f9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -155,24 +155,29 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 break;
 case R_EDX:
 ret = cpuid->entries[i].edx;
-switch (function) {
-case 1:
-/* KVM before 2.6.30 misreports the following features */
-ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
-break;
-case 0x80000001:
-/* On Intel, kvm returns cpuid according to the Intel spec,
- * so add missing bits according to the AMD spec:
- */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
-break;
-}
 break;
 }
 }
 }
 
+/* Fixups for the data returned by KVM, below */
+
+if (reg == R_EDX) {
+switch (function) {
+case 1:
+/* KVM before 2.6.30 misreports the following features */
+ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+break;
+case 0x80000001:
+/* On Intel, kvm returns cpuid according to the Intel spec,
+ * so add missing bits according to the AMD spec:
+ */
+cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+break;
+}
+}
+
 g_free(cpuid);
 
 /* fallback for older kernels */
-- 
1.7.6.4



[PATCH 19/28] Use machine options to emulate -no-kvm-irqchip

2012-10-31 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Releases of qemu-kvm will be discontinued as of QEMU 1.3.0.
Users should switch to plain QEMU releases.
To avoid breaking setups that rely on command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.

Port -no-kvm-irqchip option.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 qemu-options.hx |3 +++
 vl.c|5 +
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index a67a255..d813755 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2888,6 +2888,9 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI
 
+HXCOMM Deprecated by -machine kernel_irqchip=on|off property
+DEF("no-kvm-irqchip", HAS_ARG, QEMU_OPTION_no_kvm_irqchip, "", QEMU_ARCH_I386)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
diff --git a/vl.c b/vl.c
index 5a3d316..7d0708a 100644
--- a/vl.c
+++ b/vl.c
@@ -2574,6 +2574,11 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_M:
 machine = machine_parse(optarg);
 break;
+case QEMU_OPTION_no_kvm_irqchip: {
+olist = qemu_find_opts("machine");
+qemu_opts_parse(olist, "kernel_irqchip=off", 0);
+break;
+}
 case QEMU_OPTION_cpu:
 /* hw initialization will check this */
 cpu_model = optarg;
-- 
1.7.6.4



[PATCH 09/28] i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on kvm_arch_get_supported_cpuid()

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

This moves the CPUID_EXT_TSC_DEADLINE_TIMER CPUID flag hacking from
kvm_arch_init_vcpu() to kvm_arch_get_supported_cpuid().

Full git grep for kvm_arch_get_supported_cpuid:

   kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
   target-i386/cpu.c:x86_cpu_def->cpuid_7_0_ebx_features = kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EDX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EDX);
   target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
   target-i386/kvm.c:cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
   target-i386/kvm.c:env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 * target-i386/kvm.c:env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
   target-i386/kvm.c:env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
   target-i386/kvm.c:env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
   target-i386/kvm.c:env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x8000000A,
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);

Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_TSC_DEADLINE_TIMER, so we
can simply make kvm_arch_get_supported_cpuid() set it, to let the rest
of the code know the flag can be safely set by QEMU.

One thing I was worrying about when doing this is that now
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(). But
the 'kvm_kernel_irqchip' global variable is initialized during
kvm_init(), that is called very early, and kvm_init() is already a
requirement to run the GET_SUPPORTED_CPUID ioctl() (as kvm_init() is the
function that initializes the 'kvm_state' global variable).
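
The CPUID[1].ECX fixups described in this patch and in 08/28 can be sketched in isolation as follows. This is a simplified illustration, not the QEMU function: `demo_fixup_leaf1_ecx()` and the `DEMO_*` macros are hypothetical, and the two boolean parameters stand in for kvm_irqchip_in_kernel() and the KVM_CAP_TSC_DEADLINE_TIMER capability check. The bit positions are the architectural ones (bit 24 for the TSC deadline timer, bit 31 for the hypervisor flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EXT_TSC_DEADLINE (1u << 24) /* CPUID[1].ECX bit 24 */
#define DEMO_EXT_HYPERVISOR   (1u << 31) /* CPUID[1].ECX bit 31 */

/* Apply the ECX fixups to the raw value reported for CPUID leaf 1. */
uint32_t demo_fixup_leaf1_ecx(uint32_t raw, bool irqchip_in_kernel,
                              bool has_tsc_deadline_cap)
{
    uint32_t ret = raw;

    /* The hypervisor flag can always be offered to the guest. */
    ret |= DEMO_EXT_HYPERVISOR;

    /* tsc-deadline needs both the in-kernel irqchip and the capability. */
    if (irqchip_in_kernel && has_tsc_deadline_cap) {
        ret |= DEMO_EXT_TSC_DEADLINE;
    }
    return ret;
}
```

Because the decision is made where the supported bits are computed, callers no longer have to save and restore individual flags around the query.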

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index aabac72..aae8af2 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -201,6 +201,14 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
  * GET_SUPPORTED_CPUID
  */
 ret |= CPUID_EXT_HYPERVISOR;
+/* tsc-deadline flag is not returned by GET_SUPPORTED_CPUID, but it
+ * can be enabled if the kernel has KVM_CAP_TSC_DEADLINE_TIMER,
+ * and the irqchip is in the kernel.
+ */
+if (kvm_irqchip_in_kernel() &&
+kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
+ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
+}
 } else if (function == 0x80000001 && reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
@@ -404,12 +412,7 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 
-j = env->cpuid_ext_features & CPUID_EXT_TSC_DEADLINE_TIMER;
 env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-if (j && kvm_irqchip_in_kernel() &&
-kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
-env->cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
-}
 
 env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
  0, R_EDX);
-- 
1.7.6.4



[PATCH 23/28] Issue warning when deprecated -tdf option is used

2012-10-31 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Releases of qemu-kvm will be discontinued as of QEMU 1.3.0.
Users should switch to plain QEMU releases.
To avoid breaking setups that rely on command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.

Port -tdf option.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 qemu-options.hx |3 +++
 vl.c|4 
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 4403c9c..628bd44 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2898,6 +2898,9 @@ DEF("no-kvm-pit", HAS_ARG, QEMU_OPTION_no_kvm_pit, "", QEMU_ARCH_I386)
 HXCOMM Deprecated by -machine kernel_irqchip=on|off property
 DEF("no-kvm-irqchip", HAS_ARG, QEMU_OPTION_no_kvm_irqchip, "", QEMU_ARCH_I386)
 
+HXCOMM Deprecated (ignored)
+DEF("tdf", 0, QEMU_OPTION_tdf,"", QEMU_ARCH_ALL)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
diff --git a/vl.c b/vl.c
index 3068b4b..cd7c0fb 100644
--- a/vl.c
+++ b/vl.c
@@ -3280,6 +3280,10 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_semihosting:
 semihosting_enabled = 1;
 break;
+case QEMU_OPTION_tdf:
+fprintf(stderr, "Warning: user space PIT time drift fix "
+"is no longer supported.\n");
+break;
 case QEMU_OPTION_name:
 qemu_name = g_strdup(optarg);
 {
-- 
1.7.6.4



[PATCH 28/28] update-linux-headers.sh: Handle new kernel uapi/ directories

2012-10-31 Thread Marcelo Tosatti
From: Peter Maydell peter.mayd...@linaro.org

Recent kernels have moved to keeping the userspace headers
in uapi/ subdirectories. This breaks the detection of whether an
architecture has KVM support in the kernel because kvm.h has
moved in the kernel source tree. Update the check to support
both the old and new locations.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 scripts/update-linux-headers.sh |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 67be2ef..4c7b566 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -34,7 +34,8 @@ ARCHLIST=$(cd "$linux/arch" && echo *)
 
 for arch in $ARCHLIST; do
 # Discard anything which isn't a KVM-supporting architecture
-if ! [ -e "$linux/arch/$arch/include/asm/kvm.h" ]; then
+if ! [ -e "$linux/arch/$arch/include/asm/kvm.h" ] &&
+! [ -e "$linux/arch/$arch/include/uapi/asm/kvm.h" ] ; then
 continue
 fi
 
-- 
1.7.6.4



[PATCH 08/28] i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

Full grep for kvm_arch_get_supported_cpuid:

   kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
   target-i386/cpu.c:x86_cpu_def->cpuid_7_0_ebx_features = kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EDX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EDX);
   target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
   target-i386/kvm.c:cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
   target-i386/kvm.c:env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 * target-i386/kvm.c:env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
   target-i386/kvm.c:env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
   target-i386/kvm.c:env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
   target-i386/kvm.c:env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x8000000A,
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);

Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_HYPERVISOR, so we can
simply make kvm_arch_get_supported_cpuid() set it, to let the rest of
the code automatically know that the flag can be safely set by QEMU.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1f943c5..aabac72 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -196,6 +196,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 if (function == 1 && reg == R_EDX) {
 /* KVM before 2.6.30 misreports the following features */
 ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+} else if (function == 1 && reg == R_ECX) {
+/* We can set the hypervisor flag, even if KVM does not return it on
+ * GET_SUPPORTED_CPUID
+ */
+ret |= CPUID_EXT_HYPERVISOR;
 } else if (function == 0x80000001 && reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
@@ -399,10 +404,8 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 
-i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
 j = env->cpuid_ext_features & CPUID_EXT_TSC_DEADLINE_TIMER;
 env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-env->cpuid_ext_features |= i;
 if (j && kvm_irqchip_in_kernel() &&
 kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 env->cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
-- 
1.7.6.4



[PATCH 03/28] i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

The reg switch will be moved to a separate function, so store the entry
pointer in a variable.

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 56addf1..18782e4 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -140,19 +140,20 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 for (i = 0; i < cpuid->nent; ++i) {
 if (cpuid->entries[i].function == function &&
 cpuid->entries[i].index == index) {
+struct kvm_cpuid_entry2 *entry = &cpuid->entries[i];
 found = true;
 switch (reg) {
 case R_EAX:
-ret = cpuid->entries[i].eax;
+ret = entry->eax;
 break;
 case R_EBX:
-ret = cpuid->entries[i].ebx;
+ret = entry->ebx;
 break;
 case R_ECX:
-ret = cpuid->entries[i].ecx;
+ret = entry->ecx;
 break;
 case R_EDX:
-ret = cpuid->entries[i].edx;
+ret = entry->edx;
 break;
 }
 }
-- 
1.7.6.4



[PATCH 18/28] cirrus_vga: allow configurable vram size

2012-10-31 Thread Marcelo Tosatti
Make the video RAM size of the cirrus device configurable, so that guests
can be migrated from qemu-kvm.
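
The patch replaces a fixed byte count with a size in megabytes that is scaled where a byte count is needed. As a minimal sketch of that arithmetic, with `demo_vram_bytes()` and `DEMO_OLD_VGA_RAM_SIZE` as hypothetical names (the latter mirrors the removed VGA_RAM_SIZE constant):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the removed constant: 8192 KiB expressed in bytes. */
#define DEMO_OLD_VGA_RAM_SIZE (8192u * 1024u)

/* Convert a size in MiB, as stored in the property, to bytes. */
uint64_t demo_vram_bytes(uint32_t vram_size_mb)
{
    return (uint64_t)vram_size_mb * 1024 * 1024;
}
```

With the property default of 8, the result equals the old constant, so the default configuration keeps the same amount of video RAM.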

Acked-by: Gerd Hoffmann kra...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 hw/cirrus_vga.c |   21 -
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index e4af2e9..9bef96e 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -42,8 +42,6 @@
 //#define DEBUG_CIRRUS
 //#define DEBUG_BITBLT
 
-#define VGA_RAM_SIZE (8192 * 1024)
-
 /***
  *
  *  definitions
@@ -2856,7 +2854,8 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci,
 
 /* I/O handler for LFB */
 memory_region_init_io(&s->cirrus_linear_io, &cirrus_linear_io_ops, s,
-  "cirrus-linear-io", VGA_RAM_SIZE);
+  "cirrus-linear-io", s->vga.vram_size_mb
+  * 1024 * 1024);
 memory_region_set_flush_coalesced(&s->cirrus_linear_io);
 
 /* I/O handler for LFB */
@@ -2899,7 +2898,6 @@ static int vga_initfn(ISADevice *dev)
 ISACirrusVGAState *d = DO_UPCAST(ISACirrusVGAState, dev, dev);
 VGACommonState *s = &d->cirrus_vga.vga;
 
-s->vram_size_mb = VGA_RAM_SIZE >> 20;
 vga_common_init(s);
 cirrus_init_common(&d->cirrus_vga, CIRRUS_ID_CLGD5430, 0,
isa_address_space(dev));
@@ -2912,6 +2910,12 @@ static int vga_initfn(ISADevice *dev)
 return 0;
 }
 
+static Property isa_vga_cirrus_properties[] = {
+DEFINE_PROP_UINT32("vgamem_mb", struct ISACirrusVGAState,
+   cirrus_vga.vga.vram_size_mb, 8),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void isa_cirrus_vga_class_init(ObjectClass *klass, void *data)
 {
 ISADeviceClass *k = ISA_DEVICE_CLASS(klass);
@@ -2919,6 +2923,7 @@ static void isa_cirrus_vga_class_init(ObjectClass *klass, void *data)
 
 dc->vmsd  = &vmstate_cirrus_vga;
 k->init   = vga_initfn;
+dc->props = isa_vga_cirrus_properties;
 }
 
 static TypeInfo isa_cirrus_vga_info = {
@@ -2942,7 +2947,6 @@ static int pci_cirrus_vga_initfn(PCIDevice *dev)
  int16_t device_id = pc->device_id;
 
  /* setup VGA */
- s->vga.vram_size_mb = VGA_RAM_SIZE >> 20;
  vga_common_init(&s->vga);
  cirrus_init_common(s, device_id, 1, pci_address_space(dev));
  s->vga.ds = graphic_console_init(s->vga.update, s->vga.invalidate,
@@ -2969,6 +2973,12 @@ static int pci_cirrus_vga_initfn(PCIDevice *dev)
  return 0;
 }
 
+static Property pci_vga_cirrus_properties[] = {
+DEFINE_PROP_UINT32("vgamem_mb", struct PCICirrusVGAState,
+   cirrus_vga.vga.vram_size_mb, 8),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void cirrus_vga_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -2982,6 +2992,7 @@ static void cirrus_vga_class_init(ObjectClass *klass, void *data)
 k->class_id = PCI_CLASS_DISPLAY_VGA;
 dc->desc = "Cirrus CLGD 54xx VGA";
 dc->vmsd = &vmstate_pci_cirrus_vga;
+dc->props = pci_vga_cirrus_properties;
 }
 
 static TypeInfo cirrus_vga_info = {
-- 
1.7.6.4



[PATCH 02/28] i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features check

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

Instead of a function-specific has_kvm_features variable, simply use a
found variable that will be checked in case we have to use the legacy
get_para_features() interface.

No behavior change, just code cleanup.
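
To illustrate the generic `found` flag, here is a reduced sketch of a lookup with a legacy fallback. It is not the QEMU code: `struct demo_entry`, `demo_lookup_eax()` and the `legacy_features` parameter (standing in for the value returned by get_para_features()) are hypothetical, and only EAX is modelled. `DEMO_KVM_FEATURES_LEAF` uses the value of KVM_CPUID_FEATURES, 0x40000001.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_KVM_FEATURES_LEAF 0x40000001u

struct demo_entry {
    uint32_t function;
    uint32_t index;
    uint32_t eax;
};

/* Look up EAX for (function, index); fall back to the legacy value only
 * when the KVM features leaf was requested and no entry matched. */
uint32_t demo_lookup_eax(const struct demo_entry *entries, size_t nent,
                         uint32_t function, uint32_t index,
                         uint32_t legacy_features)
{
    uint32_t ret = 0;
    bool found = false;
    size_t i;

    assert(entries != NULL || nent == 0);
    for (i = 0; i < nent; i++) {
        if (entries[i].function == function && entries[i].index == index) {
            found = true;
            ret = entries[i].eax;
        }
    }

    /* fallback for older kernels */
    if (function == DEMO_KVM_FEATURES_LEAF && !found) {
        ret = legacy_features;
    }
    return ret;
}
```

The flag records only whether any entry matched, so the same variable keeps working once the loop is moved into a helper of its own.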

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index b7490f9..56addf1 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -130,7 +130,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 int i, max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
-int has_kvm_features = 0;
+bool found = false;
 
 max = 1;
 while ((cpuid = try_get_cpuid(s, max)) == NULL) {
@@ -140,9 +140,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 for (i = 0; i < cpuid->nent; ++i) {
 if (cpuid->entries[i].function == function &&
 cpuid->entries[i].index == index) {
-if (cpuid->entries[i].function == KVM_CPUID_FEATURES) {
-has_kvm_features = 1;
-}
+found = true;
 switch (reg) {
 case R_EAX:
 ret = cpuid->entries[i].eax;
@@ -181,7 +179,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 g_free(cpuid);
 
 /* fallback for older kernels */
-if (!has_kvm_features && (function == KVM_CPUID_FEATURES)) {
+if ((function == KVM_CPUID_FEATURES) && !found) {
 ret = get_para_features(s);
 }
 
-- 
1.7.6.4



[PATCH 13/28] i386: kvm: filter CPUID feature words earlier, on cpu.c

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

cpu.c contains the code that will check if all requested CPU features
are available, so the filtering of KVM features must be there, so we can
implement "check" and "enforce" properly.

The only point where kvm_arch_init_vcpu() is called on i386 is:

- cpu_x86_init()
  - x86_cpu_realize() (after cpu_x86_register() is called)
    - qemu_init_vcpu()
      - qemu_kvm_start_vcpu()
        - qemu_kvm_thread_fn() (on a new thread)
          - kvm_init_vcpu()
            - kvm_arch_init_vcpu()

With this patch, the filtering will be done earlier, at:
- cpu_x86_init()
  - cpu_x86_register() (before x86_cpu_realize() is called)

Also, the KVM CPUID filtering will now be done at the same place where
the TCG CPUID feature filtering is done. Later, the code can be changed
to use the same filtering code for the "check" and "enforce" modes, as
now the cpu.c code knows exactly which CPU features are going to be
exposed to the guest (and much earlier).

One thing I was worrying about when doing this is that
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(), and
maybe the 'kvm_kernel_irqchip' global variable wasn't initialized yet at
CPU creation time. But kvm_kernel_irqchip is initialized during
kvm_init(), that is called very early (much earlier than the machine
init function), and kvm_init() is already a requirement to run the
GET_SUPPORTED_CPUID ioctl() (as kvm_init() initializes the kvm_state
global variable).

Side note: it would be nice to keep KVM-specific code inside kvm.c. The
problem is that properly implementing -cpu check/enforce code (that's
inside cpu.c) depends directly on the feature bit filtering done using
kvm_arch_get_supported_cpuid(). Currently -cpu check/enforce is broken
because it simply uses the host CPU feature bits instead of
GET_SUPPORTED_CPUID, and we need to fix that.
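
The filtering itself is a bitwise AND of each requested feature word with the supported word. The sketch below shows that step together with the set of dropped bits that a check or enforce mode could report; how QEMU will eventually wire this up is not specified here, so `demo_filter_word()` and `demo_enforce_ok()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Keep only the requested bits that are supported; report the rest. */
uint32_t demo_filter_word(uint32_t requested, uint32_t supported,
                          uint32_t *missing)
{
    assert(missing != NULL);
    *missing = requested & ~supported;
    return requested & supported;
}

/* An "enforce"-style policy would refuse to start if anything is missing. */
bool demo_enforce_ok(uint32_t missing)
{
    return missing == 0;
}
```

Running the filter before the CPU is realized means the dropped bits are known at the point where the requested model is validated.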

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 kvm.h |1 +
 target-i386/cpu.c |   30 ++
 target-i386/kvm.c |   18 --
 3 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/kvm.h b/kvm.h
index 2b26dcb..1e7f244 100644
--- a/kvm.h
+++ b/kvm.h
@@ -20,6 +20,7 @@
 
 #ifdef CONFIG_KVM
 #include <linux/kvm.h>
+#include <linux/kvm_para.h>
 #endif
 
 extern int kvm_allowed;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index d4f2e65..78876f6 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1368,6 +1368,32 @@ CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
 return cpu_list;
 }
 
+#ifdef CONFIG_KVM
+static void filter_features_for_kvm(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+KVMState *s = kvm_state;
+
+env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+
+env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
+
+env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
+ 0, R_EDX);
+env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
+ 0, R_ECX);
+env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x8000000A,
+ 0, R_EDX);
+
+env->cpuid_kvm_features &=
+kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+
+env->cpuid_ext4_features &= kvm_arch_get_supported_cpuid(s, 0xC0000001,
+ 0, R_EDX);
+
+}
+#endif
+
 int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
 {
 CPUX86State *env = &cpu->env;
@@ -1425,6 +1451,10 @@ int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
 );
 env->cpuid_ext3_features &= TCG_EXT3_FEATURES;
 env->cpuid_svm_features &= TCG_SVM_FEATURES;
+} else {
+#ifdef CONFIG_KVM
+filter_features_for_kvm(cpu);
+#endif
 }
 object_property_set_str(OBJECT(cpu), def->model_id, "model-id", &error);
 if (error_is_set(&error)) {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index be37a1f..593e9d3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -410,30 +410,12 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 struct kvm_cpuid2 cpuid;
 struct kvm_cpuid_entry2 entries[100];
 } QEMU_PACKED cpuid_data;
-KVMState *s = env->kvm_state;
 uint32_t limit, i, j, cpuid_i;
 uint32_t unused;
 struct kvm_cpuid_entry2 *c;
 uint32_t signature[3];
 int r;
 
-env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-
-env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-
-env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
- 0, R_EDX);
-env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
-   

[PATCH 07/28] i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with single 'if'

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

Additional fixups will be added, and making them a single 'if/else if'
chain makes it clearer than two nested switch statements.
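
The flat chain can be shown in isolation as below. This is a sketch rather than the QEMU function: `demo_fixup_edx()`, `enum demo_reg` and the `DEMO_*` macros are hypothetical, the leaf 1 EDX value is passed in instead of being fetched recursively, and `DEMO_AMD_ALIASES` (bits 0-9, 12-17, 23 and 24) is my reading of which leaf 0x80000001 EDX bits alias leaf 1 on AMD, so treat that mask as an assumption.

```c
#include <assert.h>
#include <stdint.h>

enum demo_reg { DEMO_R_ECX, DEMO_R_EDX };

#define DEMO_CPUID_MCE   (1u << 7)
#define DEMO_CPUID_MTRR  (1u << 12)
#define DEMO_CPUID_MCA   (1u << 14)
#define DEMO_CPUID_PAT   (1u << 16)
#define DEMO_AMD_ALIASES 0x0183F3FFu /* assumed alias mask, see lead-in */

/* Apply the EDX fixups as one if/else-if chain keyed on (function, reg). */
uint32_t demo_fixup_edx(uint32_t function, enum demo_reg reg,
                        uint32_t raw, uint32_t leaf1_edx)
{
    uint32_t ret = raw;

    if (function == 1 && reg == DEMO_R_EDX) {
        /* features that old kernels fail to report */
        ret |= DEMO_CPUID_MTRR | DEMO_CPUID_PAT |
               DEMO_CPUID_MCE | DEMO_CPUID_MCA;
    } else if (function == 0x80000001u && reg == DEMO_R_EDX) {
        /* copy the aliased bits over from leaf 1 */
        ret |= leaf1_edx & DEMO_AMD_ALIASES;
    }
    return ret;
}
```

Each branch names both the leaf and the register, so a fixup for another register is one more `else if` instead of a second nested switch.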

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |   23 +--
 1 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d74dbc1..1f943c5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -193,20 +193,15 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 
 /* Fixups for the data returned by KVM, below */
 
-if (reg == R_EDX) {
-switch (function) {
-case 1:
-/* KVM before 2.6.30 misreports the following features */
-ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
-break;
-case 0x80000001:
-/* On Intel, kvm returns cpuid according to the Intel spec,
- * so add missing bits according to the AMD spec:
- */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
-break;
-}
+if (function == 1 && reg == R_EDX) {
+/* KVM before 2.6.30 misreports the following features */
+ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+} else if (function == 0x80000001 && reg == R_EDX) {
+/* On Intel, kvm returns cpuid according to the Intel spec,
+ * so add missing bits according to the AMD spec:
+ */
+cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
 }
 
 g_free(cpuid);
-- 
1.7.6.4



[PATCH 22/28] Issue warning when deprecated drive parameter boot=on|off is used

2012-10-31 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Releases of qemu-kvm will be discontinued as of QEMU 1.3.0.
Users should switch to plain QEMU releases.
To avoid breaking setups that rely on command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.

Port drive boot option. From the qemu-kvm original commit message:

We do not want to maintain this option forever. It will be removed after
a grace period of a few releases. So warn the user that this option has
no effect and will become invalid soon.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 blockdev.c|6 ++
 qemu-config.c |4 
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index a068a4b..e73fd6e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -433,6 +433,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 return NULL;
 }
 
+if (qemu_opt_get(opts, "boot") != NULL) {
+fprintf(stderr, "qemu-kvm: boot=on|off is deprecated and will be "
+"ignored. Future versions will reject this parameter. Please "
+"update your scripts.\n");
+}
+
 on_write_error = BLOCKDEV_ON_ERROR_ENOSPC;
 if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
 if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
diff --git a/qemu-config.c b/qemu-config.c
index e854fff..3154cac 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -114,6 +114,10 @@ static QemuOptsList qemu_drive_opts = {
 .name = "copy-on-read",
 .type = QEMU_OPT_BOOL,
 .help = "copy read data from backing file into image file",
+},{
+.name = "boot",
+.type = QEMU_OPT_BOOL,
+.help = "(deprecated, ignored)",
 },
 { /* end of list */ }
 },
-- 
1.7.6.4



[PATCH 10/28] i386: kvm: x2apic is not supported without in-kernel irqchip

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

This is necessary so that x2apic is not improperly enabled when the
in-kernel irqchip is disabled.

This won't generate a warning with -cpu ...,check because the current
check/enforce code is broken (it checks the host CPU data directly,
instead of using kvm_arch_get_supported_cpuid()), but it will be
eventually fixed to properly report the missing x2apic flag.
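
Clearing a single feature bit requires AND-ing with the complement of its mask; a plain assignment of the complement would instead set every other bit. A minimal sketch, with `demo_mask_x2apic()` and `DEMO_EXT_X2APIC` as hypothetical names and the boolean parameter standing in for kvm_irqchip_in_kernel() (x2APIC is CPUID[1].ECX bit 21):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EXT_X2APIC (1u << 21) /* CPUID[1].ECX bit 21 */

/* Drop x2apic from the supported ECX bits when the irqchip is in userspace. */
uint32_t demo_mask_x2apic(uint32_t ecx, bool irqchip_in_kernel)
{
    if (!irqchip_in_kernel) {
        ecx &= ~DEMO_EXT_X2APIC;
    }
    return ecx;
}
```

All other bits pass through unchanged, so the mask can be applied after any other ECX fixups.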

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index aae8af2..4e96b04 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -209,6 +209,13 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
 }
+
+/* x2apic is reported by GET_SUPPORTED_CPUID, but it can't be enabled
+ * without the in-kernel irqchip
+ */
+if (!kvm_irqchip_in_kernel()) {
+ret &= ~CPUID_EXT_X2APIC;
+}
 } else if (function == 0x80000001 && reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
-- 
1.7.6.4



[PATCH 25/28] target-i386: make cpu_x86_fill_host() void

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

The return value of that function is always 0, and is always ignored.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/cpu.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 4417d6a..6575834 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -766,7 +766,7 @@ static int cpu_x86_fill_model_id(char *str)
     return 0;
 }
 
-static int cpu_x86_fill_host(x86_def_t *x86_cpu_def)
+static void cpu_x86_fill_host(x86_def_t *x86_cpu_def)
 {
     uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
 
@@ -819,8 +819,6 @@ static int cpu_x86_fill_host(x86_def_t *x86_cpu_def)
      * unsupported ones later.
      */
     x86_cpu_def->svm_features = -1;
-
-    return 0;
 }
 
 static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
-- 
1.7.6.4



[PATCH 06/28] i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid() function

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index c94897f..d74dbc1 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -98,6 +98,19 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
     return cpuid;
 }
 
+/* Run KVM_GET_SUPPORTED_CPUID ioctl(), allocating a buffer large enough
+ * for all entries.
+ */
+static struct kvm_cpuid2 *get_supported_cpuid(KVMState *s)
+{
+    struct kvm_cpuid2 *cpuid;
+    int max = 1;
+    while ((cpuid = try_get_cpuid(s, max)) == NULL) {
+        max *= 2;
+    }
+    return cpuid;
+}
+
 struct kvm_para_features {
     int cap;
     int feature;
@@ -166,15 +179,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
                                       uint32_t index, int reg)
 {
     struct kvm_cpuid2 *cpuid;
-    int max;
     uint32_t ret = 0;
     uint32_t cpuid_1_edx;
     bool found = false;
 
-    max = 1;
-    while ((cpuid = try_get_cpuid(s, max)) == NULL) {
-        max *= 2;
-    }
+    cpuid = get_supported_cpuid(s);
 
     struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
     if (entry) {
-- 
1.7.6.4
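
The retry loop extracted by this patch follows a common pattern: try with a small buffer and double its capacity until the call succeeds. A minimal standalone sketch, assuming a hypothetical sketch_try_get() that stands in for try_get_cpuid() and simply fails while the capacity is too small (no ioctl is involved here):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for try_get_cpuid(): pretends `needed` entries
 * exist and returns NULL while the buffer capacity `max` is too small. */
static int *sketch_try_get(int needed, int max)
{
    if (max < needed) {
        return NULL;              /* models the "buffer too small" case */
    }
    return calloc((size_t)max, sizeof(int));
}

/* Double the capacity until the attempt succeeds. The capacity finally
 * used is reported through *used_max; the caller frees the result.
 * Like the loop in the patch, this sketch has no upper bound on max. */
static int *sketch_get_supported(int needed, int *used_max)
{
    int *buf;
    int max = 1;
    while ((buf = sketch_try_get(needed, max)) == NULL) {
        max *= 2;
    }
    *used_max = max;
    return buf;
}
```

With five entries needed, the attempts use capacities 1, 2, 4 and then 8, so the number of retries grows only logarithmically with the table size.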



[PATCH 05/28] i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |   30 ++
 1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 695deb9..c94897f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -145,11 +145,28 @@ static uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg)
     return ret;
 }
 
+/* Find matching entry for function/index on kvm_cpuid2 struct
+ */
+static struct kvm_cpuid_entry2 *cpuid_find_entry(struct kvm_cpuid2 *cpuid,
+                                                 uint32_t function,
+                                                 uint32_t index)
+{
+    int i;
+    for (i = 0; i < cpuid->nent; ++i) {
+        if (cpuid->entries[i].function == function &&
+            cpuid->entries[i].index == index) {
+            return &cpuid->entries[i];
+        }
+    }
+    /* not found: */
+    return NULL;
+}
+
 uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
                                       uint32_t index, int reg)
 {
     struct kvm_cpuid2 *cpuid;
-    int i, max;
+    int max;
     uint32_t ret = 0;
     uint32_t cpuid_1_edx;
     bool found = false;
@@ -159,13 +176,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         max *= 2;
     }
 
-    for (i = 0; i < cpuid->nent; ++i) {
-        if (cpuid->entries[i].function == function &&
-            cpuid->entries[i].index == index) {
-            struct kvm_cpuid_entry2 *entry = &cpuid->entries[i];
-            found = true;
-            ret = cpuid_entry_get_reg(entry, reg);
-        }
+    struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
+    if (entry) {
+        found = true;
+        ret = cpuid_entry_get_reg(entry, reg);
     }
 
     /* Fixups for the data returned by KVM, below */
-- 
1.7.6.4
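
For readers without the KVM headers at hand, the lookup extracted here can be modelled with plain structs. The types below (sketch_entry, sketch_table) are simplified, hypothetical stand-ins for struct kvm_cpuid_entry2 and struct kvm_cpuid2, not the real definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for struct kvm_cpuid_entry2 / struct kvm_cpuid2. */
struct sketch_entry {
    uint32_t function;
    uint32_t index;
    uint32_t eax, ebx, ecx, edx;
};

struct sketch_table {
    uint32_t nent;                        /* number of valid entries */
    const struct sketch_entry *entries;
};

/* Linear search for a (function, index) pair; NULL when absent. */
static const struct sketch_entry *
sketch_find_entry(const struct sketch_table *t, uint32_t function,
                  uint32_t index)
{
    uint32_t i;
    for (i = 0; i < t->nent; ++i) {
        if (t->entries[i].function == function &&
            t->entries[i].index == index) {
            return &t->entries[i];        /* pointer into the table, not a copy */
        }
    }
    return NULL;
}
```

Returning a pointer (or NULL) lets the caller fold the old "found" flag and the register read into a single "if (entry)" test, which is what the second hunk of the patch does.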



[PATCH 17/28] target-i386: Add missing kvm cpuid feature name

2012-10-31 Thread Marcelo Tosatti
From: Don Slutz d...@cloudswitch.com

Currently -cpu host,-kvmclock,-kvm_nopiodelay,-kvm_mmu does not
turn off all bits in CPUID 0x40000001 EAX.

The missing one is KVM_FEATURE_STEAL_TIME.

This adds the name kvm_steal_time.

Signed-off-by: Don Slutz d...@cloudswitch.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/cpu.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 7c726fc..4417d6a 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -88,10 +88,14 @@ static const char *ext3_feature_name[] = {
 };
 
 static const char *kvm_feature_name[] = {
-    "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock", "kvm_asyncpf", NULL, "kvm_pv_eoi", NULL,
-    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+    "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
+    "kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
+    NULL, NULL, NULL, NULL,
 };
 
 static const char *svm_feature_name[] = {
-- 
1.7.6.4
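
Why a missing name matters can be seen from a small model of a name-to-bit lookup. The sketch below assumes that entry i of the table names bit i of CPUID 0x40000001 EAX and that a name occurring twice selects both bits; sketch_feature_mask() is a hypothetical helper, not the parser QEMU actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First eight names from the patched table; entry i names bit i.
 * The remaining 24 entries are implicitly NULL (unnamed bits). */
static const char *const sketch_kvm_feature_name[32] = {
    "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
    "kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", NULL,
};

/* Returns a mask of every bit whose table entry equals `name`,
 * or 0 when the name is unknown. */
static uint32_t sketch_feature_mask(const char *name)
{
    uint32_t mask = 0;
    int i;
    for (i = 0; i < 32; ++i) {
        if (sketch_kvm_feature_name[i] &&
            strcmp(sketch_kvm_feature_name[i], name) == 0) {
            mask |= 1u << i;
        }
    }
    return mask;
}
```

Before the patch, bit 5 had a NULL entry, so no "-name" on the command line could produce a mask that clears it; with "kvm_steal_time" in the table, the six names together cover bits 0 to 6.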



[PATCH 11/28] i386: kvm: mask cpuid_kvm_features earlier

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

Instead of masking the KVM feature bits very late (while building the
KVM_SET_CPUID2 data), mask them out on env->cpuid_kvm_features, at the
same point where the other feature words are masked out.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4e96b04..8eb61a0 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -428,6 +428,9 @@ int kvm_arch_init_vcpu(CPUX86State *env)
     env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x8000000A,
                                                              0, R_EDX);
 
+    env->cpuid_kvm_features &=
+        kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+
     cpuid_i = 0;
 
     /* Paravirtualization CPUIDs */
@@ -448,8 +451,7 @@ int kvm_arch_init_vcpu(CPUX86State *env)
     c = &cpuid_data.entries[cpuid_i++];
     memset(c, 0, sizeof(*c));
     c->function = KVM_CPUID_FEATURES;
-    c->eax = env->cpuid_kvm_features &
-        kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+    c->eax = env->cpuid_kvm_features;
 
     if (hyperv_enabled()) {
         memcpy(signature, "Hv#1\0\0\0\0\0\0\0\0", 12);
-- 
1.7.6.4



[PATCH 14/28] i386: kvm: reformat filter_features_for_kvm() code

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

Cosmetic, but it will also help to make further patches easier to review.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/cpu.c |   28 +---
 1 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 78876f6..be16898 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1374,22 +1374,20 @@ static void filter_features_for_kvm(X86CPU *cpu)
     CPUX86State *env = &cpu->env;
     KVMState *s = kvm_state;
 
-    env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-
-    env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-
-    env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
-                                                             0, R_EDX);
-    env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
-                                                             0, R_ECX);
-    env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x8000000A,
-                                                             0, R_EDX);
-
+    env->cpuid_features &=
+        kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+    env->cpuid_ext_features &=
+        kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
+    env->cpuid_ext2_features &=
+        kvm_arch_get_supported_cpuid(s, 0x80000001, 0, R_EDX);
+    env->cpuid_ext3_features &=
+        kvm_arch_get_supported_cpuid(s, 0x80000001, 0, R_ECX);
+    env->cpuid_svm_features  &=
+        kvm_arch_get_supported_cpuid(s, 0x8000000A, 0, R_EDX);
     env->cpuid_kvm_features &=
-            kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
-
-    env->cpuid_ext4_features &= kvm_arch_get_supported_cpuid(s, 0xC0000001,
-                                                             0, R_EDX);
+        kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+    env->cpuid_ext4_features &=
+        kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);
 
 }
 #endif
-- 
1.7.6.4



[PATCH 20/28] Issue warning when deprecated -no-kvm-pit is used

2012-10-31 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Releases of qemu-kvm will be discontinued as of qemu 1.3.0.
Users should switch to plain qemu releases.
To avoid breaking setups that rely on command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.

Port -no-kvm-pit option.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 qemu-options.hx |3 +++
 vl.c|5 +
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index d813755..42e642c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2888,6 +2888,9 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI
 
+HXCOMM Deprecated (ignored)
+DEF("no-kvm-pit", HAS_ARG, QEMU_OPTION_no_kvm_pit, "", QEMU_ARCH_I386)
+
 HXCOMM Deprecated by -machine kernel_irqchip=on|off property
 DEF("no-kvm-irqchip", HAS_ARG, QEMU_OPTION_no_kvm_irqchip, "", QEMU_ARCH_I386)
 
diff --git a/vl.c b/vl.c
index 7d0708a..37717b4 100644
--- a/vl.c
+++ b/vl.c
@@ -3171,6 +3171,11 @@ int main(int argc, char **argv, char **envp)
                     machine = machine_parse(optarg);
                 }
                 break;
+            case QEMU_OPTION_no_kvm_pit: {
+                fprintf(stderr, "Warning: KVM PIT can no longer be disabled "
+                                "separately.\n");
+                break;
+            }
             case QEMU_OPTION_usb:
                 machine_opts = qemu_opts_find(qemu_find_opts("machine"), 0);
                 if (machine_opts) {
-- 
1.7.6.4



[PATCH 15/28] i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

Now that CPUID leaf 7 features can be enabled/disabled on the
command-line, we need to filter them properly using GET_SUPPORTED_CPUID,
at the same place where other features are filtered out.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/cpu.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index be16898..22d91ed 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1384,6 +1384,8 @@ static void filter_features_for_kvm(X86CPU *cpu)
         kvm_arch_get_supported_cpuid(s, 0x80000001, 0, R_ECX);
     env->cpuid_svm_features  &=
         kvm_arch_get_supported_cpuid(s, 0x8000000A, 0, R_EDX);
+    env->cpuid_7_0_ebx_features &=
+        kvm_arch_get_supported_cpuid(s, 7, 0, R_EBX);
     env->cpuid_kvm_features &=
         kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
     env->cpuid_ext4_features &=
-- 
1.7.6.4



[PATCH 04/28] i386: kvm: extract register switch to cpuid_entry_get_reg() function

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

No behavior change: just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |   37 +++--
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 18782e4..695deb9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -123,6 +123,28 @@ static int get_para_features(KVMState *s)
 }
 
 
+/* Returns the value for a specific register on the cpuid entry
+ */
+static uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg)
+{
+    uint32_t ret = 0;
+    switch (reg) {
+    case R_EAX:
+        ret = entry->eax;
+        break;
+    case R_EBX:
+        ret = entry->ebx;
+        break;
+    case R_ECX:
+        ret = entry->ecx;
+        break;
+    case R_EDX:
+        ret = entry->edx;
+        break;
+    }
+    return ret;
+}
+
 uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
                                       uint32_t index, int reg)
 {
@@ -142,20 +164,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
             cpuid->entries[i].index == index) {
             struct kvm_cpuid_entry2 *entry = &cpuid->entries[i];
             found = true;
-            switch (reg) {
-            case R_EAX:
-                ret = entry->eax;
-                break;
-            case R_EBX:
-                ret = entry->ebx;
-                break;
-            case R_ECX:
-                ret = entry->ecx;
-                break;
-            case R_EDX:
-                ret = entry->edx;
-                break;
-            }
+            ret = cpuid_entry_get_reg(entry, reg);
         }
     }
 
-- 
1.7.6.4
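
The extracted helper is a plain selector over the four output registers of one CPUID entry. A standalone sketch, with a hypothetical enum in place of QEMU's R_EAX..R_EDX constants and a reduced struct in place of struct kvm_cpuid_entry2:

```c
#include <stdint.h>

/* Hypothetical register ids; QEMU uses its own R_EAX, R_EBX, ... constants. */
enum sketch_reg { SK_EAX, SK_EBX, SK_ECX, SK_EDX };

/* Reduced stand-in for struct kvm_cpuid_entry2: only the register values. */
struct sketch_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Select one register value; unknown register ids yield 0, matching the
 * "ret = 0" default in the patch. */
static uint32_t sketch_get_reg(const struct sketch_regs *e, int reg)
{
    uint32_t ret = 0;
    switch (reg) {
    case SK_EAX:
        ret = e->eax;
        break;
    case SK_EBX:
        ret = e->ebx;
        break;
    case SK_ECX:
        ret = e->ecx;
        break;
    case SK_EDX:
        ret = e->edx;
        break;
    }
    return ret;
}
```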



[PATCH 00/28] [PULL] qemu-kvm.git uq/master queue

2012-10-31 Thread Marcelo Tosatti
The following changes since commit aee0bf7d8d7564f8f2c40e4501695c492b7dd8d1:

  tap-win32: stubs to fix win32 build (2012-10-30 19:18:53 +0000)

are available in the git repository at:
  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Don Slutz (1):
  target-i386: Add missing kvm cpuid feature name

Eduardo Habkost (19):
  i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of for loop
  i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features check
  i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable
  i386: kvm: extract register switch to cpuid_entry_get_reg() function
  i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function
  i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid() function
  i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with single 'if'
  i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()
  i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on kvm_arch_get_supported_cpuid()
  i386: kvm: x2apic is not supported without in-kernel irqchip
  i386: kvm: mask cpuid_kvm_features earlier
  i386: kvm: mask cpuid_ext4_features bits earlier
  i386: kvm: filter CPUID feature words earlier, on cpu.c
  i386: kvm: reformat filter_features_for_kvm() code
  i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too
  i386: cpu: add missing CPUID[EAX=7,ECX=0] flag names
  target-i386: make cpu_x86_fill_host() void
  target-i386: cpu: make -cpu host/check/enforce code KVM-specific
  target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID

Jan Kiszka (6):
  Use machine options to emulate -no-kvm-irqchip
  Issue warning when deprecated -no-kvm-pit is used
  Use global properties to emulate -no-kvm-pit-reinjection
  Issue warning when deprecated drive parameter boot=on|off is used
  Issue warning when deprecated -tdf option is used
  Emulate qemu-kvm's -no-kvm option

Marcelo Tosatti (1):
  cirrus_vga: allow configurable vram size

Peter Maydell (1):
  update-linux-headers.sh: Handle new kernel uapi/ directories

 blockdev.c  |6 ++
 hw/cirrus_vga.c |   21 --
 kvm.h   |1 +
 qemu-config.c   |4 +
 qemu-options.hx |   16 
 scripts/update-linux-headers.sh |3 +-
 target-i386/cpu.c   |   98 +++---
 target-i386/kvm.c   |  153 +++
 vl.c|   33 +
 9 files changed, 242 insertions(+), 93 deletions(-)


[PATCH 16/28] i386: cpu: add missing CPUID[EAX=7,ECX=0] flag names

2012-10-31 Thread Marcelo Tosatti
From: Eduardo Habkost ehabk...@redhat.com

This makes QEMU recognize the following CPU flag names:

 Flags            | Corresponding KVM kernel commit
 -----------------+-----------------------------------------
 FSGSBASE         | 176f61da82435eae09cc96f70b530d1ba0746b8b
 AVX2, BMI1, BMI2 | fb215366b3c7320ac25dca766a0152df16534932
 HLE, RTM         | 83c529151ab0d4a813e3f6a3e293fff75d468519
 INVPCID          | ad756a1603c5fac207758faaac7f01c34c9d0b7b
 ERMS             | a01c8f9b4e266df1d7166d23216f2060648f862d

Reviewed-by: Don Slutz d...@cloudswitch.com
Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/cpu.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 22d91ed..7c726fc 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -106,8 +106,8 @@ static const char *svm_feature_name[] = {
 };
 
 static const char *cpuid_7_0_ebx_feature_name[] = {
-    NULL, NULL, NULL, NULL, NULL, NULL, NULL, "smep",
-    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+    "fsgsbase", NULL, NULL, "bmi1", "hle", "avx2", NULL, "smep",
+    "bmi2", "erms", "invpcid", "rtm", NULL, NULL, NULL, NULL,
     NULL, NULL, NULL, NULL, "smap", NULL, NULL, NULL,
     NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 };
-- 
1.7.6.4
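
Because the table is positional, each name's slot is its bit number in CPUID[EAX=7,ECX=0].EBX. The sketch below spells the same positions out with C99 designated initializers so they can be checked at a glance; sketch_leaf7_bit() is a hypothetical helper added only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Same names as the patched table, with the bit index made explicit.
 * Unlisted slots are implicitly NULL. */
static const char *const sketch_leaf7_ebx_name[32] = {
    [0] = "fsgsbase", [3] = "bmi1",     [4] = "hle",  [5] = "avx2",
    [7] = "smep",     [8] = "bmi2",     [9] = "erms", [10] = "invpcid",
    [11] = "rtm",     [20] = "smap",
};

/* Returns the EBX bit number for a flag name, or -1 if it is not listed. */
static int sketch_leaf7_bit(const char *name)
{
    int i;
    for (i = 0; i < 32; ++i) {
        if (sketch_leaf7_ebx_name[i] &&
            strcmp(sketch_leaf7_ebx_name[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```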



Re: [PULL 00/12] ppc patch queue 2012-10-30

2012-10-31 Thread Alexander Graf


On 31.10.2012, at 02:32, Marcelo Tosatti mtosa...@redhat.com wrote:

> On Tue, Oct 30, 2012 at 11:02:13AM +0100, Alexander Graf wrote:
>> Hi Avi / Marcelo,
>> 
>> This is my current patch queue for ppc.  Please pull.
>> 
>> Headline changes are:
>> 
>>  * Fix 440 target
>>  * Fix uapi conflict
> 
> Can you regenerate against queue branch? (btw i forgot to update
> fsl_hcalls.h...).

Hrm. So which branch am I supposed to base against? Master, next or queue?
This one is against next...

Alex


Re: [PULL 00/12] ppc patch queue 2012-10-30

2012-10-31 Thread Avi Kivity
On 10/31/2012 12:22 PM, Alexander Graf wrote:
 
 
> On 31.10.2012, at 02:32, Marcelo Tosatti mtosa...@redhat.com wrote:
> 
>> On Tue, Oct 30, 2012 at 11:02:13AM +0100, Alexander Graf wrote:
>>> Hi Avi / Marcelo,
>>> 
>>> This is my current patch queue for ppc.  Please pull.
>>> 
>>> Headline changes are:
>>> 
>>>  * Fix 440 target
>>>  * Fix uapi conflict
>> 
>> Can you regenerate against queue branch? (btw i forgot to update
>> fsl_hcalls.h...).
> 
> Hrm. So which branch am I supposed to base against? Master, next or queue?
> This one is against next...

next and queue should be compatible (queue = next + a few patches).  Are
there any conflicts when merging against queue?


-- 
error compiling committee.c: too many arguments to function


[PATCHv2 net-next 0/8] enable/disable zero copy tx dynamically

2012-10-31 Thread Michael S. Tsirkin

tun has supported zero copy transmit since 0690899b4d4501b3505be069b9a687e68ccbe15b;
however, you can only enable this mode if you know your workload does not
trigger heavy guest to host/host to guest traffic - otherwise you
get a (minor) performance regression.
This patchset addresses this problem by notifying the owner
device when the callback is invoked because of a data copy.
This makes it possible to detect dynamically whether zero copy is
appropriate: we start in zero copy mode, and when we detect
that data was copied we disable zero copy for a while.

With this patch applied, I get the same performance for
guest to host and guest to guest both with and without zero copy tx.

Changes from v1:
  Comment fixups in patches 2 and 8 suggested by Vlad Yasevich,
 no changes to other patches

Michael S. Tsirkin (8):
  skb: report completion status for zero copy skbs
  skb: api to report errors for zero copy skbs
  tun: report orphan frags errors to zero copy callback
  vhost-net: cleanup macros for DMA status tracking
  vhost: track zero copy failures using DMA length
  vhost: move -net specific code out
  vhost-net: select tx zero copy dynamically
  vhost-net: reduce vq polling on tx zerocopy

 drivers/net/tun.c |   1 +
 drivers/vhost/net.c   | 111 +++---
 drivers/vhost/tcm_vhost.c |   1 +
 drivers/vhost/vhost.c |  52 +++---
 drivers/vhost/vhost.h |  11 ++---
 include/linux/skbuff.h|   5 ++-
 net/core/skbuff.c |  24 +-
 7 files changed, 144 insertions(+), 61 deletions(-)

-- 
MST
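
The "disable for a while" behaviour can be pictured with two counters, as patch 7/8 of this series does: a packet counter that starts a fresh window every 1024 packets and an error counter for zero-copy sends that ended in a late copy. The sketch below is a userspace model; the struct and function names are hypothetical, and the selection rule (at most one error per 64 packets in the window) is an ASSUMPTION made for illustration, since the selection function itself is not quoted here.

```c
#include <stdbool.h>

/* Hypothetical model of the per-device counters from patch 7/8. */
struct sketch_tx_stats {
    unsigned tx_packets;    /* packets submitted in the current window */
    unsigned tx_zcopy_err;  /* zero-copy sends that ended in a late copy */
};

/* Count one submitted packet; every 1024 packets both counters are
 * reset, which gives zero copy another chance. */
static void sketch_tx_packet(struct sketch_tx_stats *s)
{
    ++s->tx_packets;
    if (s->tx_packets < 1024)
        return;
    s->tx_packets = 0;
    s->tx_zcopy_err = 0;
}

/* Count one zero-copy failure (a late data copy). */
static void sketch_tx_err(struct sketch_tx_stats *s)
{
    ++s->tx_zcopy_err;
}

/* ASSUMPTION: allow zero copy while the error count does not exceed
 * 1/64 of the packets seen in this window. The ratio is illustrative. */
static bool sketch_select_zcopy(const struct sketch_tx_stats *s)
{
    return s->tx_packets / 64 >= s->tx_zcopy_err;
}
```

In this model a single early failure turns zero copy off until 64 packets have been counted, and the periodic reset is what allows a changed workload to be detected again.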


[PATCHv2 net-next 2/8] skb: api to report errors for zero copy skbs

2012-10-31 Thread Michael S. Tsirkin
Orphaning frags for zero copy skbs needs to allocate data in atomic
context, so it has a chance to fail. If it does, we currently discard
the skb, which is safe, but we don't report anything to the caller,
so it cannot recover by e.g. disabling zero copy.

Add an API to free skb reporting such errors: this is used
by tun in case orphaning frags fails.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c  | 20 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8bac11b..0644432 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -568,6 +568,7 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 }
 
 extern void kfree_skb(struct sk_buff *skb);
+extern void skb_tx_error(struct sk_buff *skb, int err);
 extern void consume_skb(struct sk_buff *skb);
 extern void   __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index eb31f6e..2e7a1fd 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -635,6 +635,26 @@ void kfree_skb(struct sk_buff *skb)
 EXPORT_SYMBOL(kfree_skb);
 
 /**
+ * skb_tx_error - report an sk_buff xmit error
+ * @skb: buffer that triggered an error
+ *
+ * Report xmit error if a device callback is tracking this skb.
+ * skb must be freed afterwards.
+ */
+void skb_tx_error(struct sk_buff *skb, int err)
+{
+	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
+		struct ubuf_info *uarg;
+
+		uarg = skb_shinfo(skb)->destructor_arg;
+		if (uarg->callback)
+			uarg->callback(uarg, err);
+		skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
+	}
+}
+EXPORT_SYMBOL(skb_tx_error);
+
+/**
  * consume_skb - free an skbuff
  * @skb: buffer to free
  *
-- 
MST
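
The control flow of skb_tx_error() can be modelled in userspace as below. All types and names are hypothetical stand-ins (struct sketch_skb for sk_buff plus its shared info, SKETCH_ZEROCOPY for SKBTX_DEV_ZEROCOPY); the point is only that the error reaches the callback once and that the flag is cleared afterwards.

```c
#include <stddef.h>

#define SKETCH_ZEROCOPY 0x1u   /* stands in for SKBTX_DEV_ZEROCOPY */

/* Stand-in for struct ubuf_info, with bookkeeping for the example. */
struct sketch_ubuf {
    void (*callback)(struct sketch_ubuf *, int err);
    int last_err;              /* last status passed to the callback */
    int calls;                 /* how often the callback ran */
};

/* Stand-in for an skb together with its shared info. */
struct sketch_skb {
    unsigned tx_flags;
    struct sketch_ubuf *uarg;
};

/* Report an error to the owner of a zero-copy buffer, then clear the
 * flag so the same buffer is not reported a second time. */
static void sketch_tx_error(struct sketch_skb *skb, int err)
{
    if (skb->tx_flags & SKETCH_ZEROCOPY) {
        struct sketch_ubuf *uarg = skb->uarg;

        if (uarg->callback)
            uarg->callback(uarg, err);
        skb->tx_flags &= ~SKETCH_ZEROCOPY;
    }
}

/* Example callback: remember the status and count the invocation. */
static void sketch_record(struct sketch_ubuf *u, int err)
{
    u->last_err = err;
    u->calls++;
}
```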



[PATCHv2 net-next 4/8] vhost-net: cleanup macros for DMA status tracking

2012-10-31 Thread Michael S. Tsirkin
Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   |  3 ++-
 drivers/vhost/vhost.c |  2 +-
 drivers/vhost/vhost.h | 12 +---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 072cbba..f80ae5f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -237,7 +237,8 @@ static void handle_tx(struct vhost_net *net)
 			} else {
 				struct ubuf_info *ubuf = &vq->ubuf_info[head];
 
-				vq->heads[vq->upend_idx].len = len;
+				vq->heads[vq->upend_idx].len =
+					VHOST_DMA_IN_PROGRESS;
 				ubuf->callback = vhost_zerocopy_callback;
 				ubuf->ctx = vq->ubufs;
 				ubuf->desc = vq->upend_idx;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 92308b6..906fd9f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1606,7 +1606,7 @@ void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
 	struct vhost_virtqueue *vq = ubufs->vq;
 
 	vhost_poll_queue(&vq->poll);
-	/* set len = 1 to mark this desc buffers done DMA */
+	/* set len to mark this desc buffers done DMA */
 	vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
 	kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index eb7263c3..ad72a1f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,9 +13,15 @@
 #include <linux/virtio_ring.h>
 #include <linux/atomic.h>
 
-/* This is for zerocopy, used buffer len is set to 1 when lower device DMA
- * done */
-#define VHOST_DMA_DONE_LEN	1
+/*
+ * For transmit, used buffer len is unused; we override it to track buffer
+ * status internally; used for zerocopy tx only.
+ */
+/* Lower device DMA done */
+#define VHOST_DMA_DONE_LEN	2
+/* Lower device DMA in progress */
+#define VHOST_DMA_IN_PROGRESS	1
+/* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN	0
 
 struct vhost_device;
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 net-next 5/8] vhost: track zero copy failures using DMA length

2012-10-31 Thread Michael S. Tsirkin
This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c | 7 ---
 drivers/vhost/vhost.h | 4 
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 906fd9f..5affce3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -425,7 +425,7 @@ int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
 	int j = 0;
 
 	for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-		if ((vq->heads[i].len == VHOST_DMA_DONE_LEN)) {
+		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
 			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
 			vhost_add_used_and_signal(vq->dev, vq,
 						  vq->heads[i].id, 0);
@@ -1600,13 +1600,14 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
 	kfree(ubufs);
 }
 
-void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
 {
 	struct vhost_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
 
 	vhost_poll_queue(&vq->poll);
 	/* set len to mark this desc buffers done DMA */
-	vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
+	vq->heads[ubuf->desc].len = status ?
+		VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
 	kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index ad72a1f..6fdf31d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -17,6 +17,8 @@
  * For transmit, used buffer len is unused; we override it to track buffer
  * status internally; used for zerocopy tx only.
  */
+/* Lower device DMA failed */
+#define VHOST_DMA_FAILED_LEN	3
 /* Lower device DMA done */
 #define VHOST_DMA_DONE_LEN	2
 /* Lower device DMA in progress */
@@ -24,6 +26,8 @@
 /* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN	0
 
+#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+
 struct vhost_device;
 
 struct vhost_work;
-- 
MST
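
With the failed state ordered above the done state, a single ">=" comparison treats both as finished. The userspace model below shows how that interacts with in-order signalling: finished slots are released from done_idx onwards, and the walk stops at the first slot still in flight. The SK_* names, the ring size of 8 and struct sketch_vq are hypothetical; the driver itself uses UIO_MAXIOV and struct vhost_virtqueue.

```c
/* Status values mirroring the patch; names are local to this sketch. */
#define SK_DMA_FAILED       3
#define SK_DMA_DONE         2
#define SK_DMA_IN_PROGRESS  1
#define SK_DMA_CLEAR        0
#define SK_DMA_IS_DONE(len) ((len) >= SK_DMA_DONE)

#define SK_RING 8           /* hypothetical small ring size */

struct sketch_vq {
    unsigned len[SK_RING];  /* per-slot status, as kept in heads[].len */
    int done_idx;           /* first slot not yet released */
    int upend_idx;          /* one past the last submitted slot */
};

/* Release finished slots in submission order and stop at the first one
 * still in progress. Returns the number of slots released. */
static int sketch_signal_used(struct sketch_vq *vq)
{
    int i;
    int j = 0;

    for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % SK_RING) {
        if (!SK_DMA_IS_DONE(vq->len[i]))
            break;
        vq->len[i] = SK_DMA_CLEAR;
        ++j;
    }
    if (j)
        vq->done_idx = i;
    return j;
}
```

A slot that completes out of order simply waits: in the example state {done, failed, in progress, done}, the first call releases two slots and the fourth stays marked done until the third one finishes.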

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 net-next 6/8] vhost: move -net specific code out

2012-10-31 Thread Michael S. Tsirkin
Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   | 45 
 drivers/vhost/tcm_vhost.c |  1 +
 drivers/vhost/vhost.c | 53 +++
 drivers/vhost/vhost.h | 21 +++
 4 files changed, 56 insertions(+), 64 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f80ae5f..532fc88 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -126,6 +126,42 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
 	net->tx_poll_state = VHOST_NET_POLL_STARTED;
 }
 
+/* In case of DMA done not in order in lower device driver for some reason.
+ * upend_idx is used to track end of used idx, done_idx is used to track head
+ * of used idx. Once lower device DMA done contiguously, we will signal KVM
+ * guest used idx.
+ */
+int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+{
+	int i;
+	int j = 0;
+
+	for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
+		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
+			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+			vhost_add_used_and_signal(vq->dev, vq,
+						  vq->heads[i].id, 0);
+			++j;
+		} else
+			break;
+	}
+	if (j)
+		vq->done_idx = i;
+	return j;
+}
+
+static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
+{
+	struct vhost_ubuf_ref *ubufs = ubuf->ctx;
+	struct vhost_virtqueue *vq = ubufs->vq;
+
+	vhost_poll_queue(&vq->poll);
+	/* set len to mark this desc buffers done DMA */
+	vq->heads[ubuf->desc].len = status ?
+		VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
+	vhost_ubuf_put(ubufs);
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -594,9 +630,18 @@ static int vhost_net_release(struct inode *inode, struct file *f)
 	struct vhost_net *n = f->private_data;
 	struct socket *tx_sock;
 	struct socket *rx_sock;
+	int i;
 
 	vhost_net_stop(n, &tx_sock, &rx_sock);
 	vhost_net_flush(n);
+	vhost_dev_stop(&n->dev);
+	for (i = 0; i < n->dev.nvqs; ++i) {
+		/* Wait for all lower device DMAs done. */
+		if (n->dev.vqs[i].ubufs)
+			vhost_ubuf_put_and_wait(n->dev.vqs[i].ubufs);
+
+		vhost_zerocopy_signal_used(n, &n->dev.vqs[i]);
+	}
 	vhost_dev_cleanup(&n->dev, false);
 	if (tx_sock)
 		fput(tx_sock->file);
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index aa31692..23c138f 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -895,6 +895,7 @@ static int vhost_scsi_release(struct inode *inode, struct file *f)
 		vhost_scsi_clear_endpoint(s, &backend);
 	}
 
+	vhost_dev_stop(&s->dev);
 	vhost_dev_cleanup(&s->dev, false);
 	kfree(s);
 	return 0;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5affce3..ef8f598 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -26,10 +26,6 @@
 #include <linux/kthread.h>
 #include <linux/cgroup.h>
 
-#include <linux/net.h>
-#include <linux/if_packet.h>
-#include <linux/if_arp.h>
-
 #include "vhost.h"
 
 enum {
@@ -414,28 +410,16 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
 	return 0;
 }
 
-/* In case of DMA done not in order in lower device driver for some reason.
- * upend_idx is used to track end of used idx, done_idx is used to track head
- * of used idx. Once lower device DMA done contiguously, we will signal KVM
- * guest used idx.
- */
-int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+void vhost_dev_stop(struct vhost_dev *dev)
 {
 	int i;
-	int j = 0;
-
-	for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
-			vhost_add_used_and_signal(vq->dev, vq,
-						  vq->heads[i].id, 0);
-			++j;
-		} else
-			break;
+
+	for (i = 0; i < dev->nvqs; ++i) {
+		if (dev->vqs[i].kick && dev->vqs[i].handle_kick) {
+			vhost_poll_stop(&dev->vqs[i].poll);
+			vhost_poll_flush(&dev->vqs[i].poll);
+		}
 	}
-	if (j)
-		vq->done_idx = i;
-	return j;
 }
 
 /* Caller should have device mutex if and only if locked is set */
@@ -444,17 +428,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
int i;
 
for (i = 0; 

[PATCHv2 net-next 7/8] vhost-net: select tx zero copy dynamically

2012-10-31 Thread Michael S. Tsirkin
Even when vhost-net is in zero-copy transmit mode,
net core might still decide to copy the skb later
which is somewhat slower than a copy in user
context: data copy overhead is added to the cost of
page pin/unpin. The result is that enabling tx zero copy
option leads to higher CPU utilization for guest to guest
and guest to host traffic.

To fix this, suppress zero copy tx after a given number of
packets triggered late data copy. Re-enable periodically
to detect workload changes.
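
For readers of the archive, the heuristic described above can be sketched as a standalone userspace model. This is only an illustration, not the kernel code: the struct and function names are hypothetical, and the constants 1024 (window length) and 64 (tolerated packets per error) are simply the values that appear in the diff in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-device counters; not the kernel struct. */
struct zcopy_stats {
	unsigned tx_packets;   /* packets submitted in the current window */
	unsigned tx_zcopy_err; /* late copies or failures seen in the window */
};

#define ZCOPY_WINDOW      1024u /* packets per measurement window */
#define ZCOPY_ERR_DIVISOR 64u   /* allow one error per 64 packets */

/* Count one transmitted packet; start a fresh window every ZCOPY_WINDOW. */
static void zcopy_tx_packet(struct zcopy_stats *st)
{
	if (++st->tx_packets < ZCOPY_WINDOW)
		return;
	/* Resetting both counters is what periodically re-enables zero copy. */
	st->tx_packets = 0;
	st->tx_zcopy_err = 0;
}

/* Record one zero copy transmit that ended in a late copy or an error. */
static void zcopy_tx_err(struct zcopy_stats *st)
{
	++st->tx_zcopy_err;
}

/* Zero copy stays selected while errors <= packets / ZCOPY_ERR_DIVISOR. */
static bool zcopy_select(const struct zcopy_stats *st)
{
	return st->tx_packets / ZCOPY_ERR_DIVISOR >= st->tx_zcopy_err;
}
```

Note that in this model a single error at the very start of a window already suppresses zero copy until 64 packets have been counted, and the reset at the end of the window is what lets zero copy be tried again after the workload changes.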

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 55 -
 1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 532fc88..8e9de79 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -42,6 +42,21 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Experimental Zero Copy TX");
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
 
+/*
+ * For transmit, used buffer len is unused; we override it to track buffer
+ * status internally; used for zerocopy tx only.
+ */
+/* Lower device DMA failed */
+#define VHOST_DMA_FAILED_LEN   3
+/* Lower device DMA done */
+#define VHOST_DMA_DONE_LEN 2
+/* Lower device DMA in progress */
+#define VHOST_DMA_IN_PROGRESS  1
+/* Buffer unused */
+#define VHOST_DMA_CLEAR_LEN    0
+
+#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+
 enum {
VHOST_NET_VQ_RX = 0,
VHOST_NET_VQ_TX = 1,
@@ -62,8 +77,33 @@ struct vhost_net {
 * We only do this when socket buffer fills up.
 * Protected by tx vq lock. */
enum vhost_net_poll_state tx_poll_state;
+   /* Number of TX recently submitted.
+* Protected by tx vq lock. */
+   unsigned tx_packets;
+   /* Number of times zerocopy TX recently failed.
+* Protected by tx vq lock. */
+   unsigned tx_zcopy_err;
 };
 
+static void vhost_net_tx_packet(struct vhost_net *net)
+{
+	++net->tx_packets;
+	if (net->tx_packets < 1024)
+		return;
+	net->tx_packets = 0;
+	net->tx_zcopy_err = 0;
+}
+
+static void vhost_net_tx_err(struct vhost_net *net)
+{
+	++net->tx_zcopy_err;
+}
+
+static bool vhost_net_tx_select_zcopy(struct vhost_net *net)
+{
+	return net->tx_packets / 64 >= net->tx_zcopy_err;
+}
+
 static bool vhost_sock_zcopy(struct socket *sock)
 {
 	return unlikely(experimental_zcopytx) &&
@@ -131,12 +171,15 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
  * guest used idx.
  */
-int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+static int vhost_zerocopy_signal_used(struct vhost_net *net,
+				      struct vhost_virtqueue *vq)
 {
 	int i;
 	int j = 0;
 
 	for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
+		if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+			vhost_net_tx_err(net);
 		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
 			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
 			vhost_add_used_and_signal(vq->dev, vq,
@@ -208,7 +251,7 @@ static void handle_tx(struct vhost_net *net)
 	for (;;) {
 		/* Release DMAs done buffers first */
 		if (zcopy)
-			vhost_zerocopy_signal_used(vq);
+			vhost_zerocopy_signal_used(net, vq);
 
 		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 					 ARRAY_SIZE(vq->iov),
@@ -263,7 +306,8 @@ static void handle_tx(struct vhost_net *net)
 		/* use msg_control to pass vhost zerocopy ubuf info to skb */
 		if (zcopy) {
 			vq->heads[vq->upend_idx].id = head;
-			if (len < VHOST_GOODCOPY_LEN) {
+			if (!vhost_net_tx_select_zcopy(net) ||
+			    len < VHOST_GOODCOPY_LEN) {
 				/* copy don't need to wait for DMA done */
 				vq->heads[vq->upend_idx].len =
 							VHOST_DMA_DONE_LEN;
@@ -305,8 +349,9 @@ static void handle_tx(struct vhost_net *net)
 		if (!zcopy)
 			vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		else
-			vhost_zerocopy_signal_used(vq);
+			vhost_zerocopy_signal_used(net, vq);
 		total_len += len;
+		vhost_net_tx_packet(net);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
 			break;
@@ -774,7 +819,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	if (oldubufs) {
 		vhost_ubuf_put_and_wait(oldubufs);
 		mutex_lock(&vq->mutex);
-   

Re: [rfc net-next v6 0/3] Multiqueue virtio-net

2012-10-31 Thread Jason Wang

On 10/31/2012 03:05 AM, Rick Jones wrote:

On 10/30/2012 03:03 AM, Jason Wang wrote:

Hi all:

This series is an updated version of the multiqueue virtio-net driver, based on
Krishna Kumar's work, that lets virtio-net use multiple rx/tx queues for
packet reception and transmission. Please review and comment.

Changes from v5:
- Align the implementation with the RFC spec update v4
- Switch the mode between single mode and multiqueue mode without reset
- Remove the 256 limitation of queues
- Use helpers to do the mapping between virtqueues and tx/rx queues
- Use combined channels instead of separate rx/tx queues when doing the queue
  number configuration
- Other coding style comments from Michael

Reference:
- A prototype implementation of qemu-kvm support can be found at
git://github.com/jasowang/qemu-kvm-mq.git
- V5 could be found at http://lwn.net/Articles/505388/
- V4 could be found at https://lkml.org/lkml/2012/6/25/120
- V2 could be found at http://lwn.net/Articles/467283/
- Michael's virtio-spec: http://www.spinics.net/lists/netdev/msg209986.html


Perf Numbers:

- Pktgen test shows the receiving capability of the multiqueue virtio-net was
  dramatically improved.
- Netperf result shows latency was greatly improved according to the test
  result.


I suppose it is technically correct to say that latency was improved, 
but usually for aggregate request/response tests I tend to talk about 
the aggregate transactions per second.


Sure.


Do you have a hypothesis as to why the improvement dropped going to 20 
concurrent sessions from 10?


rick jones 


I'm currently investigating this issue, but I don't have many ideas yet. The
aggregate transactions per second scales pretty well even with 20
concurrent sessions when testing between a local host and a local vm.
It looks like some bottleneck is reached when testing over 10gb or
vms: even if I increase the number of sessions, the result does not
increase.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 3.7-rc2 build failure on s390x

2012-10-31 Thread Alexander Graf


On 23.10.2012, at 08:20, Christian Borntraeger borntrae...@de.ibm.com wrote:

 On 22/10/12 23:14, Alexander Graf wrote:
 Hi Christian,
 
 During our normal Factory kernel builds, s390x seems to choke:
 
 /home/abuild/rpmbuild/BUILD/kernel-default-3.7.rc2/linux-3.7-rc2/arch/s390/include/asm/kvm_para.h:147:99:
  error: redefinition of 'kvm_arch_para_features'
 /home/abuild/rpmbuild/BUILD/kernel-default-3.7.rc2/linux-3.7-rc2/arch/s390/include/asm/kvm_para.h:152:91:
  error: redefinition of 'kvm_check_and_clear_guest_paused'
 /home/abuild/rpmbuild/BUILD/kernel-default-3.7.rc2/linux-3.7-rc2/include/asm-generic/kvm_para.h:11:91:
  note: previous definition of 'kvm_check_and_clear_guest_paused' was here
 make[4]: *** [arch/s390/kvm/../../../virt/kvm/kvm_main.o] Error 1
 make[3]: *** [arch/s390/kvm] Error 2
 make[3]: *** Waiting for unfinished jobs
 
 Any idea what the culprit for this could be?
 
 
 Thanks,
 
 Alex
 
 
 Known problem:
 
 http://www.gossamer-threads.com/lists/linux/kernel/1615714?do=post_view_threaded

Still broken in -rc3.

Alex

 
 Christian
 


[PATCHv2 net-next 8/8] vhost-net: reduce vq polling on tx zerocopy

2012-10-31 Thread Michael S. Tsirkin
It seems that to avoid deadlocks it is enough to poll vq before
we are going to use the last buffer.  This is faster than
c70aa540c7a9f67add11ad3161096fb95233aa2e.
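
As a reading aid, the trigger condition this patch introduces can be isolated into a small predicate. This is a sketch only: the function name is hypothetical, and cnt stands for the ubuf reference count sampled in the completion callback before that callback drops its own reference, as the comment in the diff describes.

```c
#include <stdbool.h>

/*
 * Decide whether a zero copy completion should wake the polling thread.
 * cnt <= 2 means this is (about to be) the last outstanding buffer, so the
 * guest has stopped submitting; cnt % 16 == 0 gives a periodic wakeup.
 */
static bool should_queue_poll(int cnt)
{
	return cnt <= 2 || (cnt % 16) == 0;
}
```

With a steadily busy queue, only the counts that are multiples of 16 trigger a wakeup, which matches the stated aim of polling on a small fraction of completions.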

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8e9de79..88beedb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -197,8 +197,18 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
 {
 	struct vhost_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
-
-	vhost_poll_queue(&vq->poll);
+	int cnt = atomic_read(&ubufs->kref.refcount);
+
+	/*
+	 * Trigger polling thread if guest stopped submitting new buffers:
+	 * in this case, the refcount after decrement will eventually reach 1
+	 * so here it is 2.
+	 * We also trigger polling periodically after each 16 packets
+	 * (the value 16 here is more or less arbitrary, it's tuned to trigger
+	 * less than 10% of times).
+	 */
+	if (cnt <= 2 || !(cnt % 16))
+		vhost_poll_queue(&vq->poll);
 	/* set len to mark this desc buffers done DMA */
 	vq->heads[ubuf->desc].len = status ?
 		VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
-- 
MST


[PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs

2012-10-31 Thread Michael S. Tsirkin
Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.
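
To illustrate the callback change described above, here is a minimal userspace sketch. It is not the kernel structure: every toy_ name is hypothetical, and the reading of the status value (0 for a real zero copy transmit, a positive value for a successful late copy, a negative value for an error such as out of memory) is an assumption based on the header comment in the diff.

```c
#include <stddef.h>

/* Toy version of a completion descriptor with a status-aware callback. */
struct toy_ubuf_info {
	void (*callback)(struct toy_ubuf_info *, int zerocopy_status);
	void *ctx;          /* device context, here a counter block */
	unsigned long desc; /* buffer index; unused in this sketch */
};

/* Hypothetical per-device statistics a consumer might keep. */
struct toy_counters {
	unsigned zerocopy; /* status == 0 */
	unsigned copied;   /* status > 0 */
	unsigned failed;   /* status < 0 */
};

/* Consumer side: classify the completion by its status. */
static void toy_callback(struct toy_ubuf_info *ubuf, int zerocopy_status)
{
	struct toy_counters *c = ubuf->ctx;

	if (zerocopy_status == 0)
		c->zerocopy++;
	else if (zerocopy_status > 0)
		c->copied++;
	else
		c->failed++;
}

/* Producer side: report completion, tolerating a missing callback. */
static void toy_complete(struct toy_ubuf_info *ubuf, int zerocopy_status)
{
	if (ubuf->callback)
		ubuf->callback(ubuf, zerocopy_status);
}
```

A device that sees the copied or failed counters grow can then decide to copy in user context instead, which is the use the commit message anticipates for follow-up patches.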

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c  | 2 +-
 drivers/vhost/vhost.h  | 2 +-
 include/linux/skbuff.h | 4 +++-
 net/core/skbuff.c  | 4 ++--
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 99ac2cb..92308b6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1600,7 +1600,7 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
kfree(ubufs);
 }
 
-void vhost_zerocopy_callback(struct ubuf_info *ubuf)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
 {
 	struct vhost_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 1125af3..eb7263c3 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -191,7 +191,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 		    unsigned int log_num, u64 len);
-void vhost_zerocopy_callback(struct ubuf_info *);
+void vhost_zerocopy_callback(struct ubuf_info *, int);
 int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
 
 #define vq_err(vq, fmt, ...) do {  \
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6a2c34e..8bac11b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -235,11 +235,13 @@ enum {
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
+ * The zerocopy_status argument is 0 if zero copy transmit occurred,
+ * 1 on successful data copy; < 0 on out of memory error.
  * The ctx field is used to track device context.
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-   void (*callback)(struct ubuf_info *);
+   void (*callback)(struct ubuf_info *, int zerocopy_status);
void *ctx;
unsigned long desc;
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6e04b1f..eb31f6e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -519,7 +519,7 @@ static void skb_release_data(struct sk_buff *skb)
 
 			uarg = skb_shinfo(skb)->destructor_arg;
 			if (uarg->callback)
-				uarg->callback(uarg);
+				uarg->callback(uarg, 0);
 		}
 
 		if (skb_has_frag_list(skb))
@@ -797,7 +797,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	for (i = 0; i < num_frags; i++)
 		skb_frag_unref(skb, i);
 
-	uarg->callback(uarg);
+	uarg->callback(uarg, 1);
 
 	/* skb frags point to kernel buffers */
 	for (i = num_frags - 1; i >= 0; i--) {
-- 
MST



[PATCHv2 net-next 3/8] tun: report orphan frags errors to zero copy callback

2012-10-31 Thread Michael S. Tsirkin
When tun transmits a zero copy skb, it orphans the frags
which might need to allocate extra memory, in atomic context.
If that fails, notify ubufs callback before freeing the skb
as a hint that device should disable zerocopy mode.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3157519..613f826 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -433,6 +433,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 drop:
 	dev->stats.tx_dropped++;
+	skb_tx_error(skb, -ENOMEM);
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
-- 
MST



Re: [PULL 00/12] ppc patch queue 2012-10-30

2012-10-31 Thread Alexander Graf


On 31.10.2012, at 11:26, Avi Kivity a...@redhat.com wrote:

 On 10/31/2012 12:22 PM, Alexander Graf wrote:
 
 
 On 31.10.2012, at 02:32, Marcelo Tosatti mtosa...@redhat.com wrote:
 
 On Tue, Oct 30, 2012 at 11:02:13AM +0100, Alexander Graf wrote:
 Hi Avi / Marcelo,
 
 This is my current patch queue for ppc.  Please pull.
 
 Headline changes are:
 
 * Fix 440 target
 * Fix uapi conflict
 
 Can you regenerate against queue branch? (btw i forgot to update 
 fsl_hcalls.h...).
 
 Hrm. So which branch am I supposed to base against? Master, next or queue? 
 This one is against next...
 
 next and queue should be compatible (queue = next + a few patches).  Are
 there any conflicts when merging against queue?

I merely want to know which one I should work against. So far I was under the 
impression that next is the one.

Alex

 
 
 -- 
 error compiling committee.c: too many arguments to function


Re: [RFC PATCH v3 00/19] ACPI memory hotplug

2012-10-31 Thread Stefan Hajnoczi
On Fri, Sep 21, 2012 at 1:17 PM, Vasilis Liaskovitis
vasilis.liaskovi...@profitbricks.com wrote:
 This is v3 of the ACPI memory hotplug functionality. Only x86_64 target is 
 supported
 for now.

Hi Vasilis,
Regarding the hot unplug issue we've been discussing, it's possible to
progress this patch series without fully solving that problem upfront.

Karen Noel suggested that the series could be rolled without the hot
unplug command, so that it's not possible to hit the unsafe case.
This would allow users to hot plug additional memory.  They would have
to use virtio-balloon to reduce the memory footprint again.  Later,
when the memory region referencing issue has been solved the hot
unplug command can be added.

Just wanted to mention Karen's idea in case you feel stuck right now.

Stefan


Re: [Qemu-devel] [RFC PATCH v3 05/19] Implement dimm device abstraction

2012-10-31 Thread Avi Kivity
On 10/24/2012 10:06 AM, liu ping fan wrote:
 On Tue, Oct 23, 2012 at 8:25 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Sep 21, 2012 at 01:17:21PM +0200, Vasilis Liaskovitis wrote:
 +static void dimm_populate(DimmDevice *s)
 +{
 +    DeviceState *dev= (DeviceState*)s;
 +    MemoryRegion *new = NULL;
 +
 +    new = g_malloc(sizeof(MemoryRegion));
 +    memory_region_init_ram(new, dev->id, s->size);
 +    vmstate_register_ram_global(new);
 +    memory_region_add_subregion(get_system_memory(), s->start, new);
 +    s->mr = new;
 +}
 +
 +static void dimm_depopulate(DimmDevice *s)
 +{
 +    assert(s);
 +    vmstate_unregister_ram(s->mr, NULL);
 +    memory_region_del_subregion(get_system_memory(), s->mr);
 +    memory_region_destroy(s->mr);
 +    s->mr = NULL;
 +}

 How is dimm hot unplug protected against callers who currently have RAM
 mapped (from cpu_physical_memory_map())?

 Emulated devices call cpu_physical_memory_map() directly or indirectly
 through DMA emulation code.  The RAM pointer may be held for arbitrary
 lengths of time, across main loop iterations, etc.

 It's not clear to me that it is safe to unplug a DIMM that has network
 or disk I/O buffers, for example.  We also need to be robust against
 malicious guests who abuse the hotplug lifecycle.  QEMU should never be
 left with dangling pointers.

 Not sure about the block layer. But I think those thread are already
 out of big lock, so there should be a MemoryListener to catch the
 RAM-unplug event, and if needed, bdrv_flush.


IMO we should use the same mechanism as proposed for other devices:
address_space_map() should grab a reference on the dimm device, and
address_space_unmap() can release it.  This way device destruction will
be deferred until all devices complete I/O.

We will have to be careful with network receive buffers though, since
they can be held indefinitely.
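
The reference-counting idea above can be sketched in a few lines of standalone C. This is a toy model, not QEMU code: the toy_ names are hypothetical, toy_map() and toy_unmap() merely stand in for address_space_map() and address_space_unmap(), refusing new mappings once unplug has been requested is an extra assumption of this sketch, and locking is left out entirely.

```c
#include <stdbool.h>

/* Toy DIMM with a reference count; the owner holds the initial reference. */
struct toy_dimm {
	int refs;
	bool unplug_requested;
	bool destroyed;
};

static void toy_dimm_init(struct toy_dimm *d)
{
	d->refs = 1;
	d->unplug_requested = false;
	d->destroyed = false;
}

/* Drop one reference; the last one out performs the destruction. */
static void toy_put(struct toy_dimm *d)
{
	if (--d->refs == 0)
		d->destroyed = true;
}

/* Mapping takes a reference so the backing memory cannot vanish mid-I/O. */
static bool toy_map(struct toy_dimm *d)
{
	if (d->unplug_requested)
		return false;
	d->refs++;
	return true;
}

static void toy_unmap(struct toy_dimm *d)
{
	toy_put(d);
}

/* Hot unplug only drops the owner's reference; destruction may be deferred. */
static void toy_unplug(struct toy_dimm *d)
{
	d->unplug_requested = true;
	toy_put(d);
}
```

In this model an unplug issued while a mapping is outstanding leaves the device alive, and the final toy_unmap() is what actually destroys it. A mapping that is never released keeps the device alive forever, which is the concern raised about buffers held indefinitely.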

-- 
error compiling committee.c: too many arguments to function


Re: [RFC PATCH v3 00/19] ACPI memory hotplug

2012-10-31 Thread Avi Kivity
On 10/31/2012 12:58 PM, Stefan Hajnoczi wrote:
 On Fri, Sep 21, 2012 at 1:17 PM, Vasilis Liaskovitis
 vasilis.liaskovi...@profitbricks.com wrote:
 This is v3 of the ACPI memory hotplug functionality. Only x86_64 target is 
 supported
 for now.
 
 Hi Vasilis,
 Regarding the hot unplug issue we've been discussing, it's possible to
 progress this patch series without fully solving that problem upfront.
 
 Karen Noel suggested that the series could be rolled without the hot
 unplug command, so that it's not possible to hit the unsafe case.
 This would allow users to hot plug additional memory.  They would have
 to use virtio-balloon to reduce the memory footprint again.  Later,
 when the memory region referencing issue has been solved the hot
 unplug command can be added.
 
 Just wanted to mention Karen's idea in case you feel stuck right now.

We could introduce hotunplug as an experimental feature so people can
test and play with it, and later graduate it to a fully supported feature.


-- 
error compiling committee.c: too many arguments to function


Re: [PULL 00/12] ppc patch queue 2012-10-30

2012-10-31 Thread Avi Kivity
On 10/31/2012 12:34 PM, Alexander Graf wrote:
 
 
 On 31.10.2012, at 11:26, Avi Kivity a...@redhat.com wrote:
 
 On 10/31/2012 12:22 PM, Alexander Graf wrote:
 
 
 On 31.10.2012, at 02:32, Marcelo Tosatti mtosa...@redhat.com wrote:
 
 On Tue, Oct 30, 2012 at 11:02:13AM +0100, Alexander Graf wrote:
 Hi Avi / Marcelo,
 
 This is my current patch queue for ppc.  Please pull.
 
 Headline changes are:
 
 * Fix 440 target
 * Fix uapi conflict
 
 Can you regenerate against queue branch? (btw i forgot to update 
 fsl_hcalls.h...).
 
 Hrm. So which branch am I supposed to base against? Master, next or queue? 
 This one is against next...
 
 next and queue should be compatible (queue = next + a few patches).  Are
 there any conflicts when merging against queue?
 
 I merely want to know which one I should work against. So far I was under the 
 impression that next is the one.

It is.  queue might be rebased at times.


-- 
error compiling committee.c: too many arguments to function


Re: [PULL 00/12] ppc patch queue 2012-10-30

2012-10-31 Thread Marcelo Tosatti
On Wed, Oct 31, 2012 at 12:26:04PM +0200, Avi Kivity wrote:
 On 10/31/2012 12:22 PM, Alexander Graf wrote:
  
  
  On 31.10.2012, at 02:32, Marcelo Tosatti mtosa...@redhat.com wrote:
  
  On Tue, Oct 30, 2012 at 11:02:13AM +0100, Alexander Graf wrote:
  Hi Avi / Marcelo,
  
  This is my current patch queue for ppc.  Please pull.
  
  Headline changes are:
  
   * Fix 440 target
   * Fix uapi conflict
  
  Can you regenerate against queue branch? (btw i forgot to update 
  fsl_hcalls.h...).
  
  Hrm. So which branch am I supposed to base against? Master, next or queue? 
  This one is against next...
 
 next and queue should be compatible (queue = next + a few patches).  Are
 there any conflicts when merging against queue?

There will be, and fsl_hcalls.h needs fixup (better Alexander verify 
i did not screw up anything else).



Re: [PULL 00/12] ppc patch queue 2012-10-30

2012-10-31 Thread Marcelo Tosatti
On Wed, Oct 31, 2012 at 11:22:55AM +0100, Alexander Graf wrote:
 
 
 On 31.10.2012, at 02:32, Marcelo Tosatti mtosa...@redhat.com wrote:
 
  On Tue, Oct 30, 2012 at 11:02:13AM +0100, Alexander Graf wrote:
  Hi Avi / Marcelo,
  
  This is my current patch queue for ppc.  Please pull.
  
  Headline changes are:
  
   * Fix 440 target
   * Fix uapi conflict
  
  Can you regenerate against queue branch? (btw i forgot to update 
  fsl_hcalls.h...).
 
 Hrm. So which branch am I supposed to base against? Master, next or queue? 
 This one is against next...
 
 Alex

queue



Re: Linux-next changes for module and virtio trees.

2012-10-31 Thread Stephen Rothwell
Hi Rusty,

On Wed, 31 Oct 2012 13:58:15 +1030 Rusty Russell ru...@rustcorp.com.au wrote:

 Stephen Rothwell s...@canb.auug.org.au writes:
 
  On Tue, 02 Oct 2012 15:56:56 +0930 Rusty Russell ru...@rustcorp.com.au 
  wrote:
 
  Please remove my quilt tree
  http://ozlabs.org/~rusty/kernel/rr-latest/ from linux-next, and use my
  git trees from now on:
  
  git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git
  Branches:
  modules-next
  virtio-next
 
  Done.
 
 And please add my 'fixes' branch from the same tree.

Done.

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgment of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees.  You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next.  These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc.  The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc.  If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.




Re: [Qemu-devel] [RFC PATCH v3 05/19] Implement dimm device abstraction

2012-10-31 Thread Stefan Hajnoczi
On Wed, Oct 31, 2012 at 12:15 PM, Avi Kivity a...@redhat.com wrote:
 On 10/24/2012 10:06 AM, liu ping fan wrote:
 On Tue, Oct 23, 2012 at 8:25 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Sep 21, 2012 at 01:17:21PM +0200, Vasilis Liaskovitis wrote:
 +static void dimm_populate(DimmDevice *s)
 +{
 +    DeviceState *dev= (DeviceState*)s;
 +    MemoryRegion *new = NULL;
 +
 +    new = g_malloc(sizeof(MemoryRegion));
 +    memory_region_init_ram(new, dev->id, s->size);
 +    vmstate_register_ram_global(new);
 +    memory_region_add_subregion(get_system_memory(), s->start, new);
 +    s->mr = new;
 +}
 +
 +static void dimm_depopulate(DimmDevice *s)
 +{
 +    assert(s);
 +    vmstate_unregister_ram(s->mr, NULL);
 +    memory_region_del_subregion(get_system_memory(), s->mr);
 +    memory_region_destroy(s->mr);
 +    s->mr = NULL;
 +}

 How is dimm hot unplug protected against callers who currently have RAM
 mapped (from cpu_physical_memory_map())?

 Emulated devices call cpu_physical_memory_map() directly or indirectly
 through DMA emulation code.  The RAM pointer may be held for arbitrary
 lengths of time, across main loop iterations, etc.

 It's not clear to me that it is safe to unplug a DIMM that has network
 or disk I/O buffers, for example.  We also need to be robust against
 malicious guests who abuse the hotplug lifecycle.  QEMU should never be
 left with dangling pointers.

 Not sure about the block layer. But I think those thread are already
 out of big lock, so there should be a MemoryListener to catch the
 RAM-unplug event, and if needed, bdrv_flush.


 IMO we should use the same mechanism as proposed for other devices:
 address_space_map() should grab a reference on the dimm device, and
 address_space_unmap() can release it.  This way device destruction will
 be deferred until all devices complete I/O.

 We will have to be careful with network receive buffers though, since
 they can be held indefinitely.

Network receive buffers aren't mapped.  Net receive is not zero-copy.
For example, virtio-net does virtqueue_pop() inside
virtio_net_receive().

I don't see a problem with networking.

Stefan


Re: [PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly

2012-10-31 Thread Raghavendra K T

On 10/30/2012 02:37 PM, Andrew Jones wrote:

On Tue, Oct 30, 2012 at 01:01:54PM +0530, Raghavendra K T wrote:

On 10/30/2012 12:04 PM, Andrew Jones wrote:

On Tue, Oct 30, 2012 at 11:27:52AM +0530, Raghavendra K T wrote:

On 10/29/2012 11:24 PM, Peter Zijlstra wrote:

On Mon, 2012-10-29 at 19:37 +0530, Raghavendra K T wrote:

+/*
+ * A load of 2048 corresponds to 1:1 overcommit
+ * undercommit threshold is half the 1:1 overcommit
+ * overcommit threshold is 1.75 times of 1:1 overcommit threshold
+ */
+#define COMMIT_THRESHOLD (FIXED_1)
+#define UNDERCOMMIT_THRESHOLD (COMMIT_THRESHOLD >> 1)
+#define OVERCOMMIT_THRESHOLD ((COMMIT_THRESHOLD << 1) - (COMMIT_THRESHOLD >> 2))
+
+unsigned long kvm_system_load(void)
+{
+   unsigned long load;
+
+   load = avenrun[0] + FIXED_1/200;
+   load = load / num_online_cpus();
+
+   return load;
+}


ARGH.. no that's wrong.. very wrong.

  1) avenrun[] EXPORT_SYMBOL says it should be removed, that's not a
joke.


Okay.


  2) avenrun[] is a global load, do not ever use a global load measure


This makes sense. Using a local optimization that leads to near global
optimization is the way to go.



  3) avenrun[] has nothing what so ever to do with runqueue lengths,
someone with a gazillion tasks in D state will get a huge load but the
cpu is very idle.



I used loadavg as an alternative measure. But the above condition
poses a concern for that.

Okay, now IIUC, usage of *any* global measure is bad?

Because I was also thinking to use nrrunning()/ num_online_cpus(), to
get an idea of global overcommit sense. (ofcourse since, this involves
iteration over per CPU nrrunning, I wanted to calculate this
periodically)

The overall logic, of having overcommit_threshold,
undercommit_threshold, I wanted to use for even dynamic ple_window
tuning purpose.

so logic was:
 undercommit_threshold = 16k ple_window

overcommit_threshold  = 4k window.

for in between case scale the ple_window accordingly.
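
One way to read the scaling rule above is as a linear interpolation between the two thresholds. The sketch below is only that reading, under stated assumptions: all toy_ names are hypothetical, FIXED_1 is taken to be 2048 as the patch comment says, the thresholds are half and 1.75 times that value as the same comment says, and "16k" and "4k" are taken to mean 16384 and 4096.

```c
/* Assumed fixed-point unit: a load of 2048 corresponds to 1:1 commit. */
#define TOY_FIXED_1      2048UL
#define TOY_UNDERCOMMIT  (TOY_FIXED_1 >> 1)                       /* 1024 */
#define TOY_OVERCOMMIT   ((TOY_FIXED_1 << 1) - (TOY_FIXED_1 >> 2)) /* 3584 */

#define TOY_PLE_MAX      16384UL /* window used when undercommitted */
#define TOY_PLE_MIN      4096UL  /* window used when overcommitted */

/* Map a per-cpu load figure to a ple_window, clamped at both ends. */
static unsigned long toy_ple_window(unsigned long load)
{
	if (load <= TOY_UNDERCOMMIT)
		return TOY_PLE_MAX;
	if (load >= TOY_OVERCOMMIT)
		return TOY_PLE_MIN;
	/* Linear scaling: the window shrinks as load approaches overcommit. */
	return TOY_PLE_MAX - (load - TOY_UNDERCOMMIT) *
		(TOY_PLE_MAX - TOY_PLE_MIN) /
		(TOY_OVERCOMMIT - TOY_UNDERCOMMIT);
}
```

Whatever measure ends up replacing the load average, the same clamped mapping could be reused; only the input and the two threshold values would change.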

The alternative was to decide depending on how ple handler succeeded in
yield_to. But I thought, that is too sensitive and more overhead.

This topic may deserve different thread, but thought I shall table it here.

So, Thinking about the alternatives to implement, logic such as

(a) if(undercommitted)
 just go back and spin rather than going for yield_to iteration.
(b) if (overcommitted)
better to yield rather than  spinning logic

of current patches..

[ ofcourse, (a) is already met to large extent by your patches..]

So I think everything boils down to

how do we measure these two thresholds without much overhead in a
compliant way

Ideas welcome..



What happened to Avi's preempt notifier idea for determining
under/overcommit? If nobody has picked that up yet, then I'll go ahead and
try to prototype it.


Hi Drew,

I had assumed my priority order as
1) this patch series 2) dynamic ple window 3) preempt notifiers.

But I do not have any problem on re-prioritizing / helping on these
as far as we are clear on what we are looking into.

I was thinking about preempt notifier idea as a tool to refine
candidate VCPUs. But you are right, Avi, also told we can use
bitmap/counter itself as an indicator to decide whether we go ahead
with yield_to at all.

IMO, only patch(3) has some conflict because of various approach we can
try.May be we should attack the problem via all 3 solutions at once and
decide?

To be frank, within each of the approach, trying/analyzing all the
  possibilities made the things slow.. (my end).

Suggestions..?



I agree, it's a complex problem that needs lots of trial+error work. We
should definitely work in parallel on multiple ideas. I'll go ahead and
dig into the preempt notifiers.



Okay. Thank you. I will concentrate on the dynamic ple window. But I think
the implementation needs some overlapping details from the preempt notifier.

To summarize what we thought of doing for the dynamic ple window:

(I expect we have to keep the ple window between 4k and 16k throughout.)

From preempt notifiers:

(1) from the preempt notifier check the overcommit case, if so increase 
the ple window

questions:
How do we say we are overcommitted?
 - is it number of preemption we keep track vs total vcpus. I think so.
But we have to convert into some formula.. we shall decrease the ple 
window by some factor (unless we hit 4k)


(2) How can we say we are undercommitted?
   Perhaps when very few vcpus are currently scheduled out; then we
tend to set the ple window closer to the max (16k).


From yield_to failures:

If yield_to fails with ESRCH, it potentially indicates undercommit, and
we can again use the logic of increasing the ple window.

Did we miss anything?



Re: [Qemu-devel] [RFC PATCH v3 05/19] Implement dimm device abstraction

2012-10-31 Thread Avi Kivity
On 10/31/2012 02:18 PM, Stefan Hajnoczi wrote:

 IMO we should use the same mechanism as proposed for other devices:
 address_space_map() should grab a reference on the dimm device, and
 address_space_unmap() can release it.  This way device destruction will
 be deferred until all devices complete I/O.

 We will have to be careful with network receive buffers though, since
 they can be held indefinitely.
 
 Network receive buffers aren't mapped.  Net receive is not zero-copy.
 For example, virtio-net does virtqueue_pop() inside
 virtio_net_receive().
 
 I don't see a problem with networking.

What about vhost-net?  But that is managed separately with a MemoryListener.
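
A rough sketch of the reference-holding scheme proposed in the quoted text,
assuming a plain integer refcount. The struct and the sketch_map() /
sketch_unmap() helpers are hypothetical stand-ins and do not reproduce the
real address_space_map() / address_space_unmap() signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hot-pluggable dimm device. */
struct dimm_dev {
    int  refcount;          /* owner reference + one per outstanding mapping */
    bool unplug_requested;
    bool destroyed;
};

static void dimm_ref(struct dimm_dev *d)
{
    d->refcount++;
}

static void dimm_unref(struct dimm_dev *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0)
        d->destroyed = true;    /* real code would free the device here */
}

/* What mapping and unmapping guest memory would do for the backing device. */
static void sketch_map(struct dimm_dev *d)   { dimm_ref(d); }
static void sketch_unmap(struct dimm_dev *d) { dimm_unref(d); }

/* Hot-unplug only drops the owner's reference; the actual destruction
 * waits until every in-flight mapping has been released. */
static void dimm_unplug(struct dimm_dev *d)
{
    d->unplug_requested = true;
    dimm_unref(d);
}
```

In this sketch an unplug that arrives while a mapping is outstanding leaves
the device alive; it is destroyed on the final sketch_unmap().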


-- 
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [RFC PATCH v3 05/19] Implement dimm device abstraction

2012-10-31 Thread Stefan Hajnoczi
On Wed, Oct 31, 2012 at 1:34 PM, Avi Kivity <a...@redhat.com> wrote:
 On 10/31/2012 02:18 PM, Stefan Hajnoczi wrote:

 IMO we should use the same mechanism as proposed for other devices:
 address_space_map() should grab a reference on the dimm device, and
 address_space_unmap() can release it.  This way device destruction will
 be deferred until all devices complete I/O.

 We will have to be careful with network receive buffers though, since
 they can be held indefinitely.

 Network receive buffers aren't mapped.  Net receive is not zero-copy.
 For example, virtio-net does virtqueue_pop() inside
 virtio_net_receive().

 I don't see a problem with networking.

 What about vhost-net?  But that is managed separately with a MemoryListener.

Yep.  It should find out when memory regions change through its listener.

Stefan


Re: [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case

2012-10-31 Thread Avi Kivity
On 10/29/2012 04:07 PM, Raghavendra K T wrote:
 From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
 
 Also we do not update last boosted vcpu in failure cases.
 
  #endif
 +
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
   struct kvm *kvm = me->kvm;
 @@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   continue;
   if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
   continue;
 - if (kvm_vcpu_yield_to(vcpu)) {
 +
 + yielded = kvm_vcpu_yield_to(vcpu);
 + if (yielded > 0)
   kvm->last_boosted_vcpu = i;
 - yielded = 1;
 + if (yielded)
   break;
 - }
   }

If yielded == -ESRCH, should we not try to yield to another vcpu?


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case

2012-10-31 Thread Raghavendra K T

On 10/31/2012 06:08 PM, Avi Kivity wrote:

On 10/29/2012 04:07 PM, Raghavendra K T wrote:

From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

Also we do not update last boosted vcpu in failure cases.

  #endif
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
struct kvm *kvm = me->kvm;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
continue;
-   if (kvm_vcpu_yield_to(vcpu)) {
+
+   yielded = kvm_vcpu_yield_to(vcpu);
+   if (yielded > 0)
kvm->last_boosted_vcpu = i;
-   yielded = 1;
+   if (yielded)
break;
-   }
}


If yielded == -ESRCH, should we not try to yield to another vcpu?



 Yes. The plan is to abort the iteration, since it means we are most likely
undercommitted.





Re: [PULL 00/12] ppc patch queue 2012-10-30

2012-10-31 Thread Alexander Graf

On 31.10.2012, at 12:25, Marcelo Tosatti wrote:

 On Wed, Oct 31, 2012 at 12:26:04PM +0200, Avi Kivity wrote:
 On 10/31/2012 12:22 PM, Alexander Graf wrote:
 
 
 On 31.10.2012, at 02:32, Marcelo Tosatti <mtosa...@redhat.com> wrote:
 
 On Tue, Oct 30, 2012 at 11:02:13AM +0100, Alexander Graf wrote:
 Hi Avi / Marcelo,
 
 This is my current patch queue for ppc.  Please pull.
 
 Headline changes are:
 
 * Fix 440 target
 * Fix uapi conflict
 
 Can you regenerate against queue branch? (btw i forgot to update 
 fsl_hcalls.h...).
 
 Hrm. So which branch am I supposed to base against? Master, next or queue? 
 This one is against next...
 
 next and queue should be compatible (queue = next + a few patches).  Are
 there any conflicts when merging against queue?
 
 There will be, and fsl_hcalls.h needs fixup (better Alexander verify 
 i did not screw up anything else).

I pushed a tree where I merged for-upstream and your queue and hopefully fixed 
up the whole mess. It's available as

  for-queue

on my github tree.


Alex



Re: [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case

2012-10-31 Thread Raghavendra K T

On 10/31/2012 06:11 PM, Raghavendra K T wrote:

On 10/31/2012 06:08 PM, Avi Kivity wrote:

On 10/29/2012 04:07 PM, Raghavendra K T wrote:

From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

Also we do not update last boosted vcpu in failure cases.

  #endif
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
  struct kvm *kvm = me->kvm;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  continue;
  if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
  continue;
-if (kvm_vcpu_yield_to(vcpu)) {
+
+yielded = kvm_vcpu_yield_to(vcpu);
+if (yielded > 0)
  kvm->last_boosted_vcpu = i;
-yielded = 1;
+if (yielded)
  break;
-}
  }


If yielded == -ESRCH, should we not try to yield to another vcpu?



  Yes. The plan is to abort the iteration, since it means we are most likely
undercommitted.


Sorry if it was ambiguous. I wanted to say we do not want to continue
yielding to another vcpu.



Re: [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case

2012-10-31 Thread Avi Kivity
On 10/31/2012 03:15 PM, Raghavendra K T wrote:
 On 10/31/2012 06:11 PM, Raghavendra K T wrote:
 On 10/31/2012 06:08 PM, Avi Kivity wrote:
 On 10/29/2012 04:07 PM, Raghavendra K T wrote:
 From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

 Also we do not update last boosted vcpu in failure cases.

   #endif
 +
   void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   {
   struct kvm *kvm = me->kvm;
 @@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   continue;
   if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
   continue;
 -if (kvm_vcpu_yield_to(vcpu)) {
 +
 +yielded = kvm_vcpu_yield_to(vcpu);
 +if (yielded > 0)
   kvm->last_boosted_vcpu = i;
 -yielded = 1;
 +if (yielded)
   break;
 -}
   }

 If yielded == -ESRCH, should we not try to yield to another vcpu?


   Yes. The plan is to abort the iteration, since it means we are most likely
 undercommitted.
 
 Sorry if it was ambiguous. I wanted to say we do not want to continue
 yielding to another vcpu.
 


Why not?  We found that this particular vcpu is running and therefore
likely not a lock holder.  That says nothing about other vcpus.  The
next in line might be runnable-but-not-running on another runqueue.
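
To make the alternative concrete, here is a small stand-alone sketch of a
candidate loop that keeps scanning after an -ESRCH instead of aborting. It
is not the patch under review: mock_yield_to(), the vcpu_state enum, and the
simplification that -ESRCH just means "target is already running" are
assumptions for illustration only.

```c
#include <errno.h>

/* Hypothetical per-vcpu state used only by this sketch. */
enum vcpu_state { VCPU_RUNNING, VCPU_PREEMPTED };

/* Mock of the yield primitive: > 0 on success, -ESRCH when the target
 * is already running (assumed return convention for the sketch). */
static int mock_yield_to(enum vcpu_state s)
{
    return s == VCPU_PREEMPTED ? 1 : -ESRCH;
}

/* Returns the index of the vcpu that was boosted, or -1 if none was. */
static int pick_yield_target(const enum vcpu_state *vcpus, int n, int me)
{
    int i;

    for (i = 0; i < n; i++) {
        if (i == me)
            continue;
        if (mock_yield_to(vcpus[i]) > 0)
            return i;
        /* -ESRCH: this vcpu is running, which says nothing about the
         * others, so keep looking at the next candidate. */
    }
    return -1;
}
```

In this sketch a running candidate is skipped and a later preempted one is
still boosted; only when every candidate is running does the scan give up.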


-- 
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-10-31 Thread ronnie sahlberg
On Tue, Oct 30, 2012 at 10:48 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
 On Tue, Oct 30, 2012 at 10:09 PM, ronnie sahlberg
 <ronniesahlb...@gmail.com> wrote:
 About half a year ago there was an issue where recent kernels had added
 support to start using new scsi opcodes,  but the qemu functions that
 determine which transfer direction is used for this opcode had not
 yet been updated, so that the opcode was sent with the wrong transfer
 direction.

 That caused the guest's memory to be overwritten and the guest to crash.

 I don't have (easy) access to the git tree right now, but it was a
 patch for the ATA_PASSTHROUGH command that fixed that.

 This patch?

 http://patchwork.ozlabs.org/patch/174946/

 Stefan

This is the one I was thinking about :
381b634c275ca1a2806e97392527bbfc01bcb333

But that also crashed when using local /dev/sg* devices.


[PATCH 04/20] KVM/MIPS32: MIPS arch specific APIs for KVM

2012-10-31 Thread Sanjay Lal
- Implements the arch specific APIs for KVM, some are stubs for MIPS
- kvm_mips_handle_exit(): Main 'C' dispatch routine for handling exceptions
  while in Guest mode.
- Also implements in-kernel timer interrupt support for the guest.

Signed-off-by: Sanjay Lal <sanj...@kymasys.com>
---
 arch/mips/kvm/kvm_mips.c | 1016 ++
 1 file changed, 1016 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_mips.c

diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
new file mode 100644
index 0000000..eb84568
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips.c
@@ -0,0 +1,1016 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS: MIPS specific KVM APIs
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal <sanj...@kymasys.com>
+*/
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/bootmem.h>
+#include <asm/page.h>
+#include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+
+#include <linux/kvm_host.h>
+
+#include "kvm_mips_int.h"
+#include "kvm_mips_stats.h"
+#include "kvm_mips_comm.h"
+
+#ifndef VECTORSPACING
+#define VECTORSPACING 0x100 /* for EI/VI mode */
+#endif
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+{NULL}
+};
+
+static int
+kvm_mips_reset_vcpu (struct kvm_vcpu *vcpu)
+{
+    int i;
+    for (i = 0; i < NR_CPUS; i++) {
+        vcpu->arch.guest_kernel_asid[i] = 0;
+        vcpu->arch.guest_user_asid[i] = 0;
+    }
+    return 0;
+}
+
+gfn_t
+unalias_gfn(struct kvm *kvm, gfn_t gfn)
+{
+return gfn;
+}
+
+/* XXXKYMA: We are simulating a processor that has the WII bit set in Config7, so we
+ * are runnable if interrupts are pending
+ */
+int
+kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
+{
+    return !!(vcpu->arch.pending_exceptions);
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+   return 1;
+}
+
+int 
+kvm_arch_hardware_enable(void *garbage)
+{
+return 0;
+}
+
+void
+kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int
+kvm_arch_hardware_setup(void)
+{
+return 0;
+}
+
+void
+kvm_arch_hardware_unsetup(void)
+{
+}
+
+void
+kvm_arch_check_processor_compat(void *rtn)
+{
+int *r = (int *) rtn;
+*r = 0;
+return;
+}
+
+static void
+kvm_mips_init_tlbs (void *arg)
+{
+    ulong flags, wired;
+    struct kvm *kvm = (struct kvm *) arg;
+
+    ENTER_CRITICAL(flags);
+    /* Add a wired entry to the TLB, it is used to map the commpage to the Guest kernel */
+    wired = read_c0_wired();
+    write_c0_wired(wired + 1);
+    mtc0_tlbw_hazard();
+    kvm->arch.commpage_tlb = wired;
+    EXIT_CRITICAL(flags);
+
+    kvm_debug("[%d] commpage TLB: %d\n", smp_processor_id(), kvm->arch.commpage_tlb);
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+    int cpu;
+
+    if (atomic_inc_return(&kvm_mips_instance) == 1) {
+        kvm_info("%s: 1st KVM instance, setup host TLB parameters\n", __func__);
+        for_each_online_cpu(cpu) {
+            smp_call_function_single(cpu, kvm_mips_init_tlbs, kvm, 1);
+        }
+    }
+
+    kvm->arch.gfn_to_pfn = gfn_to_pfn;
+    kvm->arch.release_pfn_clean = kvm_release_pfn_clean;
+    kvm->arch.is_error_pfn = is_error_pfn;
+
+    return 0;
+}
+
+
+void
+kvm_mips_free_vcpus(struct kvm *kvm)
+{
+    unsigned int i;
+    struct kvm_vcpu *vcpu;
+
+    /* Put the pages we reserved for the guest pmap */
+    for (i = 0; i < kvm->arch.guest_pmap_npages; i++) {
+        if (kvm->arch.guest_pmap[i] != KVM_INVALID_PAGE)
+            kvm->arch.release_pfn_clean(kvm->arch.guest_pmap[i]);
+    }
+
+    if (kvm->arch.guest_pmap)
+        kfree(kvm->arch.guest_pmap);
+
+    kvm_for_each_vcpu(i, vcpu, kvm) {
+        kvm_arch_vcpu_free(vcpu);
+    }
+
+    mutex_lock(&kvm->lock);
+
+    for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
+        kvm->vcpus[i] = NULL;
+
+    atomic_set(&kvm->online_vcpus, 0);
+
+    mutex_unlock(&kvm->lock);
+}
+
+void
+kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+static void
+kvm_mips_uninit_tlbs (void *arg)
+{
+/* Restore wired count */
+write_c0_wired(0);
+mtc0_tlbw_hazard();
+/* Clear out all the TLBs */
+kvm_local_flush_tlb_all();
+}
+
+
+void
+kvm_arch_destroy_vm(struct kvm *kvm)
+{
+    int cpu;
+    kvm_mips_free_vcpus(kvm);
+
+    /* If this is the last instance, restore wired count */
+    if (atomic_dec_return(&kvm_mips_instance) == 0) {
+        kvm_info("%s: last KVM instance, restoring TLB parameters\n", __func__);
+        for_each_online_cpu(cpu) {
+            smp_call_function_single(cpu, kvm_mips_uninit_tlbs, NULL, 1);
+        }
+    }
+}
+
+long
+kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
+{
+return -EINVAL;
+}
+
+int
+kvm_arch_set_memory_region(struct kvm *kvm,
+   struct 

[PATCH 05/20] KVM/MIPS32: KVM Guest kernel support.

2012-10-31 Thread Sanjay Lal
Both Guest kernel and Guest Userspace execute in UM. The memory map is as follows:
Guest User address space:   0x00000000 - 0x40000000
Guest Kernel Unmapped:      0x40000000 - 0x60000000
Guest Kernel Mapped:        0x60000000 - 0x80000000
- Guest Usermode virtual memory is limited to 1GB.
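
For illustration, the guest memory map can be expressed as a small address
classifier. This is a sketch and not part of the patch: it assumes the
region boundaries are 0x40000000, 0x60000000 and 0x80000000 (consistent with
the 1GB user limit stated here), and the macro, enum and function names are
hypothetical.

```c
#include <stdint.h>

/* Assumed region boundaries (hypothetical macro names). */
#define SKETCH_GUEST_USER_END    0x40000000u  /* end of guest user space (1GB) */
#define SKETCH_GUEST_KSEG0_END   0x60000000u  /* end of guest kernel unmapped */
#define SKETCH_GUEST_KSEG23_END  0x80000000u  /* end of guest kernel mapped */

enum guest_region {
    GUEST_USER,
    GUEST_KERNEL_UNMAPPED,
    GUEST_KERNEL_MAPPED,
    GUEST_OUTSIDE_MAP       /* host kernel segments, not usable by the guest */
};

/* Classify a 32-bit guest virtual address according to the map above. */
static enum guest_region guest_region_of(uint32_t va)
{
    if (va < SKETCH_GUEST_USER_END)
        return GUEST_USER;
    if (va < SKETCH_GUEST_KSEG0_END)
        return GUEST_KERNEL_UNMAPPED;
    if (va < SKETCH_GUEST_KSEG23_END)
        return GUEST_KERNEL_MAPPED;
    return GUEST_OUTSIDE_MAP;
}
```

Everything the guest can address sits below 0x80000000 because both the
guest kernel and guest userspace run in UM.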

Signed-off-by: Sanjay Lal <sanj...@kymasys.com>
---
 arch/mips/include/asm/mach-generic/spaces.h |  9 -
 arch/mips/include/asm/processor.h   |  5 +
 arch/mips/include/asm/uaccess.h | 15 ---
 arch/mips/kernel/binfmt_elfo32.c|  6 +-
 arch/mips/kernel/cevt-r4k.c |  4 
 arch/mips/kernel/traps.c|  7 ++-
 arch/mips/mti-malta/malta-time.c| 13 +
 7 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/mach-generic/spaces.h b/arch/mips/include/asm/mach-generic/spaces.h
index d7a9efd..ff64289 100644
--- a/arch/mips/include/asm/mach-generic/spaces.h
+++ b/arch/mips/include/asm/mach-generic/spaces.h
@@ -20,14 +20,21 @@
 #endif
 
 #ifdef CONFIG_32BIT
-
+#ifdef CONFIG_KVM_GUEST
+#define CAC_BASE               _AC(0x40000000, UL)
+#else
 #define CAC_BASE               _AC(0x80000000, UL)
+#endif
 #define IO_BASE                _AC(0xa0000000, UL)
 #define UNCAC_BASE             _AC(0xa0000000, UL)
 
 #ifndef MAP_BASE
+#ifdef CONFIG_KVM_GUEST
+#define MAP_BASE               _AC(0x60000000, UL)
+#else
 #define MAP_BASE               _AC(0xc0000000, UL)
 #endif
+#endif
 
 /*
  * Memory above this physical address will be considered highmem.
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 5e33fab..7df9f06 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -44,11 +44,16 @@ extern unsigned int vced_count, vcei_count;
 #define SPECIAL_PAGES_SIZE PAGE_SIZE
 
 #ifdef CONFIG_32BIT
+#ifdef CONFIG_KVM_GUEST
+/* User space process size is limited to 1GB in KVM Guest Mode */
+#define TASK_SIZE  0x3fff8000UL
+#else
 /*
  * User space process size: 2GB. This is hardcoded into a few places,
  * so don't change it unless you know what you are doing.
  */
 #define TASK_SIZE  0x7fff8000UL
+#endif
 
 #ifdef __KERNEL__
 #define STACK_TOP_MAX  TASK_SIZE
diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index 3b92efe..aba9751 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -23,7 +23,11 @@
  */
 #ifdef CONFIG_32BIT
 
-#define __UA_LIMIT 0x80000000UL
+#ifdef CONFIG_KVM_GUEST
+#define __UA_LIMIT 0x40000000UL
+#else
+#define __UA_LIMIT 0x80000000UL
+#endif
 
 #define __UA_ADDR  .word
 #define __UA_LA    la
@@ -55,8 +59,13 @@ extern u64 __ua_limit;
  * address in this range it's the process's problem, not ours :-)
  */
 
-#define KERNEL_DS  ((mm_segment_t) { 0UL })
-#define USER_DS    ((mm_segment_t) { __UA_LIMIT })
+#ifdef CONFIG_KVM_GUEST
+#define KERNEL_DS  ((mm_segment_t) { 0x80000000UL })
+#define USER_DS    ((mm_segment_t) { 0xC0000000UL })
+#else
+#define KERNEL_DS  ((mm_segment_t) { 0UL })
+#define USER_DS    ((mm_segment_t) { __UA_LIMIT })
+#endif
 
 #define VERIFY_READ    0
 #define VERIFY_WRITE   1
diff --git a/arch/mips/kernel/binfmt_elfo32.c b/arch/mips/kernel/binfmt_elfo32.c
index ff44823..54f3904 100644
--- a/arch/mips/kernel/binfmt_elfo32.c
+++ b/arch/mips/kernel/binfmt_elfo32.c
@@ -48,7 +48,11 @@ typedef elf_fpreg_t elf_fpregset_t[ELF_NFPREG];
__res;  \
 })
 
-#define TASK32_SIZE    0x7fff8000UL
+#ifdef CONFIG_KVM_GUEST
+#define TASK32_SIZE    0x3fff8000UL
+#else
+#define TASK32_SIZE    0x7fff8000UL
+#endif
 #undef ELF_ET_DYN_BASE
 #define ELF_ET_DYN_BASE (TASK32_SIZE / 3 * 2)
 
diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
index 7532392..eebb05b 100644
--- a/arch/mips/kernel/cevt-r4k.c
+++ b/arch/mips/kernel/cevt-r4k.c
@@ -118,6 +118,10 @@ int c0_compare_int_usable(void)
unsigned int delta;
unsigned int cnt;
 
+#ifdef CONFIG_KVM_GUEST
+return 1;
+#endif
+
/*
 * IP7 already pending?  Try to clear it by acking the timer.
 */
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 9260986..1413aef 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1709,7 +1709,12 @@ void __init trap_init(void)
         ebase = (unsigned long)
             __alloc_bootmem(size, 1 << fls(size), 0);
     } else {
-        ebase = CKSEG0;
+#ifdef CONFIG_KVM_GUEST
+#define KVM_GUEST_KSEG0     0x40000000
+        ebase = KVM_GUEST_KSEG0;
+#else
+        ebase = CKSEG0;
+#endif
         if (cpu_has_mips_r2)
             ebase += (read_c0_ebase() & 0x3ffff000);
     }
diff --git a/arch/mips/mti-malta/malta-time.c b/arch/mips/mti-malta/malta-time.c
index 

[PATCH 08/20] KVM/MIPS32: MMU/TLB operations for the Guest.

2012-10-31 Thread Sanjay Lal
- Note that this file is statically linked with the rest of the host kernel (KSEG0).
  This is because kernel modules are loaded into mapped space on MIPS and we want
  to make sure that we don't get any host kernel TLB faults while manipulating TLBs.
- Virtual Guest TLBs are implemented as 64 entry array regardless of the number
  of host TLB entries.
- Shadow TLBs map Guest virtual addresses to Host physical addresses.

- TLB miss handling details:
    Guest KSEG0 TLBMISS (0x40000000 – 0x60000000): Transparent to the Guest.
    Guest KSEG2/3 (0x60000000 – 0x80000000) & Guest UM TLBMISS (0x00000000 – 0x40000000)
        Lookup in Guest/Virtual TLB
            If an entry doesn’t match
                deliver appropriate TLBMISS LD/ST exception to the guest
            If entry does exist in the Guest TLB and is NOT Valid
                Deliver TLB invalid exception to the guest
            If entry does exist in the Guest TLB and is VALID
                Inject the TLB entry into the Shadow TLB
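
The decision tree above can be sketched as a small stand-alone classifier.
This is an illustration only, not the code in kvm_tlb.c: the type and
function names are hypothetical, the guest KSEG0 range is assumed to be
0x40000000 - 0x60000000, and the lookup matches on a plain 4KB page number,
ignoring the ASID and the even/odd page pairing of real MIPS TLB entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_TLB_SIZE     64   /* 64-entry guest TLB, per the description */
#define SKETCH_PAGE_SHIFT  12   /* assumption: 4KB pages */

enum miss_action {
    MISS_TRANSPARENT,       /* guest KSEG0: handled without involving the guest */
    MISS_DELIVER_TLBMISS,   /* no matching guest TLB entry */
    MISS_DELIVER_TLBINV,    /* entry matches but is not valid */
    MISS_INJECT_SHADOW      /* entry matches and is valid */
};

struct guest_tlb_entry {
    uint32_t vpn;       /* virtual page number (simplified) */
    bool     in_use;
    bool     valid;
};

/* Decide what to do for a miss on guest virtual address va. */
static enum miss_action
classify_tlb_miss(uint32_t va, const struct guest_tlb_entry *tlb, size_t n)
{
    size_t i;

    if (va >= 0x40000000u && va < 0x60000000u)
        return MISS_TRANSPARENT;

    /* Guest KSEG2/3 and guest UM: consult the guest (virtual) TLB. */
    for (i = 0; i < n; i++) {
        if (tlb[i].in_use && tlb[i].vpn == (va >> SKETCH_PAGE_SHIFT))
            return tlb[i].valid ? MISS_INJECT_SHADOW : MISS_DELIVER_TLBINV;
    }
    return MISS_DELIVER_TLBMISS;
}
```

Only the last outcome touches the shadow TLB; the two "deliver" outcomes
hand the exception back to the guest kernel.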

Signed-off-by: Sanjay Lal <sanj...@kymasys.com>
---
 arch/mips/kvm/kvm_tlb.c | 889 
 1 file changed, 889 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_tlb.c

diff --git a/arch/mips/kvm/kvm_tlb.c b/arch/mips/kvm/kvm_tlb.c
new file mode 100644
index 0000000..dea3868
--- /dev/null
+++ b/arch/mips/kvm/kvm_tlb.c
@@ -0,0 +1,889 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS TLB handling, this file is part of the Linux host kernel so that
+* TLB handlers run from KSEG0
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal <sanj...@kymasys.com>
+*/
+
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+#include <linux/kvm_host.h>
+
+#include <asm/cpu.h>
+#include <asm/bootinfo.h>
+#include <asm/mmu_context.h>
+#include <asm/pgtable.h>
+#include <asm/cacheflush.h>
+
+#undef CONFIG_MIPS_MT
+#include <asm/r4kcache.h>
+#define CONFIG_MIPS_MT
+
+#include "kvm_mips_stats.h"
+
+
+#define KVM_GUEST_PC_TLB    0
+#define KVM_GUEST_SP_TLB    1
+
+#define PRIx64 "llx"
+
+#define UNIQUE_ENTRYHI(idx) (CKSEG0 + ((idx) << (PAGE_SHIFT + 1)))
+
+atomic_t kvm_mips_instance;
+EXPORT_SYMBOL(kvm_mips_instance);
+
+/*
+ * Structure defining an tlb entry data set.
+ */
+
+void
+kvm_mips_dump_host_tlbs(void)
+{
+    struct kvm_mips_tlb tlb;
+    int i;
+    ulong flags;
+    unsigned long old_entryhi;
+    unsigned long old_pagemask;
+
+    ENTER_CRITICAL(flags);
+
+    old_entryhi = read_c0_entryhi();
+    old_pagemask = read_c0_pagemask();
+
+    printk("HOST TLBs:\n");
+    printk("ASID: %#lx\n", read_c0_entryhi() & ASID_MASK);
+
+    for (i = 0; i < current_cpu_data.tlbsize; i++) {
+        write_c0_index(i);
+        mtc0_tlbw_hazard();
+
+        tlb_read();
+        tlbw_use_hazard();
+
+        tlb.tlb_hi = read_c0_entryhi();
+        tlb.tlb_lo0 = read_c0_entrylo0();
+        tlb.tlb_lo1 = read_c0_entrylo1();
+        tlb.tlb_mask = read_c0_pagemask();
+
+        printk("TLB%c%3d Hi 0x%08lx ",
+               (tlb.tlb_lo0 | tlb.tlb_lo1) & MIPS3_PG_V ? ' ' : '*',
+               i, tlb.tlb_hi);
+        printk("Lo0=0x%09" PRIx64 " %c%c attr %lx ",
+               (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo0),
+               (tlb.tlb_lo0 & MIPS3_PG_D) ? 'D' : ' ',
+               (tlb.tlb_lo0 & MIPS3_PG_G) ? 'G' : ' ', (tlb.tlb_lo0 >> 3) & 7);
+        printk("Lo1=0x%09" PRIx64 " %c%c attr %lx sz=%lx\n",
+               (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo1),
+               (tlb.tlb_lo1 & MIPS3_PG_D) ? 'D' : ' ',
+               (tlb.tlb_lo1 & MIPS3_PG_G) ? 'G' : ' ',
+               (tlb.tlb_lo1 >> 3) & 7, tlb.tlb_mask);
+    }
+    write_c0_entryhi(old_entryhi);
+    write_c0_pagemask(old_pagemask);
+    mtc0_tlbw_hazard();
+    EXIT_CRITICAL(flags);
+}
+
+void
+kvm_mips_dump_guest_tlbs(struct kvm_vcpu *vcpu)
+{
+    int i;
+    struct kvm_mips_tlb tlb;
+    struct mips_coproc *cop0 __unused = vcpu->arch.cop0;
+
+    printk("Guest TLBs:\n");
+    printk("Guest EntryHi: %#lx\n", kvm_read_c0_guest_entryhi(cop0));
+
+    for (i = 0; i < KVM_MIPS_GUEST_TLB_SIZE; i++) {
+        tlb = vcpu->arch.guest_tlb[i];
+        printk("TLB%c%3d Hi 0x%08lx ",
+               (tlb.tlb_lo0 | tlb.tlb_lo1) & MIPS3_PG_V ? ' ' : '*',
+               i, tlb.tlb_hi);
+        printk("Lo0=0x%09" PRIx64 " %c%c attr %lx ",
+               (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo0),
+               (tlb.tlb_lo0 & MIPS3_PG_D) ? 'D' : ' ',
+               (tlb.tlb_lo0 & MIPS3_PG_G) ? 'G' : ' ', (tlb.tlb_lo0 >> 3) & 7);
+        printk("Lo1=0x%09" PRIx64 " %c%c attr %lx sz=%lx\n",
+               (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo1),
+               (tlb.tlb_lo1 & MIPS3_PG_D) ? 'D' : ' ',
+               (tlb.tlb_lo1 & MIPS3_PG_G) ? 'G' : ' ',
+               (tlb.tlb_lo1 >> 3) & 7, 

[PATCH 01/20] KVM/MIPS32: Infrastructure/build files.

2012-10-31 Thread Sanjay Lal
- Add the KVM option to MIPS build files.
- Add default config files for KVM host/guest kernels.
- Change the link address for the Malta KVM Guest kernel to UM (0x40100000).
- Add KVM Kconfig file with KVM/MIPS specific options

Signed-off-by: Sanjay Lal <sanj...@kymasys.com>
---
 arch/mips/Kbuild |  4 
 arch/mips/Kconfig| 20 ++
 arch/mips/kvm/Kconfig| 49 
 arch/mips/mti-malta/Platform |  6 +-
 4 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 arch/mips/kvm/Kconfig

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index 7dd65cf..d2cfe45 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,3 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+
+ifdef CONFIG_KVM
+obj-y += kvm/
+endif
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index dba9390..8ac1aa1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1284,6 +1284,8 @@ config CPU_MIPS32_R2
select CPU_HAS_PREFETCH
select CPU_SUPPORTS_32BIT_KERNEL
select CPU_SUPPORTS_HIGHMEM
+select HAVE_KVM
+
help
  Choose this option to build a kernel for release 2 or later of the
  MIPS32 architecture.  Most modern embedded systems with a 32-bit
@@ -1789,6 +1791,21 @@ config 64BIT
 
 endchoice
 
+config KVM_GUEST
+    bool "KVM Guest Kernel"
+    depends on HAVE_KVM
+    help
+      Select this option if building a guest kernel for KVM (Trap & Emulate) mode
+
+config KVM_HOST_FREQ
+    int "KVM Host Processor Frequency (MHz)"
+    depends on HAVE_KVM
+    default 500
+    help
+      Select this option if building a guest kernel for KVM to skip
+      RTC emulation when determining guest CPU Frequency.  Instead, the guest
+      processor frequency is automatically derived from the host frequency.
+
 choice
     prompt "Kernel page size"
     default PAGE_SIZE_4KB
@@ -2069,6 +2086,7 @@ config SB1_PASS_2_1_WORKAROUNDS
     depends on CPU_SB1 && CPU_SB1_PASS_2
     default y
 
+
 config 64BIT_PHYS_ADDR
bool
 
@@ -2579,3 +2597,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/mips/kvm/Kconfig"
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
new file mode 100644
index 0000000..53390a3
--- /dev/null
+++ b/arch/mips/kvm/Kconfig
@@ -0,0 +1,49 @@
+#
+# KVM configuration
+#
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+    bool "Virtualization"
+    depends on HAVE_KVM
+    ---help---
+      Say Y here to get to see options for using your Linux host to run
+      other operating systems inside virtual machines (guests).
+      This option alone does not add any kernel code.
+
+      If you say N, all options in this submenu will be skipped and
+      disabled.
+
+if VIRTUALIZATION
+
+config KVM
+    tristate "Kernel-based Virtual Machine (KVM) support"
+    depends on HAVE_KVM
+    select PREEMPT_NOTIFIERS
+    select ANON_INODES
+    select KVM_MMIO
+
+config KVM_MIPS_DYN_TRANS
+    bool "KVM/MIPS: Dynamic binary translation to reduce traps"
+    depends on KVM
+    ---help---
+      When running in Trap & Emulate mode use dynamic translation for privileged
+      instructions to reduce the number of traps
+
+      If unsure, say Y.
+
+config KVM_EXIT_STATS
+    bool "Maintain VM Exit Statistics"
+    depends on KVM
+    ---help---
+      If unsure, say N.
+
+config KVM_MIPS_DEBUG_COP0_COUNTERS
+    bool "Maintain counters for COP0 accesses"
+    depends on KVM
+    ---help---
+      If unsure, say N.
+
+source "drivers/vhost/Kconfig"
+
+endif # VIRTUALIZATION
diff --git a/arch/mips/mti-malta/Platform b/arch/mips/mti-malta/Platform
index 5b548b5..2cc72c9 100644
--- a/arch/mips/mti-malta/Platform
+++ b/arch/mips/mti-malta/Platform
@@ -3,5 +3,9 @@
 #
 platform-$(CONFIG_MIPS_MALTA)  += mti-malta/
 cflags-$(CONFIG_MIPS_MALTA)+= -I$(srctree)/arch/mips/include/asm/mach-malta
-load-$(CONFIG_MIPS_MALTA)  += 0xffffffff80100000
+ifdef CONFIG_KVM_GUEST
+load-$(CONFIG_MIPS_MALTA)  += 0x0000000040100000
+else
+load-$(CONFIG_MIPS_MALTA)  += 0xffffffff80100000
+endif
+endif
 all-$(CONFIG_MIPS_MALTA)   := $(COMPRESSION_FNAME).bin
-- 
1.7.11.3




[PATCH 03/20] KVM/MIPS32: Entry point for trampolining to the guest and trap handlers

2012-10-31 Thread Sanjay Lal
- __kvm_mips_vcpu_run: main entry point to enter guest, we save kernel context,
  load up guest context from and ERET to guest context.
- mips32_exception: L1 exception handler(s), save k0/k1 and jump to main handlers.
- mips32_GuestException: Generic exception handlers for exceptions/interrupts
  while in guest context.  Save guest context, restore some kernel context and
  jump to main 'C' handler: kvm_mips_handle_exit()

Signed-off-by: Sanjay Lal <sanj...@kymasys.com>
---
 arch/mips/kvm/kvm_locore.S | 646 +
 1 file changed, 646 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_locore.S

diff --git a/arch/mips/kvm/kvm_locore.S b/arch/mips/kvm/kvm_locore.S
new file mode 100644
index 0000000..319a4ed
--- /dev/null
+++ b/arch/mips/kvm/kvm_locore.S
@@ -0,0 +1,646 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* Main entry point for the guest, exception handling.
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal <sanj...@kymasys.com>
+*/
+
+#include <asm/asm.h>
+#include <asm/asmmacro.h>
+#include <asm/regdef.h>
+#include <asm/mipsregs.h>
+#include <asm/stackframe.h>
+#include <asm/asm-offsets.h>
+
+#define _C_LABEL(x)     x
+#define MIPSX(name)     mips32_ ## name
+#define CALLFRAME_SIZ   32
+
+/*
+ * VECTOR
+ *  exception vector entrypoint
+ */
+#define VECTOR(x, regmask)      \
+    .ent    _C_LABEL(x),0;      \
+    EXPORT(x);
+
+#define VECTOR_END(x)      \
+    EXPORT(x);
+
+/* Overload, Danger Will Robinson!! */
+#define PT_HOST_ASID        PT_BVADDR
+#define PT_HOST_USERLOCAL   PT_EPC
+
+#define CP0_DDATA_LO        $28,3
+#define CP0_EBASE           $15,1
+
+#define CP0_INTCTL          $12,1
+#define CP0_SRSCTL          $12,2
+#define CP0_SRSMAP          $12,3
+#define CP0_HWRENA          $7,0
+
+/* Resume Flags */
+#define RESUME_FLAG_HOST    (1<<1)  /* Resume host? */
+
+#define RESUME_GUEST        0
+#define RESUME_HOST         RESUME_FLAG_HOST
+
+/*
+ * __kvm_mips_vcpu_run: entry point to the guest
+ * a0: run
+ * a1: vcpu
+ */
+
+FEXPORT(__kvm_mips_vcpu_run)
+    .set    push
+    .set    noreorder
+    .set    noat
+
+    /* k0/k1 not being used in host kernel context */
+    addiu   k1,sp, -PT_SIZE
+LONG_S $0, PT_R0(k1)
+LONG_S $1, PT_R1(k1)
+LONG_S $2, PT_R2(k1)
+LONG_S $3, PT_R3(k1)
+
+LONG_S $4, PT_R4(k1)
+LONG_S $5, PT_R5(k1)
+LONG_S $6, PT_R6(k1)
+LONG_S $7, PT_R7(k1)
+
+LONG_S $8,  PT_R8(k1)
+LONG_S $9,  PT_R9(k1)
+LONG_S $10, PT_R10(k1)
+LONG_S $11, PT_R11(k1)
+LONG_S $12, PT_R12(k1)
+LONG_S $13, PT_R13(k1)
+LONG_S $14, PT_R14(k1)
+LONG_S $15, PT_R15(k1)
+LONG_S $16, PT_R16(k1)
+LONG_S $17, PT_R17(k1)
+
+LONG_S $18, PT_R18(k1)
+LONG_S $19, PT_R19(k1)
+LONG_S $20, PT_R20(k1)
+LONG_S $21, PT_R21(k1)
+LONG_S $22, PT_R22(k1)
+LONG_S $23, PT_R23(k1)
+LONG_S $24, PT_R24(k1)
+LONG_S $25, PT_R25(k1)
+
+    /* XXXKYMA k0/k1 not saved, not being used if we got here through an ioctl() */
+
+LONG_S $28, PT_R28(k1)
+LONG_S $29, PT_R29(k1)
+LONG_S $30, PT_R30(k1)
+LONG_S $31, PT_R31(k1)
+
+    /* Save hi/lo */
+    mflo    v0
+    LONG_S  v0, PT_LO(k1)
+    mfhi    v1
+    LONG_S  v1, PT_HI(k1)
+
+    /* Save host status */
+    mfc0    v0, CP0_STATUS
+    LONG_S  v0, PT_STATUS(k1)
+
+    /* Save host ASID, shove it into the BVADDR location */
+    mfc0    v1,CP0_ENTRYHI
+    andi    v1, 0xff
+    LONG_S  v1, PT_HOST_ASID(k1)
+
+    /* Save DDATA_LO, will be used to store pointer to vcpu */
+    mfc0    v1, CP0_DDATA_LO
+    LONG_S  v1, PT_HOST_USERLOCAL(k1)
+
+    /* DDATA_LO has pointer to vcpu */
+    mtc0    a1,CP0_DDATA_LO
+
+    /* Offset into vcpu->arch */
+    addiu   k1, a1, VCPU_HOST_ARCH
+
+    /* Save the host stack to VCPU, used for exception processing when we exit from the Guest */
+    LONG_S  sp, VCPU_HOST_STACK(k1)
+
+    /* Save the kernel gp as well */
+    LONG_S  gp, VCPU_HOST_GP(k1)
+
+    /* Setup status register for running the guest in UM, interrupts are disabled */
+    li      k0,(ST0_EXL | KSU_USER| ST0_BEV)
+    mtc0    k0,CP0_STATUS
+    ehb
+
+    /* load up the new EBASE */
+    LONG_L  k0, VCPU_GUEST_EBASE(k1)
+    mtc0    k0,CP0_EBASE
+
+/* Now that the new EBASE has been loaded, 

[PATCH 15/20] MIPS: If KVM is enabled then use the KVM specific routine to flush the TLBs on a ASID wrap

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal <sanj...@kymasys.com>
---
 arch/mips/include/asm/mmu_context.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/mips/include/asm/mmu_context.h b/arch/mips/include/asm/mmu_context.h
index 9b02cfb..9c7024c 100644
--- a/arch/mips/include/asm/mmu_context.h
+++ b/arch/mips/include/asm/mmu_context.h
@@ -112,15 +112,21 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 static inline void
 get_new_mmu_context(struct mm_struct *mm, unsigned long cpu)
 {
+    extern void kvm_local_flush_tlb_all(void);
     unsigned long asid = asid_cache(cpu);
 
     if (! ((asid += ASID_INC) & ASID_MASK) ) {
         if (cpu_has_vtag_icache)
             flush_icache_all();
+#ifdef CONFIG_VIRTUALIZATION
+        kvm_local_flush_tlb_all();  /* start new asid cycle */
+#else
         local_flush_tlb_all();  /* start new asid cycle */
+#endif
         if (!asid)  /* fix version if needed */
             asid = ASID_FIRST_VERSION;
     }
+
     cpu_context(cpu, mm) = asid_cache(cpu) = asid;
 }
 
-- 
1.7.11.3


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/20] KVM/MIPS32: Guest interrupt delivery.

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kvm/kvm_mips_int.c | 251 +++
 arch/mips/kvm/kvm_mips_int.h |  49 +
 2 files changed, 300 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_mips_int.c
 create mode 100644 arch/mips/kvm/kvm_mips_int.h

diff --git a/arch/mips/kvm/kvm_mips_int.c b/arch/mips/kvm/kvm_mips_int.c
new file mode 100644
index 000..2771446
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips_int.c
@@ -0,0 +1,251 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS: Interrupt delivery
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/bootmem.h>
+#include <asm/page.h>
+#include <asm/cacheflush.h>
+
+#include <linux/kvm_host.h>
+
+#include "kvm_mips_int.h"
+
+
+void
+kvm_mips_queue_irq(struct kvm_vcpu *vcpu, uint32_t priority)
+{
+    set_bit(priority, &vcpu->arch.pending_exceptions);
+}
+
+void
+kvm_mips_dequeue_irq(struct kvm_vcpu *vcpu, uint32_t priority)
+{
+    clear_bit(priority, &vcpu->arch.pending_exceptions);
+}
+
+void
+kvm_mips_queue_timer_int_cb(struct kvm_vcpu *vcpu)
+{
+    /* Cause bits to reflect the pending timer interrupt,
+     * the EXC code will be set when we are actually
+     * delivering the interrupt:
+     */
+    kvm_set_c0_guest_cause(vcpu->arch.cop0, (C_IRQ5 | C_TI));
+
+    /* Queue up an INT exception for the core */
+    kvm_mips_queue_irq(vcpu, MIPS_EXC_INT_TIMER);
+
+}
+
+void
+kvm_mips_dequeue_timer_int_cb(struct kvm_vcpu *vcpu)
+{
+    kvm_clear_c0_guest_cause(vcpu->arch.cop0, (C_IRQ5 | C_TI));
+    kvm_mips_dequeue_irq(vcpu, MIPS_EXC_INT_TIMER);
+}
+
+void
+kvm_mips_queue_io_int_cb(struct kvm_vcpu *vcpu, struct kvm_mips_interrupt *irq)
+{
+    int intr = (int) irq->irq;
+
+    /* Cause bits to reflect the pending IO interrupt,
+     * the EXC code will be set when we are actually
+     * delivering the interrupt:
+     */
+    switch (intr) {
+    case 2:
+        kvm_set_c0_guest_cause(vcpu->arch.cop0, (C_IRQ0));
+        /* Queue up an INT exception for the core */
+        kvm_mips_queue_irq(vcpu, MIPS_EXC_INT_IO);
+        break;
+
+    case 3:
+        kvm_set_c0_guest_cause(vcpu->arch.cop0, (C_IRQ1));
+        kvm_mips_queue_irq(vcpu, MIPS_EXC_INT_IPI_1);
+        break;
+
+    case 4:
+        kvm_set_c0_guest_cause(vcpu->arch.cop0, (C_IRQ2));
+        kvm_mips_queue_irq(vcpu, MIPS_EXC_INT_IPI_2);
+        break;
+
+    default:
+        break;
+    }
+
+}
+
+void
+kvm_mips_dequeue_io_int_cb(struct kvm_vcpu *vcpu, struct kvm_mips_interrupt *irq)
+{
+    int intr = (int) irq->irq;
+    switch (intr) {
+    case -2:
+        kvm_clear_c0_guest_cause(vcpu->arch.cop0, (C_IRQ0));
+        kvm_mips_dequeue_irq(vcpu, MIPS_EXC_INT_IO);
+        break;
+
+    case -3:
+        kvm_clear_c0_guest_cause(vcpu->arch.cop0, (C_IRQ1));
+        kvm_mips_dequeue_irq(vcpu, MIPS_EXC_INT_IPI_1);
+        break;
+
+    case -4:
+        kvm_clear_c0_guest_cause(vcpu->arch.cop0, (C_IRQ2));
+        kvm_mips_dequeue_irq(vcpu, MIPS_EXC_INT_IPI_2);
+        break;
+
+    default:
+        break;
+    }
+
+}
+
+/* Deliver the interrupt of the corresponding priority, if possible. */
+int
+kvm_mips_irq_deliver_cb(struct kvm_vcpu *vcpu, unsigned int priority,
+                        uint32_t cause)
+{
+    int allowed = 0;
+    uint32_t exccode;
+
+    struct kvm_vcpu_arch *arch = &vcpu->arch;
+    struct mips_coproc *cop0 __unused = vcpu->arch.cop0;
+
+    switch (priority) {
+    case MIPS_EXC_INT_TIMER:
+        if ((kvm_read_c0_guest_status(cop0) & ST0_IE)
+            && (!(kvm_read_c0_guest_status(cop0) & (ST0_EXL | ST0_ERL)))
+            && (kvm_read_c0_guest_status(cop0) & IE_IRQ5)) {
+            allowed = 1;
+            exccode = T_INT;
+        }
+        break;
+
+    case MIPS_EXC_INT_IO:
+        if ((kvm_read_c0_guest_status(cop0) & ST0_IE)
+            && (!(kvm_read_c0_guest_status(cop0) & (ST0_EXL | ST0_ERL)))
+            && (kvm_read_c0_guest_status(cop0) & IE_IRQ0)) {
+            allowed = 1;
+            exccode = T_INT;
+        }
+        break;
+
+    case MIPS_EXC_INT_IPI_1:
+        if ((kvm_read_c0_guest_status(cop0) & ST0_IE)
+            && (!(kvm_read_c0_guest_status(cop0) & (ST0_EXL | ST0_ERL)))
+            && (kvm_read_c0_guest_status(cop0) & IE_IRQ1)) {
+            allowed = 1;
+            exccode = T_INT;
+        }
+        break;
+
+    case MIPS_EXC_INT_IPI_2:
+        if ((kvm_read_c0_guest_status(cop0) & ST0_IE)
+            && (!(kvm_read_c0_guest_status(cop0) & (ST0_EXL | ST0_ERL)))
+            && (kvm_read_c0_guest_status(cop0) & IE_IRQ2)) {
+allowed = 

[PATCH 13/20] MIPS: Export routines needed by the KVM module.

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/mm/c-r4k.c   | 6 --
 arch/mips/mm/cache.c   | 1 +
 arch/mips/mm/tlb-r4k.c | 2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 2b61462..1923063 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -136,7 +136,8 @@ static void __cpuinit r4k_blast_dcache_page_indexed_setup(void)
r4k_blast_dcache_page_indexed = blast_dcache64_page_indexed;
 }
 
-static void (* r4k_blast_dcache)(void);
+void (* r4k_blast_dcache)(void);
+EXPORT_SYMBOL(r4k_blast_dcache);
 
 static void __cpuinit r4k_blast_dcache_setup(void)
 {
@@ -264,7 +265,8 @@ static void __cpuinit r4k_blast_icache_page_indexed_setup(void)
r4k_blast_icache_page_indexed = blast_icache64_page_indexed;
 }
 
-static void (* r4k_blast_icache)(void);
+void (* r4k_blast_icache)(void);
+EXPORT_SYMBOL(r4k_blast_icache);
 
 static void __cpuinit r4k_blast_icache_setup(void)
 {
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 07cec44..5aeb3eb 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -48,6 +48,7 @@ void (*flush_icache_all)(void);
 
 EXPORT_SYMBOL_GPL(local_flush_data_cache_page);
 EXPORT_SYMBOL(flush_data_cache_page);
+EXPORT_SYMBOL(flush_icache_all);
 
 #ifdef CONFIG_DMA_NONCOHERENT
 
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index 4b9b935..fd30887 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -13,6 +13,7 @@
 #include <linux/smp.h>
 #include <linux/mm.h>
 #include <linux/hugetlb.h>
+#include <linux/module.h>
 
 #include <asm/cpu.h>
 #include <asm/bootinfo.h>
@@ -94,6 +95,7 @@ void local_flush_tlb_all(void)
FLUSH_ITLB;
EXIT_CRITICAL(flags);
 }
+EXPORT_SYMBOL(local_flush_tlb_all);
 
 /* All entries common to a mm share an asid.  To effectively flush
these entries, we just bump the asid. */
-- 
1.7.11.3




[PATCH 18/20] MIPS: Export symbols used by KVM/MIPS module

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kernel/smp.c | 1 +
 mm/bootmem.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 9005bf9..60ea489 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -83,6 +83,7 @@ static inline void set_cpu_sibling_map(int cpu)
 }
 
 struct plat_smp_ops *mp_ops;
+EXPORT_SYMBOL(mp_ops);
 
 __cpuinit void register_smp_ops(struct plat_smp_ops *ops)
 {
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 434be4a..f9c0221 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -32,6 +32,7 @@ EXPORT_SYMBOL(contig_page_data);
 
 unsigned long max_low_pfn;
 unsigned long min_low_pfn;
+EXPORT_SYMBOL(min_low_pfn);
 unsigned long max_pfn;
 
 bootmem_data_t bootmem_node_data[MAX_NUMNODES] __initdata;
-- 
1.7.11.3




[PATCH 19/20] KVM/MIPS32: Do not call vcpu_load when injecting interrupts.

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e59bb63..1cc985a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1882,7 +1882,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
     if (vcpu->kvm->mm != current->mm)
         return -EIO;
 
-#if defined(CONFIG_S390) || defined(CONFIG_PPC)
+#if defined(CONFIG_S390) || defined(CONFIG_PPC) || defined(CONFIG_MIPS)
/*
 * Special cases: vcpu ioctls that are asynchronous to vcpu execution,
 * so vcpu_load() would break it.
-- 
1.7.11.3




[PATCH 17/20] MIPS: Pull in MIPS fix: fix endless loop when processing signals for kernel tasks.

2012-10-31 Thread Sanjay Lal
This bug is discussed in: 
http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/00719.html

Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kernel/entry.S | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/entry.S b/arch/mips/kernel/entry.S
index a6c1332..9b00362 100644
--- a/arch/mips/kernel/entry.S
+++ b/arch/mips/kernel/entry.S
@@ -36,6 +36,11 @@ FEXPORT(ret_from_exception)
 FEXPORT(ret_from_irq)
     LONG_S  s0, TI_REGS($28)
 FEXPORT(__ret_from_irq)
+/*
+ * We can be coming here from a syscall done in the kernel space,
+ * e.g. a failed kernel_execve().
+ */
+resume_userspace_check:
     LONG_L  t0, PT_STATUS(sp)   # returning to kernel mode?
     andi    t0, t0, KU_USER
     beqz    t0, resume_kernel
@@ -162,7 +167,7 @@ work_notifysig: # deal with pending signals and
     move    a0, sp
     li      a1, 0
     jal     do_notify_resume    # a2 already loaded
-    j       resume_userspace
+    j       resume_userspace_check
 
 FEXPORT(syscall_exit_partial)
local_irq_disable   # make sure need_resched doesn't
-- 
1.7.11.3




[PATCH 16/20] MIPS: ASM offsets for VCPU arch specific fields.

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kernel/asm-offsets.c | 67 ++
 1 file changed, 67 insertions(+)

diff --git a/arch/mips/kernel/asm-offsets.c b/arch/mips/kernel/asm-offsets.c
index 833eeab..d65a2fd 100644
--- a/arch/mips/kernel/asm-offsets.c
+++ b/arch/mips/kernel/asm-offsets.c
@@ -18,6 +18,8 @@
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 
+#include <linux/kvm_host.h>
+
 void output_ptreg_defines(void)
 {
     COMMENT("MIPS pt_regs offsets.");
@@ -330,3 +332,68 @@ void output_pbe_defines(void)
BLANK();
 }
 #endif
+
+void output_kvm_defines(void)
+{
+   COMMENT(" KVM/MIPS Specific offsets. ");
+   DEFINE(VCPU_ARCH_SIZE, sizeof(struct kvm_vcpu_arch));
+   OFFSET(VCPU_RUN, kvm_vcpu, run);
+   OFFSET(VCPU_HOST_ARCH, kvm_vcpu, arch);
+
+   OFFSET(VCPU_HOST_EBASE, kvm_vcpu_arch, host_ebase);
+   OFFSET(VCPU_GUEST_EBASE, kvm_vcpu_arch, guest_ebase);
+
+   OFFSET(VCPU_HOST_STACK, kvm_vcpu_arch, host_stack);
+   OFFSET(VCPU_HOST_GP, kvm_vcpu_arch, host_gp);
+
+   OFFSET(VCPU_HOST_CP0_BADVADDR, kvm_vcpu_arch, host_cp0_badvaddr);
+   OFFSET(VCPU_HOST_CP0_CAUSE, kvm_vcpu_arch, host_cp0_cause);
+   OFFSET(VCPU_HOST_EPC, kvm_vcpu_arch, host_cp0_epc);
+   OFFSET(VCPU_HOST_ENTRYHI, kvm_vcpu_arch, host_cp0_entryhi);
+
+   OFFSET(VCPU_GUEST_INST, kvm_vcpu_arch, guest_inst);
+
+   OFFSET(VCPU_R0, kvm_vcpu_arch, gprs[0]);
+   OFFSET(VCPU_R1, kvm_vcpu_arch, gprs[1]);
+   OFFSET(VCPU_R2, kvm_vcpu_arch, gprs[2]);
+   OFFSET(VCPU_R3, kvm_vcpu_arch, gprs[3]);
+   OFFSET(VCPU_R4, kvm_vcpu_arch, gprs[4]);
+   OFFSET(VCPU_R5, kvm_vcpu_arch, gprs[5]);
+   OFFSET(VCPU_R6, kvm_vcpu_arch, gprs[6]);
+   OFFSET(VCPU_R7, kvm_vcpu_arch, gprs[7]);
+   OFFSET(VCPU_R8, kvm_vcpu_arch, gprs[8]);
+   OFFSET(VCPU_R9, kvm_vcpu_arch, gprs[9]);
+   OFFSET(VCPU_R10, kvm_vcpu_arch, gprs[10]);
+   OFFSET(VCPU_R11, kvm_vcpu_arch, gprs[11]);
+   OFFSET(VCPU_R12, kvm_vcpu_arch, gprs[12]);
+   OFFSET(VCPU_R13, kvm_vcpu_arch, gprs[13]);
+   OFFSET(VCPU_R14, kvm_vcpu_arch, gprs[14]);
+   OFFSET(VCPU_R15, kvm_vcpu_arch, gprs[15]);
+   OFFSET(VCPU_R16, kvm_vcpu_arch, gprs[16]);
+   OFFSET(VCPU_R17, kvm_vcpu_arch, gprs[17]);
+   OFFSET(VCPU_R18, kvm_vcpu_arch, gprs[18]);
+   OFFSET(VCPU_R19, kvm_vcpu_arch, gprs[19]);
+   OFFSET(VCPU_R20, kvm_vcpu_arch, gprs[20]);
+   OFFSET(VCPU_R21, kvm_vcpu_arch, gprs[21]);
+   OFFSET(VCPU_R22, kvm_vcpu_arch, gprs[22]);
+   OFFSET(VCPU_R23, kvm_vcpu_arch, gprs[23]);
+   OFFSET(VCPU_R24, kvm_vcpu_arch, gprs[24]);
+   OFFSET(VCPU_R25, kvm_vcpu_arch, gprs[25]);
+   OFFSET(VCPU_R26, kvm_vcpu_arch, gprs[26]);
+   OFFSET(VCPU_R27, kvm_vcpu_arch, gprs[27]);
+   OFFSET(VCPU_R28, kvm_vcpu_arch, gprs[28]);
+   OFFSET(VCPU_R29, kvm_vcpu_arch, gprs[29]);
+   OFFSET(VCPU_R30, kvm_vcpu_arch, gprs[30]);
+   OFFSET(VCPU_R31, kvm_vcpu_arch, gprs[31]);
+   OFFSET(VCPU_LO, kvm_vcpu_arch, lo);
+   OFFSET(VCPU_HI, kvm_vcpu_arch, hi);
+   OFFSET(VCPU_PC, kvm_vcpu_arch, pc);
+   OFFSET(VCPU_COP0, kvm_vcpu_arch, cop0);
+   OFFSET(VCPU_ASID_MAP, kvm_vcpu_arch, asid_map);
+   OFFSET(VCPU_GUEST_KERNEL_ASID, kvm_vcpu_arch, guest_kernel_asid);
+   OFFSET(VCPU_GUEST_USER_ASID, kvm_vcpu_arch, guest_user_asid);
+
+   OFFSET(COP0_TLB_HI, mips_coproc, reg[MIPS_CP0_TLB_HI][0]);
+   OFFSET(COP0_STATUS, mips_coproc, reg[MIPS_CP0_STATUS][0]);
+   BLANK();
+}
-- 
1.7.11.3




[PATCH 07/20] KVM/MIPS32: Dynamic binary translation of select privileged instructions.

2012-10-31 Thread Sanjay Lal
Currently, the following instructions are translated:
- CACHE (indexed)
- CACHE (va based): translated to a synci. This is overkill for D-CACHE
  operations, but still much faster than a trap.
- mfc0/mtc0: the virtual COP0 registers for the guest are implemented as a 2-D
  array [COP#][SEL], which is mapped into the guest kernel address space @ VA 0x0.
  mfc0/mtc0 operations are transformed into loads/stores.
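
To make the rewriting step concrete, here is a minimal user-space sketch of the two
translations described above. It is not the patch's code. The function names are
hypothetical, and the commpage layout used for the mfc0 case (8 selects per register,
4-byte slots, array starting at offset 0) is an assumption made only for illustration.
The bit positions follow the MIPS32 R2 encodings of CACHE, SYNCI, MFC0 and LW.

```c
#include <assert.h>
#include <stdint.h>

#define SYNCI_TEMPLATE 0x041f0000u  /* REGIMM opcode, rt field = 0x1f */
#define LW_TEMPLATE    0x8c000000u  /* lw with base = $zero, rt = 0, offset = 0 */
#define ASSUMED_N_SEL  8u           /* assumption: selects per COP0 register */

/* Rewrite an address-based CACHE instruction as "synci offset(base)". */
uint32_t cache_va_to_synci(uint32_t cache_inst)
{
    assert((cache_inst >> 26) == 0x2fu);        /* must be a CACHE opcode */
    uint32_t base = (cache_inst >> 21) & 0x1f;  /* base register number */
    uint32_t offset = cache_inst & 0xffff;      /* 16-bit offset field */
    return SYNCI_TEMPLATE | (base << 21) | offset;
}

/* Rewrite "mfc0 rt, rd, sel" as "lw rt, slot($zero)", where slot is the
 * position of [rd][sel] in the assumed register array mapped at VA 0x0. */
uint32_t mfc0_to_lw(uint32_t mfc0_inst)
{
    assert((mfc0_inst & 0xffe00000u) == 0x40000000u);  /* COP0 opcode, MF form */
    uint32_t rt = (mfc0_inst >> 16) & 0x1f;
    uint32_t rd = (mfc0_inst >> 11) & 0x1f;
    uint32_t sel = mfc0_inst & 0x7;
    uint32_t slot = (rd * ASSUMED_N_SEL + sel) * 4u;
    return LW_TEMPLATE | (rt << 16) | slot;
}
```

With these assumptions, "cache 0x15, 16(a0)" becomes "synci 16(a0)" and
"mfc0 t0, $12" becomes "lw t0, 0x180($zero)"; in both cases the guest no longer
traps on the rewritten instruction.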

Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kvm/kvm_mips_comm.h |  24 +++
 arch/mips/kvm/kvm_mips_commpage.c |  38 ++
 arch/mips/kvm/kvm_mips_dyntrans.c | 142 ++
 3 files changed, 204 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_mips_comm.h
 create mode 100644 arch/mips/kvm/kvm_mips_commpage.c
 create mode 100644 arch/mips/kvm/kvm_mips_dyntrans.c

diff --git a/arch/mips/kvm/kvm_mips_comm.h b/arch/mips/kvm/kvm_mips_comm.h
new file mode 100644
index 000..02073db
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips_comm.h
@@ -0,0 +1,24 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS commpage: mapped into guest kernel @ VA: 0x0 to support dynamic translation
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#ifndef __KVM_MIPS_COMMPAGE_H__
+#define __KVM_MIPS_COMMPAGE_H__
+
+struct kvm_mips_commpage {
+    struct mips_coproc cop0;    /* COP0 state is mapped into Guest kernel via commpage */
+};
+
+#define KVM_MIPS_COMM_EIDI_OFFSET   0x0
+
+extern void kvm_mips_commpage_init (struct kvm_vcpu *vcpu);
+
+#endif /* __KVM_MIPS_COMMPAGE_H__ */
+
diff --git a/arch/mips/kvm/kvm_mips_commpage.c b/arch/mips/kvm/kvm_mips_commpage.c
new file mode 100644
index 000..5a4b21f
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips_commpage.c
@@ -0,0 +1,38 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* commpage, currently used for Virtual COP0 registers. Mapped into the guest
+* kernel address space @ 0x0.
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/bootmem.h>
+#include <asm/page.h>
+#include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+
+#include <linux/kvm_host.h>
+
+#include "kvm_mips_comm.h"
+
+void
+kvm_mips_commpage_init (struct kvm_vcpu *vcpu)
+{
+    struct kvm_mips_commpage *page = vcpu->arch.kseg0_commpage;
+    memset (page, 0, sizeof(struct kvm_mips_commpage));
+
+    /* Specific init values for fields */
+    vcpu->arch.cop0 = &page->cop0;
+    memset(vcpu->arch.cop0, 0, sizeof(struct mips_coproc));
+
+    return;
+}
diff --git a/arch/mips/kvm/kvm_mips_dyntrans.c b/arch/mips/kvm/kvm_mips_dyntrans.c
new file mode 100644
index 000..2cbbdde
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips_dyntrans.c
@@ -0,0 +1,142 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS: Dynamic translation for privileged instructions, reduces traps.
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/bootmem.h>
+
+#include "kvm_mips_comm.h"
+
+#define SYNCI_TEMPLATE  0x041f0000
+#define SYNCI_BASE(x)   (((x) >> 21) & 0x1f)
+#define SYNCI_OFFSET(x) ((x) & 0xffff)
+
+#define LW_TEMPLATE     0x8c000000
+#define CLEAR_TEMPLATE  0x00000020
+#define SW_TEMPLATE     0xac000000
+
+int
+kvm_mips_trans_cache_index (uint32_t inst, uint32_t __user *opc, struct kvm_vcpu *vcpu)
+{
+    int result = 0;
+    ulong kseg0_opc;
+    uint32_t synci_inst = 0x0;
+
+    /* Replace the CACHE instruction, with a NOP */
+    kseg0_opc = CKSEG0ADDR(kvm_mips_translate_guest_kseg0_to_hpa (vcpu, (ulong) opc));
+    memcpy((void *) kseg0_opc, (void *) &synci_inst, sizeof(uint32_t));
+    mips32_SyncICache(kseg0_opc, 32);
+
+    return (result);
+}
+
+/*
+ * Address based CACHE instructions are transformed into synci(s). A little heavy
+ * for just D-cache invalidates, but avoids an expensive trap
+ */
+int
+kvm_mips_trans_cache_va (uint32_t inst, uint32_t __user *opc, struct kvm_vcpu *vcpu)
+{
+    int result = 0;
+    ulong kseg0_opc;
+    uint32_t synci_inst = SYNCI_TEMPLATE, base, offset;
+
+    base = (inst >> 21) & 0x1f;
+    offset = inst & 0xffff;
+    synci_inst |= (base << 21);
+    synci_inst |= offset;
+
+    kseg0_opc = CKSEG0ADDR(kvm_mips_translate_guest_kseg0_to_hpa (vcpu, (ulong) opc));
+

[PATCH 10/20] KVM/MIPS32: Keep track of VM exits and a histogram of COP0 accesses.

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kvm/kvm_mips_stats.c | 93 ++
 arch/mips/kvm/kvm_mips_stats.h | 47 +
 2 files changed, 140 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_mips_stats.c
 create mode 100644 arch/mips/kvm/kvm_mips_stats.h

diff --git a/arch/mips/kvm/kvm_mips_stats.c b/arch/mips/kvm/kvm_mips_stats.c
new file mode 100644
index 000..232e91b
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips_stats.c
@@ -0,0 +1,93 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS: VM Exit stats, COP0 access histogram
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#include <linux/kvm_host.h>
+
+char *kvm_mips_exit_types_str[MAX_KVM_MIPS_EXIT_TYPES] = {
+    "WAIT",
+    "CACHE",
+    "Signal",
+    "Interrupt",
+    "COP0/1 Unusable",
+    "TLB Mod",
+    "TLB Miss (LD)",
+    "TLB Miss (ST)",
+    "Address Err (ST)",
+    "Address Error (LD)",
+    "System Call",
+    "Reserved Inst",
+    "Break Inst",
+    "D-Cache Flushes",
+};
+
+char *kvm_cop0_str[N_MIPS_COPROC_REGS] = {
+    "Index",
+    "Random",
+    "EntryLo0",
+    "EntryLo1",
+    "Context",
+    "PG Mask",
+    "Wired",
+    "HWREna",
+    "BadVAddr",
+    "Count",
+    "EntryHI",
+    "Compare",
+    "Status",
+    "Cause",
+    "EXC PC",
+    "PRID",
+    "Config",
+    "LLAddr",
+    "Watch Lo",
+    "Watch Hi",
+    "X Context",
+    "Reserved",
+    "Impl Dep",
+    "Debug",
+    "DEPC",
+    "PerfCnt",
+    "ErrCtl",
+    "CacheErr",
+    "TagLo",
+    "TagHi",
+    "ErrorEPC",
+    "DESAVE"
+};
+
+int
+kvm_mips_dump_stats(struct kvm_vcpu *vcpu)
+{
+    int i, j __unused;
+    ulong total_exits = 0;
+
+    /* 1st run, total exits */
+    for (i = 0; i < MAX_KVM_MIPS_EXIT_TYPES; i++) {
+        total_exits += vcpu->arch.exit_reason_stats[i];
+    }
+
+    printk("KVM Exit Stats (%lu total exits):\n", total_exits);
+    for (i = 0; i < MAX_KVM_MIPS_EXIT_TYPES; i++) {
+        printk("\t%s: %lu\n", kvm_mips_exit_types_str[i],
+               vcpu->arch.exit_reason_stats[i]);
+    }
+
+#ifdef CONFIG_KVM_MIPS_DEBUG_COP0_COUNTERS
+    printk("\nKVM COP0 Access Profile:\n");
+    for (i = 0; i < N_MIPS_COPROC_REGS; i++) {
+        for (j = 0; j < N_MIPS_COPROC_SEL; j++) {
+            if (vcpu->arch.cop0->stat[i][j])
+                printk("%s[%d]: %lu\n", kvm_cop0_str[i], j,
+                       vcpu->arch.cop0->stat[i][j]);
+        }
+    }
+#endif
+
+    return 0;
+}
diff --git a/arch/mips/kvm/kvm_mips_stats.h b/arch/mips/kvm/kvm_mips_stats.h
new file mode 100644
index 000..df6872a
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips_stats.h
@@ -0,0 +1,47 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS: Instrumentation, currently logs VM exit stats and COP0 accesses
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#ifndef __KVM_MIPS_STATS_H__
+#define __KVM_MIPS_STATS_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_host.h>
+
+#ifdef CONFIG_KVM_EXIT_STATS
+void kvm_mips_create_vcpu_debugfs(struct kvm_vcpu *vcpu, unsigned int id);
+void kvm_mips_remove_vcpu_debugfs(struct kvm_vcpu *vcpu);
+
+static inline void kvm_mips_set_exit_type(struct kvm_vcpu *vcpu, int type)
+{
+}
+
+/* account the exit in kvm_stats */
+static inline void kvm_mips_account_exit_stat(struct kvm_vcpu *vcpu, enum kvm_mips_exit_types type)
+{
+    vcpu->arch.exit_reason_stats[type]++;
+}
+
+/* wrapper to set exit time and account for it in kvm_stats */
+static inline void kvm_mips_account_exit(struct kvm_vcpu *vcpu, enum kvm_mips_exit_types type)
+{
+    kvm_mips_account_exit_stat(vcpu, type);
+}
+
+#else
+static inline void kvm_mips_create_vcpu_debugfs(struct kvm_vcpu *vcpu,
+                                                unsigned int id) {}
+static inline void kvm_mips_remove_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
+static inline void kvm_mips_set_exit_type(struct kvm_vcpu *vcpu, int type) {}
+
+static inline void kvm_mips_account_exit(struct kvm_vcpu *vcpu, enum kvm_mips_exit_types type) {}
+#endif /* CONFIG_KVM_EXIT_STATS */
+
+#endif /* __KVM_MIPS_STATS_H__ */
-- 
1.7.11.3




[PATCH 12/20] KVM/MIPS32: Routines to handle specific traps/exceptions while executing the guest.

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kvm/kvm_cb.c|  16 ++
 arch/mips/kvm/kvm_trap_emul.c | 446 ++
 2 files changed, 462 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_cb.c
 create mode 100644 arch/mips/kvm/kvm_trap_emul.c

diff --git a/arch/mips/kvm/kvm_cb.c b/arch/mips/kvm/kvm_cb.c
new file mode 100644
index 000..768198e
--- /dev/null
+++ b/arch/mips/kvm/kvm_cb.c
@@ -0,0 +1,16 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* PUT YOUR TITLE AND/OR INFORMATION FOR THE FILE HERE
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Yann Le Du l...@kymasys.com
+*/
+
+#include <linux/export.h>
+#include <linux/kvm_host.h>
+
+struct kvm_mips_callbacks *kvm_mips_callbacks = NULL;
+EXPORT_SYMBOL(kvm_mips_callbacks);
diff --git a/arch/mips/kvm/kvm_trap_emul.c b/arch/mips/kvm/kvm_trap_emul.c
new file mode 100644
index 000..68983bc
--- /dev/null
+++ b/arch/mips/kvm/kvm_trap_emul.c
@@ -0,0 +1,446 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS: Deliver/Emulate exceptions to the guest kernel 
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+
+#include <linux/kvm_host.h>
+
+#include "kvm_mips_opcode.h"
+#include "kvm_mips_stats.h"
+#include "kvm_mips_int.h"
+
+static gpa_t
+kvm_trap_emul_gva_to_gpa_cb(gva_t gva)
+{
+    gpa_t gpa;
+    uint32_t kseg = KSEGX(gva);
+
+    if ((kseg == CKSEG0) || (kseg == CKSEG1))
+        gpa = CPHYSADDR(gva);
+    else {
+        printk("%s: cannot find GPA for GVA: %#lx\n", __func__, gva);
+        kvm_mips_dump_host_tlbs();
+        gpa = KVM_INVALID_ADDR;
+    }
+
+#ifdef DEBUG
+    kvm_debug("%s: gva %#lx, gpa: %#llx\n", __func__, gva, gpa);
+#endif
+
+    return gpa;
+}
+
+
+static int
+kvm_trap_emul_handle_cop_unusable(struct kvm_vcpu *vcpu)
+{
+    struct kvm_run *run = vcpu->run;
+    uint32_t __user *opc = (uint32_t __user *) vcpu->arch.pc;
+    ulong cause = vcpu->arch.host_cp0_cause;
+    enum emulation_result er = EMULATE_DONE;
+    int ret = RESUME_GUEST;
+
+    if (((cause & CAUSEF_CE) >> CAUSEB_CE) == 1) {
+        er = kvm_mips_emulate_fpu_exc(cause, opc, run, vcpu);
+    }
+    else
+        er = kvm_mips_emulate_inst(cause, opc, run, vcpu);
+
+    switch (er) {
+    case EMULATE_DONE:
+        ret = RESUME_GUEST;
+        break;
+
+    case EMULATE_FAIL:
+        run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+        ret = RESUME_HOST;
+        break;
+
+    case EMULATE_WAIT:
+        run->exit_reason = KVM_EXIT_INTR;
+        ret = RESUME_HOST;
+        break;
+
+    default:
+        BUG();
+    }
+    return ret;
+}
+
+static int
+kvm_trap_emul_handle_tlb_mod(struct kvm_vcpu *vcpu)
+{
+    struct kvm_run *run = vcpu->run;
+    uint32_t __user *opc = (uint32_t __user *) vcpu->arch.pc;
+    ulong cause = vcpu->arch.host_cp0_cause;
+    ulong badvaddr = vcpu->arch.host_cp0_badvaddr;
+    enum emulation_result er = EMULATE_DONE;
+    int ret = RESUME_GUEST;
+
+    if (KVM_GUEST_KSEGX(badvaddr) < KVM_GUEST_KSEG0
+        || KVM_GUEST_KSEGX(badvaddr) == KVM_GUEST_KSEG23) {
+#ifdef DEBUG
+        kvm_debug("USER/KSEG23 ADDR TLB MOD fault: cause %#lx, PC: %p, BadVaddr: %#lx\n",
+                  cause, opc, badvaddr);
+#endif
+        er = kvm_mips_handle_tlbmod(cause, opc, run, vcpu);
+
+        if (er == EMULATE_DONE)
+            ret = RESUME_GUEST;
+        else {
+            run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+            ret = RESUME_HOST;
+        }
+    }
+    else if (KVM_GUEST_KSEGX(badvaddr) == KVM_GUEST_KSEG0) {
+        /* XXXKYMA: The guest kernel does not expect to get this fault when we are not
+         * using HIGHMEM. Need to address this in a HIGHMEM kernel
+         */
+        printk ("TLB MOD fault not handled, cause %#lx, PC: %p, BadVaddr: %#lx\n",
+                cause, opc, badvaddr);
+        kvm_mips_dump_host_tlbs();
+        kvm_arch_vcpu_dump_regs(vcpu);
+        run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+        ret = RESUME_HOST;
+    }
+    else {
+        printk
+            ("Illegal TLB Mod fault address , cause %#lx, PC: %p, BadVaddr: %#lx\n",
+             cause, opc, badvaddr);
+        kvm_mips_dump_host_tlbs();
+        kvm_arch_vcpu_dump_regs(vcpu);
+        run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+        ret = RESUME_HOST;
+    }
+    return ret;
+}
+
+static int
+kvm_trap_emul_handle_tlb_st_miss(struct kvm_vcpu *vcpu)
+{
+    struct kvm_run *run = vcpu->run;
+    uint32_t __user *opc = (uint32_t __user *) vcpu->arch.pc;
+    ulong cause = vcpu->arch.host_cp0_cause;
+ulong badvaddr = 

[PATCH 00/20] KVM for MIPS32 Processors

2012-10-31 Thread Sanjay Lal
The following patchset implements KVM support for MIPS32R2 processors,
using Trap & Emulate, with basic runtime binary translation to improve
performance.  The goal has been to keep the Guest kernel changes to a
minimum.

The patch is against Linux 3.7-rc2.  

There is a companion patchset for QEMU that adds KVM support for the 
MIPS target.

KVM/MIPS should support MIPS32-R2 processors and beyond.
It has been tested on the following platforms:
  - Malta Board with FPGA based 34K (Little Endian).
  - Sigma Designs TangoX board with a 24K based 8654 SoC (Little Endian).
  - Malta Board with 74K @ 1GHz (Little Endian).
  - OVPSim MIPS simulator from Imperas emulating a Malta board with 
24Kc and 1074Kc cores (Little Endian).

Both Guest kernel and Guest Userspace execute in UM. The Guest address space is
as follows:
 Guest User address space:   0x00000000 - 0x40000000
 Guest Kernel Unmapped:      0x40000000 - 0x60000000
 Guest Kernel Mapped:        0x60000000 - 0x80000000

As a result, Guest Usermode virtual memory is limited to 1GB.
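
The layout above can be read as a simple range check. The sketch below is only an
illustration: the enum and function names are hypothetical, and the boundaries
(0x40000000, 0x60000000, 0x80000000) are taken as assumptions consistent with the
stated 1GB limit on Guest Usermode virtual memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed segment boundaries of the guest address space (all within host useg). */
#define GUEST_KSEG0_BASE        0x40000000u  /* start of Guest Kernel Unmapped */
#define GUEST_KSEG_MAPPED_BASE  0x60000000u  /* start of Guest Kernel Mapped */
#define GUEST_END               0x80000000u  /* end of the guest-visible range */

enum guest_seg {
    GUEST_USER,
    GUEST_KERNEL_UNMAPPED,
    GUEST_KERNEL_MAPPED,
    GUEST_INVALID
};

/* Map a guest virtual address to the segment it falls in. */
enum guest_seg classify_guest_va(uint32_t va)
{
    if (va < GUEST_KSEG0_BASE)
        return GUEST_USER;
    if (va < GUEST_KSEG_MAPPED_BASE)
        return GUEST_KERNEL_UNMAPPED;
    if (va < GUEST_END)
        return GUEST_KERNEL_MAPPED;
    return GUEST_INVALID;
}

/* Size in bytes of a valid segment under the assumed boundaries. */
uint32_t guest_seg_size(enum guest_seg seg)
{
    assert(seg != GUEST_INVALID);
    switch (seg) {
    case GUEST_USER:
        return GUEST_KSEG0_BASE;
    case GUEST_KERNEL_UNMAPPED:
        return GUEST_KSEG_MAPPED_BASE - GUEST_KSEG0_BASE;
    default:
        return GUEST_END - GUEST_KSEG_MAPPED_BASE;
    }
}
```

Under these assumptions the user segment spans 0x40000000 bytes (1GB) and each
kernel segment spans 512MB.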

Release Notes

(1) 16K Page Size:
Both Host Kernel and Guest Kernel should have the same page size, 
currently at least 16K.  Note that due to cache aliasing issues, 
4K page sizes are NOT supported.

(2) No HugeTLB/Large Page Support:
Both the host kernel and Guest kernel should have the page size 
set to at least 16K.
This will be implemented in a future release.

(3) SMP Guests do not work
Linux-3.7-rc2 based SMP guest hangs due to the following code sequence 
in the generated TLB handlers:
 LL/TLBP/SC
Since the TLBP instruction causes a trap the reservation gets cleared
when we ERET back to the guest. This causes the guest to hang in an 
infinite loop.
As a workaround, make sure that CONFIG_SMP is disabled for Guest kernels.
This will be fixed in a future release.

(4) FPU support:
Currently KVM/MIPS emulates a 24K CPU without a FPU.
This will be fixed in a future release
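
The hang described in release note (3) can be reproduced in a toy model. The sketch
below is not kernel code: the struct and function names are hypothetical, and the
"CPU" is reduced to one memory word and a link bit. It only shows why an LL/SC
sequence never succeeds when every pass through the sequence traps and returns via
ERET, which clears the reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CPU state: one memory word and the LL/SC link bit. */
struct toy_cpu {
    bool llbit;
    int mem;
};

/* Load-linked: read memory and set the reservation. */
int toy_ll(struct toy_cpu *c)
{
    c->llbit = true;
    return c->mem;
}

/* Store-conditional: succeeds only if the reservation is still held. */
bool toy_sc(struct toy_cpu *c, int value)
{
    if (!c->llbit)
        return false;
    c->mem = value;
    c->llbit = false;
    return true;
}

/* A trapped instruction (such as TLBP in a guest) followed by ERET:
 * the return from exception clears the reservation. */
void toy_trap_and_eret(struct toy_cpu *c)
{
    c->llbit = false;
}

/* Retry an LL/SC increment; returns the number of attempts used,
 * or -1 if max_tries passes all failed. */
int toy_ll_sc_increment(struct toy_cpu *c, bool trap_inside, int max_tries)
{
    assert(max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        int v = toy_ll(c);
        if (trap_inside)
            toy_trap_and_eret(c);
        if (toy_sc(c, v + 1))
            return attempt;
    }
    return -1;
}
```

Without a trap between LL and SC the increment succeeds on the first attempt. With
a trap on every pass it never succeeds, which in a real guest (with no retry limit)
shows up as the infinite loop described above.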

--
Sanjay Lal (20):
  KVM/MIPS32: Infrastructure/build files.
  KVM/MIPS32: Arch specific KVM data structures.
  KVM/MIPS32: Entry point for trampolining to the guest and trap
handlers.
  KVM/MIPS32: MIPS arch specific APIs for KVM
  KVM/MIPS32: KVM Guest kernel support.
  KVM/MIPS32: Privileged instruction/target branch emulation.
  KVM/MIPS32: Dynamic binary translation of select privileged
instructions.
  KVM/MIPS32: MMU/TLB operations for the Guest.
  KVM/MIPS32: Release notes and KVM module Makefile
  KVM/MIPS32: Keep track of VM exits and a histogram of COP0 accesses.
  KVM/MIPS32: Guest interrupt delivery.
  KVM/MIPS32: Routines to handle specific traps/exceptions while
executing the guest.
  MIPS: Export routines needed by the KVM module.
  MIPS: Use the UM bit instead of the CU0 enable bit in the status
register to figure out the stack for saving regs.
  MIPS: If KVM is enabled then use the KVM specific routine to flush
the TLBs on a ASID wrap
  MIPS: ASM offsets for VCPU arch specific fields.
  MIPS: Pull in MIPS fix: fix endless loop when processing signals for
kernel tasks.
  MIPS: Export symbols used by KVM/MIPS module
  KVM/MIPS32: Do not call vcpu_load when injecting interrupts.
  KVM/MIPS32: Malta config files for KVM host and guest kernels.

 arch/mips/Kbuild|4 +
 arch/mips/Kconfig   |   20 +
 arch/mips/configs/malta_kvm_defconfig   | 2244 +++
 arch/mips/configs/malta_kvm_guest_defconfig | 2237 ++
 arch/mips/include/asm/kvm.h |   58 +
 arch/mips/include/asm/kvm_host.h|  672 
 arch/mips/include/asm/mach-generic/spaces.h |9 +-
 arch/mips/include/asm/mmu_context.h |6 +
 arch/mips/include/asm/processor.h   |5 +
 arch/mips/include/asm/stackframe.h  |6 +-
 arch/mips/include/asm/uaccess.h |   15 +-
 arch/mips/kernel/asm-offsets.c  |   67 +
 arch/mips/kernel/binfmt_elfo32.c|6 +-
 arch/mips/kernel/cevt-r4k.c |4 +
 arch/mips/kernel/entry.S|7 +-
 arch/mips/kernel/smp.c  |1 +
 arch/mips/kernel/traps.c|7 +-
 arch/mips/kvm/00README.txt  |   31 +
 arch/mips/kvm/Kconfig   |   49 +
 arch/mips/kvm/Makefile  |   13 +
 arch/mips/kvm/kvm_cb.c  |   16 +
 arch/mips/kvm/kvm_locore.S  |  646 
 arch/mips/kvm/kvm_mips.c| 1016 
 arch/mips/kvm/kvm_mips_comm.h   |   24 +
 arch/mips/kvm/kvm_mips_commpage.c   |   38 +
 arch/mips/kvm/kvm_mips_dyntrans.c   |  142 ++
 arch/mips/kvm/kvm_mips_emul.c   | 1786 +
 arch/mips/kvm/kvm_mips_int.c|  251 +++
 arch/mips/kvm/kvm_mips_int.h|   49 +
 

[PATCH 06/20] KVM/MIPS32: Privileged instruction/target branch emulation

2012-10-31 Thread Sanjay Lal
- The Guest kernel is run in UM and privileged instructions cause a trap.
- If the instruction causing the trap is in a branch delay slot, the branch
  needs to be emulated to figure out the PC at which the guest will resume
  execution.

Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kvm/kvm_mips_emul.c   | 1786 +++
 arch/mips/kvm/kvm_mips_opcode.h |   27 +
 2 files changed, 1813 insertions(+)
 create mode 100644 arch/mips/kvm/kvm_mips_emul.c
 create mode 100644 arch/mips/kvm/kvm_mips_opcode.h

diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
new file mode 100644
index 000..6c03b4d
--- /dev/null
+++ b/arch/mips/kvm/kvm_mips_emul.c
@@ -0,0 +1,1786 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* KVM/MIPS: Instruction/Exception emulation
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/bootmem.h>
+#include <linux/random.h>
+#include <asm/page.h>
+#include <asm/cacheflush.h>
+#include <asm/cpu-info.h>
+#include <asm/mmu_context.h>
+#include <asm/tlbflush.h>
+#include <asm/inst.h>
+
+#undef CONFIG_MIPS_MT
+#include <asm/r4kcache.h>
+#define CONFIG_MIPS_MT
+
+#include "kvm_mips_opcode.h"
+#include "kvm_mips_int.h"
+#include "kvm_mips_comm.h"
+#include "kvm_mips_stats.h"
+
+static int debug __unused;
+
+/*
+ * Compute the return address and emulate the branch, if required.
+ * This function should be called only when a branch delay slot is active.
+ */
+u_long
+kvm_compute_return_epc(struct kvm_vcpu * vcpu, u_long instpc)
+{
+	unsigned int dspcontrol;
+	union mips_instruction insn;
+	struct kvm_vcpu_arch *arch = &vcpu->arch;
+	long epc = instpc;
+	long nextpc = KVM_INVALID_INST;
+
+	if (epc & 3)
+		goto unaligned;
+
+	/*
+	 * Read the instruction
+	 */
+	insn.word = kvm_get_inst((uint32_t *) epc, vcpu);
+
+	if (insn.word == KVM_INVALID_INST)
+		return (KVM_INVALID_INST);
+
+	switch (insn.i_format.opcode) {
+	/*
+	 * jr and jalr are in r_format format.
+	 */
+	case spec_op:
+		switch (insn.r_format.func) {
+		case jalr_op:
+			arch->gprs[insn.r_format.rd] = epc + 8;
+			/* Fall through */
+		case jr_op:
+			nextpc = arch->gprs[insn.r_format.rs];
+			break;
+		}
+		break;
+
+	/*
+	 * This group contains:
+	 * bltz_op, bgez_op, bltzl_op, bgezl_op,
+	 * bltzal_op, bgezal_op, bltzall_op, bgezall_op.
+	 */
+	case bcond_op:
+		switch (insn.i_format.rt) {
+		case bltz_op:
+		case bltzl_op:
+			if ((long)arch->gprs[insn.i_format.rs] < 0)
+				epc = epc + 4 + (insn.i_format.simmediate << 2);
+			else
+				epc += 8;
+			nextpc = epc;
+			break;
+
+		case bgez_op:
+		case bgezl_op:
+			if ((long)arch->gprs[insn.i_format.rs] >= 0)
+				epc = epc + 4 + (insn.i_format.simmediate << 2);
+			else
+				epc += 8;
+			nextpc = epc;
+			break;
+
+		case bltzal_op:
+		case bltzall_op:
+			arch->gprs[31] = epc + 8;
+			if ((long)arch->gprs[insn.i_format.rs] < 0)
+				epc = epc + 4 + (insn.i_format.simmediate << 2);
+			else
+				epc += 8;
+			nextpc = epc;
+			break;
+
+		case bgezal_op:
+		case bgezall_op:
+			arch->gprs[31] = epc + 8;
+			if ((long)arch->gprs[insn.i_format.rs] >= 0)
+				epc = epc + 4 + (insn.i_format.simmediate << 2);
+			else
+				epc += 8;
+			nextpc = epc;
+			break;
+		case bposge32_op:
+			if (!cpu_has_dsp)
+				goto sigill;
+
+			dspcontrol = rddsp(0x01);
+
+			if (dspcontrol >= 32) {
+				epc = epc + 4 + (insn.i_format.simmediate << 2);
+			} else
+				epc += 8;
+			nextpc = epc;
+			break;
+		}
+		break;
+
+   /*
+
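The branch emulation in the (truncated) hunk above comes down to one address computation per branch: if the branch is taken, execution resumes at the delay-slot address plus the sign-extended, word-scaled immediate; otherwise it resumes after the delay slot. A minimal standalone sketch of that arithmetic follows. The helper names `branch_target` and `bltz_resume_pc` are hypothetical and not part of the patch, and a 32-bit guest PC is assumed.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: target of a taken MIPS
 * conditional branch located at epc. The 16-bit immediate is
 * sign-extended, scaled by 4 and added to the delay-slot address (epc + 4).
 */
static uint32_t branch_target(uint32_t epc, int16_t simmediate)
{
	/* Multiply instead of shifting so negative offsets stay well defined. */
	return epc + 4 + (uint32_t)((int32_t)simmediate * 4);
}

/*
 * Resume PC for a bltz-style branch: the branch target if rs < 0,
 * otherwise the instruction after the delay slot (epc + 8).
 */
static uint32_t bltz_resume_pc(uint32_t epc, int32_t rs, int16_t simmediate)
{
	return (rs < 0) ? branch_target(epc, simmediate) : epc + 8;
}
```

The other cases in the switch differ only in the condition tested and in whether register 31 is loaded with epc + 8 as the link address.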

[PATCH 14/20] MIPS: Use the UM bit instead of the CU0 enable bit in the status register to figure out the stack for saving regs.

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/include/asm/stackframe.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/stackframe.h 
b/arch/mips/include/asm/stackframe.h
index cb41af5..59c9245 100644
--- a/arch/mips/include/asm/stackframe.h
+++ b/arch/mips/include/asm/stackframe.h
@@ -30,7 +30,7 @@
 #define STATMASK 0x1f
 #endif
 
-#ifdef CONFIG_MIPS_MT_SMTC
+#if defined(CONFIG_MIPS_MT_SMTC) || defined (CONFIG_MIPS_HW_FIBERS)
 #include <asm/mipsmtregs.h>
 #endif /* CONFIG_MIPS_MT_SMTC */
 
@@ -162,9 +162,9 @@
 		.set	noat
 		.set	reorder
 		mfc0	k0, CP0_STATUS
-		sll	k0, 3		/* extract cu0 bit */
+		andi	k0,k0,0x10	/* check user mode bit*/
 		.set	noreorder
-		bltz	k0, 8f
+		beq	k0, $0, 8f
 		 move	k1, sp
 		.set	reorder
/* Called from user mode, new stack. */
-- 
1.7.11.3


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
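The stackframe.h change in patch 14/20 above swaps one Status-register test for another. The sketch below restates both tests in C so that the difference can be seen on concrete values. The function names are hypothetical; the bit positions (CU0 at bit 28, UM at bit 4) follow the MIPS32 Status register layout and match the `sll k0, 3` and `andi k0,k0,0x10` instructions in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

#define ST0_CU0 0x10000000u /* coprocessor 0 usable, bit 28 */
#define ST0_UM  0x00000010u /* user mode, bit 4 */

/*
 * Old test: "sll k0, 3" moves bit 28 into the sign bit and "bltz" keeps
 * the current stack when CU0 is set. A new stack is set up otherwise.
 */
static bool old_needs_new_stack(uint32_t status)
{
	return ((status << 3) & 0x80000000u) == 0;
}

/*
 * New test: "andi k0,k0,0x10" isolates the UM bit and "beq k0, $0" keeps
 * the current stack when UM is clear. A new stack is set up otherwise.
 */
static bool new_needs_new_stack(uint32_t status)
{
	return (status & ST0_UM) != 0;
}
```

The two tests agree for the usual kernel (CU0 set, UM clear) and user (CU0 clear, UM set) cases. They disagree when a context runs in user mode with CU0 set, which is presumably the case the patch is aimed at; the commit message does not say so explicitly.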


[PATCH 02/20] KVM/MIPS32: Arch specific KVM data structures.

2012-10-31 Thread Sanjay Lal

Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/include/asm/kvm.h  |  58 
 arch/mips/include/asm/kvm_host.h | 672 +++
 2 files changed, 730 insertions(+)
 create mode 100644 arch/mips/include/asm/kvm.h
 create mode 100644 arch/mips/include/asm/kvm_host.h

diff --git a/arch/mips/include/asm/kvm.h b/arch/mips/include/asm/kvm.h
new file mode 100644
index 000..39bb715
--- /dev/null
+++ b/arch/mips/include/asm/kvm.h
@@ -0,0 +1,58 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+
+#ifndef __LINUX_KVM_MIPS_H
+#define __LINUX_KVM_MIPS_H
+
+#include <linux/types.h>
+
+#define __KVM_MIPS
+
+#define N_MIPS_COPROC_REGS  32
+#define N_MIPS_COPROC_SEL  8
+
+/* for KVM_GET_REGS and KVM_SET_REGS */
+struct kvm_regs {
+__u32 gprs[32];
+__u32 hi;
+__u32 lo;
+__u32 pc;
+
+ulong cp0reg[N_MIPS_COPROC_REGS][N_MIPS_COPROC_SEL];
+};
+
+/* for KVM_GET_SREGS and KVM_SET_SREGS */
+struct kvm_sregs {
+};
+
+/* for KVM_GET_FPU and KVM_SET_FPU */
+struct kvm_fpu {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+/* for KVM_SET_GUEST_DEBUG */
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_mips_interrupt {
+/* in */
+__u32 cpu;
+__u32 irq;
+};
+
+/* definition of registers in kvm_run */
+struct kvm_sync_regs {
+};
+
+#endif /* __LINUX_KVM_MIPS_H */
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
new file mode 100644
index 000..0352106
--- /dev/null
+++ b/arch/mips/include/asm/kvm_host.h
@@ -0,0 +1,672 @@
+/*
+* This file is subject to the terms and conditions of the GNU General Public
+* License.  See the file COPYING in the main directory of this archive
+* for more details.
+*
+* PUT YOUR TITLE AND/OR INFORMATION FOR THE FILE HERE
+*
+* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
+* Authors: Sanjay Lal sanj...@kymasys.com
+*/
+
+#ifndef __MIPS_KVM_HOST_H__
+#define __MIPS_KVM_HOST_H__
+
+#include <linux/mutex.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+#include <linux/threads.h>
+#include <linux/spinlock.h>
+
+#define KVM_MAX_VCPUS 8
+#define KVM_MEMORY_SLOTS 32
+/* memory slots that are not exposed to userspace */
+#define KVM_PRIVATE_MEM_SLOTS 4
+
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+/* Don't support huge pages */
+#define KVM_HPAGE_GFN_SHIFT(x)  0
+
+/* We don't currently support large pages. */
+#define KVM_NR_PAGE_SIZES  1
+#define KVM_PAGES_PER_HPAGE(x)  1
+
+#define ENTER_CRITICAL(flags)   local_irq_save(flags)
+#define EXIT_CRITICAL(flags)    local_irq_restore(flags)
+
+
+#define KVM_GUEST_KERNEL_ASID  ((vcpu->arch.guest_kernel_asid[smp_processor_id()]) & ASID_MASK)
+
+#define KVM_GUEST_USER_ASID    ((vcpu->arch.guest_user_asid[smp_processor_id()]) & ASID_MASK)
+
+
+#define KVM_GUEST_WIRED_TLBS    (current_cpu_data.tlbsize)
+#define KVM_GUEST_COMMPAGE_TLB  (vcpu->kvm->arch.commpage_tlb)
+#define KVM_GUEST_TLBS          KVM_GUEST_WIRED_TLBS
+
+/* Special address that contains the comm page, used for reducing # of traps */
+#define KVM_GUEST_COMMPAGE_ADDR 0x0
+
+#define KVM_GUEST_KERNEL_MODE(vcpu) ((kvm_read_c0_guest_status(vcpu->arch.cop0) & (ST0_EXL | ST0_ERL)) || \
+                                     ((kvm_read_c0_guest_status(vcpu->arch.cop0) & KSU_USER) == 0))
+
+#define KVM_GUEST_KUSEG 0xUL
+#define KVM_GUEST_KSEG0 0x4000UL
+#define KVM_GUEST_KSEG23 0x6000UL
+#define KVM_GUEST_KSEGX(a)  ((_ACAST32_(a)) & 0x6000)
+#define KVM_GUEST_CPHYSADDR(a)  ((_ACAST32_(a)) & 0x1fff)
+
+#define KVM_GUEST_CKSEG0ADDR(a)  (KVM_GUEST_CPHYSADDR(a) | KVM_GUEST_KSEG0)
+#define KVM_GUEST_CKSEG1ADDR(a)  (KVM_GUEST_CPHYSADDR(a) | KVM_GUEST_KSEG1)
+#define KVM_GUEST_CKSEG23ADDR(a) (KVM_GUEST_CPHYSADDR(a) | KVM_GUEST_KSEG23)
+
+/*
+ * Map an address to a certain kernel segment
+ */
+#define KVM_GUEST_KSEG0ADDR(a)  (KVM_GUEST_CPHYSADDR(a) | KSEG0)
+#define KVM_GUEST_KSEG23ADDR(a) (KVM_GUEST_CPHYSADDR(a) | KVM_GUEST_KSEG23)
+
+#define KVM_INVALID_PAGE 0xdeadbeef
+#define KVM_INVALID_INST 0xdeadbeef
+#define KVM_INVALID_ADDR 0xdeadbeef
+
+#define KVM_MALTA_GUEST_RTC_ADDR 0xb870UL
+
+#ifndef __unused
+#define __unused __attribute__((unused))
+#endif
+
+#define GUEST_TICKS_PER_JIFFY (4000/HZ)
+#define MS_TO_NS(x) (x * 1E6L)
+
+#define CAUSEB_DC   27
+#define CAUSEF_DC   (_ULCAST_(1) << 27)
+
+struct kvm;
+struct kvm_run;
+struct kvm_vcpu;
+struct kvm_interrupt;
+
+extern atomic_t kvm_mips_instance;
+
+struct kvm_vm_stat
+{
+u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat
+{
+   

[PATCH 09/20] KVM/MIPS32: Release notes and KVM module Makefile

2012-10-31 Thread Sanjay Lal
Signed-off-by: Sanjay Lal sanj...@kymasys.com
---
 arch/mips/kvm/00README.txt | 31 +++
 arch/mips/kvm/Makefile | 13 +
 2 files changed, 44 insertions(+)
 create mode 100644 arch/mips/kvm/00README.txt
 create mode 100644 arch/mips/kvm/Makefile

diff --git a/arch/mips/kvm/00README.txt b/arch/mips/kvm/00README.txt
new file mode 100644
index 000..daaf280
--- /dev/null
+++ b/arch/mips/kvm/00README.txt
@@ -0,0 +1,31 @@
+KVM/MIPS Trap & Emulate Release Notes
+=====================================
+
+(1) KVM/MIPS should support MIPS32R2 and beyond. It has been tested on the 
following platforms:
+Malta Board with FPGA based 34K
+Sigma Designs TangoX board with a 24K based 8654 SoC.
+Malta Board with 74K @ 1GHz
+
+(2) Both Guest kernel and Guest Userspace execute in UM.  
+Guest User address space:   0x - 0x4000
+Guest Kernel Unmapped:  0x4000 - 0x6000
+Guest Kernel Mapped:0x6000 - 0x8000
+
+Guest Usermode virtual memory is limited to 1GB.
+
+(2) 16K Page Sizes: Both Host Kernel and Guest Kernel should have the same 
page size, currently at least 16K.
+Note that due to cache aliasing issues, 4K page sizes are NOT supported.
+
+(3) No HugeTLB Support
+Both the host kernel and Guest kernel should have the page size set to 16K.
+This will be implemented in a future release.
+
+(4) KVM/MIPS does not have support for SMP Guests
+Linux-3.7-rc2 based SMP guest hangs due to the following code sequence in 
the generated TLB handlers:
+   LL/TLBP/SC.  Since the TLBP instruction causes a trap the reservation 
gets cleared
+   when we ERET back to the guest. This causes the guest to hang in an 
infinite loop.
+   This will be fixed in a future release.
+
+(5) Use Host FPU
+Currently KVM/MIPS emulates a 24K CPU without a FPU.
+This will be fixed in a future release
diff --git a/arch/mips/kvm/Makefile b/arch/mips/kvm/Makefile
new file mode 100644
index 000..5b609bf
--- /dev/null
+++ b/arch/mips/kvm/Makefile
@@ -0,0 +1,13 @@
+# Makefile for KVM support for MIPS
+#
+
+common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+EXTRA_CFLAGS += -Ivirt/kvm -Iarch/mips/kvm
+
+kvm-objs := $(common-objs) kvm_mips.o kvm_mips_emul.o kvm_locore.o 
kvm_mips_int.o \
+kvm_mips_stats.o kvm_mips_commpage.o kvm_mips_dyntrans.o
+
+kvm-objs  += kvm_trap_emul.o
+obj-$(CONFIG_KVM) += kvm.o
+obj-y += kvm_tlb.o kvm_cb.o 
-- 
1.7.11.3
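The guest address-space split described in item (2) of the release notes above can be sketched as a simple range classifier. The boundary values below are assumptions: the archive shows the addresses in truncated form, so 0x40000000 is inferred from the stated 1GB limit on guest user memory, and 0x60000000 and 0x80000000 are guesses that follow the same pattern. All names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Assumed boundaries; the addresses are truncated in the archived text. */
#define ASSUMED_GUEST_KSEG0   0x40000000u /* end of guest user space (1GB) */
#define ASSUMED_GUEST_KSEG23  0x60000000u /* start of guest kernel mapped */
#define ASSUMED_GUEST_END     0x80000000u /* end of guest kernel mapped */

enum guest_seg {
	GUEST_USER,
	GUEST_KERNEL_UNMAPPED,
	GUEST_KERNEL_MAPPED,
	GUEST_OUTSIDE,
};

/* Classify a guest virtual address according to the assumed layout. */
static enum guest_seg classify_guest_addr(uint32_t gva)
{
	if (gva < ASSUMED_GUEST_KSEG0)
		return GUEST_USER;
	if (gva < ASSUMED_GUEST_KSEG23)
		return GUEST_KERNEL_UNMAPPED;
	if (gva < ASSUMED_GUEST_END)
		return GUEST_KERNEL_MAPPED;
	return GUEST_OUTSIDE;
}
```

Because both guest kernel and guest userspace execute in UM, the whole guest layout has to fit below the host's kernel segments, which is why guest user memory is capped at 1GB.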




[RFC PATCH v3 0/5] s390: Host support for channel I/O.

2012-10-31 Thread Cornelia Huck
Hi,

here's the latest incarnation of my host patches to support channel
I/O on s390.

Most patches have only seen minor fixes, but patch 5 is completely
different since the kvm <-> user space interface has been reworked.

We now handle only interrupt-related operations in kvm. This
includes two channel I/O instructions that can dequeue pending I/O
interrupts: tpi and tsch (not the part actually interacting with
the subchannel). This makes the interface less complex (only one
new exit for tsch handling) and avoids duplicating code from qemu.

Cornelia Huck (5):
  KVM: s390: Support for I/O interrupts.
  KVM: s390: Add support for machine checks.
  KVM: s390: In-kernel handling of I/O instructions.
  KVM: s390: Base infrastructure for enabling capabilities.
  KVM: s390: Add support for channel I/O instructions.

 Documentation/virtual/kvm/api.txt |  40 +-
 arch/s390/include/asm/kvm_host.h  |  11 ++
 arch/s390/kvm/intercept.c |  22 ++-
 arch/s390/kvm/interrupt.c | 264 +++-
 arch/s390/kvm/kvm-s390.c  |  38 ++
 arch/s390/kvm/kvm-s390.h  |   6 +
 arch/s390/kvm/priv.c  | 275 +++---
 arch/s390/kvm/trace-s390.h|  26 +++-
 include/linux/kvm.h   |  18 +++
 include/trace/events/kvm.h|   2 +-
 10 files changed, 673 insertions(+), 29 deletions(-)

-- 
1.7.12.4



[PATCH 1/5] KVM: s390: Support for I/O interrupts.

2012-10-31 Thread Cornelia Huck
Add support for handling I/O interrupts (standard, subchannel-related
ones and rudimentary adapter interrupts).

The subchannel-identifying parameters are encoded into the interrupt
type.

I/O interrupts are floating, so they can't be injected on a specific
vcpu.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt |   4 ++
 arch/s390/include/asm/kvm_host.h  |   2 +
 arch/s390/kvm/interrupt.c | 115 --
 include/linux/kvm.h   |   6 ++
 4 files changed, 122 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 4258180..b660a6e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2068,6 +2068,10 @@ KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; 
external interrupt
 KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
 KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
 KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
+KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
+I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
+I/O interruption parameters in parm (subchannel) and parm64 (intparm,
+interruption subclass)
 
 Note that the vcpu ioctl is asynchronous to vcpu execution.
 
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b784154..e47f697 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -76,6 +76,7 @@ struct kvm_s390_sie_block {
__u64   epoch;  /* 0x0038 */
__u8reserved40[4];  /* 0x0040 */
 #define LCTL_CR0   0x8000
+#define LCTL_CR6   0x0200
__u16   lctl;   /* 0x0044 */
__s16   icpua;  /* 0x0046 */
__u32   ictl;   /* 0x0048 */
@@ -127,6 +128,7 @@ struct kvm_vcpu_stat {
u32 deliver_prefix_signal;
u32 deliver_restart_signal;
u32 deliver_program_int;
+   u32 deliver_io_int;
u32 exit_wait_state;
u32 instruction_stidp;
u32 instruction_spx;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index c30615e..070ba22 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -21,11 +21,26 @@
 #include "gaccess.h"
 #include "trace-s390.h"
 
+#define IOINT_SCHID_MASK 0x
+#define IOINT_SSID_MASK 0x0003
+#define IOINT_CSSID_MASK 0x03fc
+#define IOINT_AI_MASK 0x0400
+
+static int is_ioint(u64 type)
+{
+	return ((type & 0xfffeu) != 0xfffeu);
+}
+
 static int psw_extint_disabled(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_EXT);
 }
 
+static int psw_ioint_disabled(struct kvm_vcpu *vcpu)
+{
+	return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_IO);
+}
+
 static int psw_interrupts_disabled(struct kvm_vcpu *vcpu)
 {
 	if ((vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PER) ||
@@ -68,7 +83,18 @@ static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu,
 	case KVM_S390_RESTART:
 		return 1;
 	default:
-		BUG();
+		if (is_ioint(inti->type)) {
+			if (psw_ioint_disabled(vcpu))
+				return 0;
+			if (vcpu->arch.sie_block->gcr[6] &
+			    inti->io.io_int_word)
+				return 1;
+			return 0;
+		} else {
+			printk(KERN_WARNING "illegal interrupt type %llx\n",
+			       inti->type);
+			BUG();
+		}
 	}
 	return 0;
 }
@@ -117,6 +143,13 @@ static void __set_intercept_indicator(struct kvm_vcpu *vcpu,
 		__set_cpuflag(vcpu, CPUSTAT_STOP_INT);
 		break;
 	default:
+		if (is_ioint(inti->type)) {
+			if (psw_ioint_disabled(vcpu))
+				__set_cpuflag(vcpu, CPUSTAT_IO_INT);
+			else
+				vcpu->arch.sie_block->lctl |= LCTL_CR6;
+			break;
+		}
 		BUG();
 	}
 }
@@ -298,7 +331,49 @@ static void __do_deliver_interrupt(struct kvm_vcpu *vcpu,
 		break;
 
 	default:
-		BUG();
+		if (is_ioint(inti->type)) {
+			__u32 param0 = ((__u32)inti->io.subchannel_id << 16) |
+				inti->io.subchannel_nr;
+			__u64 param1 = ((__u64)inti->io.io_int_parm << 32) |
+				inti->io.io_int_word;
+			VCPU_EVENT(vcpu, 4,
+				   "interrupt: I/O %llx", inti->type);
+			vcpu->stat.deliver_io_int++;
+   
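The delivery path in the (truncated) hunk above packs the four I/O interrupt fields into two words, and the deliverability check masks the interruption word against guest control register 6. A standalone sketch of both steps is given below. The struct and function names are hypothetical stand-ins rather than kernel definitions; the field widths are taken from the `s390_tsch` layout documented in patch 5/5 of this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the io member of the interrupt info. */
struct io_int_info {
	uint16_t subchannel_id;
	uint16_t subchannel_nr;
	uint32_t io_int_parm;
	uint32_t io_int_word;
};

/* Subchannel id in the upper halfword, subchannel number in the lower. */
static uint32_t pack_param0(const struct io_int_info *io)
{
	return ((uint32_t)io->subchannel_id << 16) | io->subchannel_nr;
}

/* Interruption parameter in the upper word, interruption word in the lower. */
static uint64_t pack_param1(const struct io_int_info *io)
{
	return ((uint64_t)io->io_int_parm << 32) | io->io_int_word;
}

/*
 * Deliverable only if I/O interrupts are enabled in the PSW and the
 * interruption subclass bit in io_int_word is also set in CR6.
 */
static bool io_int_deliverable(bool psw_io_enabled, uint64_t cr6,
			       const struct io_int_info *io)
{
	if (!psw_io_enabled)
		return false;
	return (cr6 & io->io_int_word) != 0;
}
```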

[PATCH 3/5] KVM: s390: In-kernel handling of I/O instructions.

2012-10-31 Thread Cornelia Huck
Explicitly catch all channel I/O related instruction intercepts
in the kernel and set condition code 3 for them.

This paves the way for properly handling these instructions later
on.

Note: This is not architecture compliant (the previous code wasn't
either) since setting cc 3 is not the correct thing to do for some
of these instructions. For Linux guests, however, it still has the
intended effect of stopping css probing.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/kvm/intercept.c | 19 +---
 arch/s390/kvm/kvm-s390.h  |  1 +
 arch/s390/kvm/priv.c  | 56 +--
 3 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index ec1177f..754dc9e 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -33,8 +33,6 @@ static int handle_lctlg(struct kvm_vcpu *vcpu)
 	int reg, rc;
 
 	vcpu->stat.instruction_lctlg++;
-	if ((vcpu->arch.sie_block->ipb & 0xff) != 0x2f)
-		return -EOPNOTSUPP;
 
 	useraddr = disp2;
 	if (base2)
@@ -104,6 +102,21 @@ static int handle_lctl(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static intercept_handler_t eb_handlers[256] = {
+   [0x2f] = handle_lctlg,
+   [0x8a] = kvm_s390_handle_priv_eb,
+};
+
+static int handle_eb(struct kvm_vcpu *vcpu)
+{
+   intercept_handler_t handler;
+
+	handler = eb_handlers[vcpu->arch.sie_block->ipb & 0xff];
+   if (handler)
+   return handler(vcpu);
+   return -EOPNOTSUPP;
+}
+
 static intercept_handler_t instruction_handlers[256] = {
[0x01] = kvm_s390_handle_01,
[0x82] = kvm_s390_handle_lpsw,
@@ -113,7 +126,7 @@ static intercept_handler_t instruction_handlers[256] = {
[0xb7] = handle_lctl,
[0xb9] = kvm_s390_handle_b9,
[0xe5] = kvm_s390_handle_e5,
-   [0xeb] = handle_lctlg,
+   [0xeb] = handle_eb,
 };
 
 static int handle_noop(struct kvm_vcpu *vcpu)
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index b1e1cb6..7f50229 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -83,6 +83,7 @@ int kvm_s390_handle_e5(struct kvm_vcpu *vcpu);
 int kvm_s390_handle_01(struct kvm_vcpu *vcpu);
 int kvm_s390_handle_b9(struct kvm_vcpu *vcpu);
 int kvm_s390_handle_lpsw(struct kvm_vcpu *vcpu);
+int kvm_s390_handle_priv_eb(struct kvm_vcpu *vcpu);
 
 /* implemented in sigp.c */
 int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index bf13ce9..2aba96b 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -135,20 +135,9 @@ static int handle_skey(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static int handle_stsch(struct kvm_vcpu *vcpu)
+static int handle_io_inst(struct kvm_vcpu *vcpu)
 {
-	vcpu->stat.instruction_stsch++;
-	VCPU_EVENT(vcpu, 4, "%s", "store subchannel - CC3");
-	/* condition code 3 */
-	vcpu->arch.sie_block->gpsw.mask &= ~(3ul << 44);
-	vcpu->arch.sie_block->gpsw.mask |= (3 & 3ul) << 44;
-	return 0;
-}
-
-static int handle_chsc(struct kvm_vcpu *vcpu)
-{
-	vcpu->stat.instruction_chsc++;
-	VCPU_EVENT(vcpu, 4, "%s", "channel subsystem call - CC3");
+	VCPU_EVENT(vcpu, 4, "%s", "I/O instruction");
 	/* condition code 3 */
 	vcpu->arch.sie_block->gpsw.mask &= ~(3ul << 44);
 	vcpu->arch.sie_block->gpsw.mask |= (3 & 3ul) << 44;
@@ -392,7 +381,7 @@ out_fail:
return 0;
 }
 
-static intercept_handler_t priv_handlers[256] = {
+static intercept_handler_t b2_handlers[256] = {
[0x02] = handle_stidp,
[0x10] = handle_set_prefix,
[0x11] = handle_store_prefix,
@@ -400,8 +389,22 @@ static intercept_handler_t priv_handlers[256] = {
[0x29] = handle_skey,
[0x2a] = handle_skey,
[0x2b] = handle_skey,
-   [0x34] = handle_stsch,
-   [0x5f] = handle_chsc,
+   [0x30] = handle_io_inst,
+   [0x31] = handle_io_inst,
+   [0x32] = handle_io_inst,
+   [0x33] = handle_io_inst,
+   [0x34] = handle_io_inst,
+   [0x35] = handle_io_inst,
+   [0x36] = handle_io_inst,
+   [0x37] = handle_io_inst,
+   [0x38] = handle_io_inst,
+   [0x39] = handle_io_inst,
+   [0x3a] = handle_io_inst,
+   [0x3b] = handle_io_inst,
+   [0x3c] = handle_io_inst,
+   [0x5f] = handle_io_inst,
+   [0x74] = handle_io_inst,
+   [0x76] = handle_io_inst,
[0x7d] = handle_stsi,
[0xb1] = handle_stfl,
[0xb2] = handle_lpswe,
@@ -418,7 +421,7 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
 * state bit and (a) handle the instruction or (b) send a code 2
 * program check.
 * Anything else goes to userspace.*/
-	handler = priv_handlers[vcpu->arch.sie_block->ipa & 0x00ff];
+	handler = b2_handlers[vcpu->arch.sie_block->ipa & 0x00ff];
 	if (handler) {
 		if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
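The pattern used in patch 3/5 above is a 256-entry handler table indexed by one opcode byte, with a shared handler that sets condition code 3 in the guest PSW. The sketch below reproduces that pattern outside the kernel. `struct vcpu` and its two fields are hypothetical stand-ins for the SIE block, and mapping 0x8a directly to `handle_io_inst` is a simplification: the patch routes 0x8a through `kvm_s390_handle_priv_eb`, whose body is not shown in this excerpt.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of the SIE block used here. */
struct vcpu {
	uint64_t psw_mask;
	uint32_t ipb;
};

typedef int (*intercept_handler_t)(struct vcpu *vcpu);

/* The condition code occupies two bits of the PSW mask, shifted left by 44. */
static void set_cc(struct vcpu *vcpu, unsigned int cc)
{
	vcpu->psw_mask &= ~(3ULL << 44);
	vcpu->psw_mask |= (uint64_t)(cc & 3) << 44;
}

static int handle_io_inst(struct vcpu *vcpu)
{
	set_cc(vcpu, 3);
	return 0;
}

/* Unlisted entries are NULL thanks to designated initializers. */
static intercept_handler_t eb_handlers[256] = {
	[0x8a] = handle_io_inst,
};

static int handle_eb(struct vcpu *vcpu)
{
	intercept_handler_t handler = eb_handlers[vcpu->ipb & 0xff];

	if (handler)
		return handler(vcpu);
	return -EOPNOTSUPP;
}
```

Returning -EOPNOTSUPP for an empty slot keeps the fallback behaviour explicit, so adding support for another opcode only means filling in one more table entry.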

[PATCH 5/5] KVM: s390: Add support for channel I/O instructions.

2012-10-31 Thread Cornelia Huck
Add a new capability, KVM_CAP_S390_CSS_SUPPORT, which will pass
intercepts for channel I/O instructions to userspace. Only I/O
instructions interacting with I/O interrupts need to be handled
in-kernel:

- TEST PENDING INTERRUPTION (tpi) dequeues and stores pending
  interrupts entirely in-kernel.
- TEST SUBCHANNEL (tsch) dequeues pending interrupts in-kernel
  and exits via KVM_EXIT_S390_TSCH to userspace for subchannel-
  related processing.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt | 30 
 arch/s390/include/asm/kvm_host.h  |  1 +
 arch/s390/kvm/intercept.c |  1 +
 arch/s390/kvm/interrupt.c | 37 +++
 arch/s390/kvm/kvm-s390.c  | 12 +
 arch/s390/kvm/kvm-s390.h  |  2 +
 arch/s390/kvm/priv.c  | 96 +--
 arch/s390/kvm/trace-s390.h| 20 
 include/linux/kvm.h   | 11 +
 include/trace/events/kvm.h|  2 +-
 10 files changed, 207 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index a0408cf..2233081 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2294,6 +2294,22 @@ The possible hypercalls are defined in the Power 
Architecture Platform
 Requirements (PAPR) document available from www.power.org (free
 developer registration required to access it).
 
+   /* KVM_EXIT_S390_TSCH */
+   struct {
+   __u16 subchannel_id;
+   __u16 subchannel_nr;
+   __u32 io_int_parm;
+   __u32 io_int_word;
+   __u32 ipb;
+   __u8 dequeued;
+   } s390_tsch;
+
+s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
+and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
+interrupt for the target subchannel has been dequeued and subchannel_id,
+subchannel_nr, io_int_parm and io_int_word contain the parameters for that
+interrupt. ipb is needed for instruction parameter decoding.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -2415,3 +2431,17 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and 
KVM_MMU_FSL_BOOKE_HV:
where num_sets is the tlb_sizes[] value divided by the tlb_ways[] value.
  - The tsize field of mas1 shall be set to 4K on TLB0, even though the
hardware ignores this value for TLB0.
+
+6.4 KVM_CAP_S390_CSS_SUPPORT
+
+Architectures: s390
+Parameters: none
+Returns: 0 on success; -1 on error
+
+This capability enables support for handling of channel I/O instructions.
+
+TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
+handled in-kernel, while the other I/O instructions are passed to userspace.
+
+When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
+SUBCHANNEL intercepts.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773859e..091c581 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -264,6 +264,7 @@ struct kvm_arch{
debug_info_t *dbf;
struct kvm_s390_float_interrupt float_int;
struct gmap *gmap;
+   int css_support;
 };
 
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 754dc9e..9ab2efd 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -273,6 +273,7 @@ static const intercept_handler_t intercept_funcs[] = {
 	[0x0C >> 2] = handle_instruction_and_prog,
 	[0x10 >> 2] = handle_noop,
 	[0x14 >> 2] = handle_noop,
+	[0x18 >> 2] = handle_noop,
 	[0x1C >> 2] = kvm_s390_handle_wait,
 	[0x20 >> 2] = handle_validity,
 	[0x28 >> 2] = handle_stop,
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 6b10267..495a411 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -714,6 +714,43 @@ int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 
code)
return 0;
 }
 
+struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
+   u64 cr6, u64 schid)
+{
+   struct kvm_s390_float_interrupt *fi;
+   struct kvm_s390_interrupt_info *inti, *iter;
+
+	if ((!schid && !cr6) || (schid && cr6))
+		return NULL;
+	mutex_lock(&kvm->lock);
+	fi = &kvm->arch.float_int;
+	spin_lock(&fi->lock);
+	inti = NULL;
+	list_for_each_entry(iter, &fi->list, list) {
+		if (!is_ioint(iter->type))
+			continue;
+		if (cr6 && ((cr6 & iter->io.io_int_word) == 0))
+			continue;
+		if (schid) {
+			if (((schid & 0x) >> 16) !=
+			    iter->io.subchannel_id)
+
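On the userspace side, the KVM_EXIT_S390_TSCH exit documented above hands over the dequeued interrupt's parameters only when `dequeued` is set. A sketch of how a consumer might read that structure follows. The struct is a local mirror of the fields listed in the api.txt hunk, not the real `struct kvm_run` member, and `tsch_dequeued_schid` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the s390_tsch exit fields listed in the api.txt hunk. */
struct s390_tsch_exit {
	uint16_t subchannel_id;
	uint16_t subchannel_nr;
	uint32_t io_int_parm;
	uint32_t io_int_word;
	uint32_t ipb;
	uint8_t dequeued;
};

/*
 * Returns true and stores the combined subchannel identifier in *schid
 * if an I/O interrupt was dequeued in-kernel. Returns false, leaving
 * *schid untouched, if the interrupt fields carry no valid data.
 */
static bool tsch_dequeued_schid(const struct s390_tsch_exit *ex,
				uint32_t *schid)
{
	if (!ex->dequeued)
		return false;
	*schid = ((uint32_t)ex->subchannel_id << 16) | ex->subchannel_nr;
	return true;
}
```

Either way, userspace still has to perform the subchannel-related part of TEST SUBCHANNEL itself, using `ipb` to decode the instruction operands.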

[PATCH 4/5] KVM: s390: Base infrastructure for enabling capabilities.

2012-10-31 Thread Cornelia Huck
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/s390/kvm/kvm-s390.c  | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 5d09948..a0408cf 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -913,7 +913,7 @@ documentation when it pops into existence).
 4.37 KVM_ENABLE_CAP
 
 Capability: KVM_CAP_ENABLE_CAP
-Architectures: ppc
+Architectures: ppc, s390
 Type: vcpu ioctl
 Parameters: struct kvm_enable_cap (in)
 Returns: 0 on success; -1 on error
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 38883f0..c19568b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -140,6 +140,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 #endif
case KVM_CAP_SYNC_REGS:
case KVM_CAP_ONE_REG:
+   case KVM_CAP_ENABLE_CAP:
r = 1;
break;
case KVM_CAP_NR_VCPUS:
@@ -807,6 +808,22 @@ int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, 
unsigned long addr)
return 0;
 }
 
+static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
+				     struct kvm_enable_cap *cap)
+{
+	int r;
+
+	if (cap->flags)
+		return -EINVAL;
+
+	switch (cap->cap) {
+	default:
+		r = -EINVAL;
+		break;
+	}
+	return r;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg)
 {
@@ -893,6 +910,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = 0;
break;
}
+	case KVM_ENABLE_CAP:
+	{
+		struct kvm_enable_cap cap;
+		r = -EFAULT;
+		if (copy_from_user(&cap, argp, sizeof(cap)))
+			break;
+		r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap);
+		break;
+	}
default:
r = -ENOTTY;
}
-- 
1.7.12.4
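Patch 4/5 above only adds the skeleton: reject any flags, then switch on the capability number with nothing but a default case. The sketch below shows how such a skeleton might look once one capability is wired in. It is an illustration under assumptions: `EXAMPLE_CAP_CSS_SUPPORT` and its value are made up, `struct enable_cap` is a reduced local mirror rather than the real `struct kvm_enable_cap`, and setting `css_support` here is a guess at how patch 5/5 uses the field it adds to `struct kvm_arch`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability number, for illustration only. */
#define EXAMPLE_CAP_CSS_SUPPORT 1u

/* Reduced local mirror of the enable-cap request. */
struct enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
};

/* Hypothetical stand-in for the per-VM state touched by the capability. */
struct vm_state {
	int css_support;
};

static int enable_cap(struct vm_state *vm, const struct enable_cap *cap)
{
	/* No flags are defined yet, so any set bit is an error. */
	if (cap->flags)
		return -EINVAL;

	switch (cap->cap) {
	case EXAMPLE_CAP_CSS_SUPPORT:
		vm->css_support = 1;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Rejecting unknown flags up front leaves room to give them a meaning later without silently changing behaviour for existing callers.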



[PATCH 1/3] Update linux headers.

2012-10-31 Thread Cornelia Huck
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 linux-headers/asm-generic/kvm_para.h |  5 +++
 linux-headers/asm-powerpc/kvm.h  | 59 ++
 linux-headers/asm-powerpc/kvm_para.h |  7 +++--
 linux-headers/asm-x86/kvm.h  | 17 ++
 linux-headers/linux/kvm.h| 61 ++--
 5 files changed, 137 insertions(+), 12 deletions(-)
 create mode 100644 linux-headers/asm-generic/kvm_para.h

diff --git a/linux-headers/asm-generic/kvm_para.h 
b/linux-headers/asm-generic/kvm_para.h
new file mode 100644
index 000..63df88b
--- /dev/null
+++ b/linux-headers/asm-generic/kvm_para.h
@@ -0,0 +1,5 @@
+#ifndef _ASM_GENERIC_KVM_PARA_H
+#define _ASM_GENERIC_KVM_PARA_H
+
+
+#endif
diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 1bea4d8..b89ae4d 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -221,6 +221,12 @@ struct kvm_sregs {
 
__u32 dbsr; /* KVM_SREGS_E_UPDATE_DBSR */
__u32 dbcr[3];
+   /*
+* iac/dac registers are 64bit wide, while this API
+* interface provides only lower 32 bits on 64 bit
+* processors. ONE_REG interface is added for 64bit
+* iac/dac registers.
+*/
__u32 iac[4];
__u32 dac[2];
__u32 dvc[2];
@@ -326,5 +332,58 @@ struct kvm_book3e_206_tlb_params {
 };
 
 #define KVM_REG_PPC_HIOR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x1)
+#define KVM_REG_PPC_IAC1   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x2)
+#define KVM_REG_PPC_IAC2   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x3)
+#define KVM_REG_PPC_IAC3   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x4)
+#define KVM_REG_PPC_IAC4   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x5)
+#define KVM_REG_PPC_DAC1   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x6)
+#define KVM_REG_PPC_DAC2   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x7)
+#define KVM_REG_PPC_DABR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8)
+#define KVM_REG_PPC_DSCR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x9)
+#define KVM_REG_PPC_PURR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa)
+#define KVM_REG_PPC_SPURR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb)
+#define KVM_REG_PPC_DAR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc)
+#define KVM_REG_PPC_DSISR  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd)
+#define KVM_REG_PPC_AMR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xe)
+#define KVM_REG_PPC_UAMOR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xf)
+
+#define KVM_REG_PPC_MMCR0  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10)
+#define KVM_REG_PPC_MMCR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11)
+#define KVM_REG_PPC_MMCRA  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12)
+
+#define KVM_REG_PPC_PMC1   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)
+#define KVM_REG_PPC_PMC2   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19)
+#define KVM_REG_PPC_PMC3   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1a)
+#define KVM_REG_PPC_PMC4   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1b)
+#define KVM_REG_PPC_PMC5   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1c)
+#define KVM_REG_PPC_PMC6   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1d)
+#define KVM_REG_PPC_PMC7   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1e)
+#define KVM_REG_PPC_PMC8   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1f)
+
+/* 32 floating-point registers */
+#define KVM_REG_PPC_FPR0   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x20)
+#define KVM_REG_PPC_FPR(n) (KVM_REG_PPC_FPR0 + (n))
+#define KVM_REG_PPC_FPR31  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x3f)
+
+/* 32 VMX/Altivec vector registers */
+#define KVM_REG_PPC_VR0(KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x40)
+#define KVM_REG_PPC_VR(n)  (KVM_REG_PPC_VR0 + (n))
+#define KVM_REG_PPC_VR31   (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x5f)
+
+/* 32 double-width FP registers for VSX */
+/* High-order halves overlap with FP regs */
+#define KVM_REG_PPC_VSR0   (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x60)
+#define KVM_REG_PPC_VSR(n) (KVM_REG_PPC_VSR0 + (n))
+#define KVM_REG_PPC_VSR31  (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x7f)
+
+/* FP and vector status/control registers */
+#define KVM_REG_PPC_FPSCR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x80)
+#define KVM_REG_PPC_VSCR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x81)
+
+/* Virtual processor areas */
+/* For SLB & DTL, address in high (first) half, length in low half */
+#define KVM_REG_PPC_VPA_ADDR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x82)
+#define KVM_REG_PPC_VPA_SLB    (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x83)
+#define KVM_REG_PPC_VPA_DTL    (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x84)
 
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/linux-headers/asm-powerpc/kvm_para.h b/linux-headers/asm-powerpc/kvm_para.h
index c047a84..7e64f57 100644
--- a/linux-headers/asm-powerpc/kvm_para.h
+++ 

[PATCH 2/5] KVM: s390: Add support for machine checks.

2012-10-31 Thread Cornelia Huck
Add support for injecting machine checks (only repressible
conditions for now).

This is a bit more involved than I/O interrupts, for these reasons:

- Machine checks come in both floating and cpu varieties.
- We don't have a bit for enabling machine checks, so we have to use
  a roundabout approach: trapping PSW-changing instructions and
  watching for opened machine checks.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt |   4 ++
 arch/s390/include/asm/kvm_host.h  |   8 +++
 arch/s390/kvm/intercept.c |   2 +
 arch/s390/kvm/interrupt.c | 112 
 arch/s390/kvm/kvm-s390.h  |   3 +
 arch/s390/kvm/priv.c  | 133 ++
 arch/s390/kvm/trace-s390.h|   6 +-
 include/linux/kvm.h   |   1 +
 8 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b660a6e..5d09948 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2072,6 +2072,10 @@ KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
 I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
 I/O interruption parameters in parm (subchannel) and parm64 (intparm,
 interruption subclass)
+KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
+   machine check interrupt code in parm64 (note that
+   machine checks needing further payload are not
+   supported by this ioctl)
 
 Note that the vcpu ioctl is asynchronous to vcpu execution.
 
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index e47f697..773859e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -77,8 +77,10 @@ struct kvm_s390_sie_block {
__u8    reserved40[4];  /* 0x0040 */
 #define LCTL_CR0   0x8000
 #define LCTL_CR6   0x0200
+#define LCTL_CR14  0x0002
__u16   lctl;   /* 0x0044 */
__s16   icpua;  /* 0x0046 */
+#define ICTL_LPSW 0x00400000
__u32   ictl;   /* 0x0048 */
__u32   eca;    /* 0x004c */
__u8    icptcode;   /* 0x0050 */
@@ -189,6 +191,11 @@ struct kvm_s390_emerg_info {
__u16 code;
 };
 
+struct kvm_s390_mchk_info {
+   __u64 cr14;
+   __u64 mcic;
+};
+
 struct kvm_s390_interrupt_info {
struct list_head list;
u64 type;
@@ -199,6 +206,7 @@ struct kvm_s390_interrupt_info {
struct kvm_s390_emerg_info emerg;
struct kvm_s390_extcall_info extcall;
struct kvm_s390_prefix_info prefix;
+   struct kvm_s390_mchk_info mchk;
};
 };
 
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 22798ec..ec1177f 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -106,10 +106,12 @@ static int handle_lctl(struct kvm_vcpu *vcpu)
 
 static intercept_handler_t instruction_handlers[256] = {
[0x01] = kvm_s390_handle_01,
+   [0x82] = kvm_s390_handle_lpsw,
[0x83] = kvm_s390_handle_diag,
[0xae] = kvm_s390_handle_sigp,
[0xb2] = kvm_s390_handle_b2,
[0xb7] = handle_lctl,
+   [0xb9] = kvm_s390_handle_b9,
[0xe5] = kvm_s390_handle_e5,
[0xeb] = handle_lctlg,
 };
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 070ba22..6b10267 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -41,6 +41,11 @@ static int psw_ioint_disabled(struct kvm_vcpu *vcpu)
return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_IO);
 }
 
+static int psw_mchk_disabled(struct kvm_vcpu *vcpu)
+{
+   return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_MCHECK);
+}
+
 static int psw_interrupts_disabled(struct kvm_vcpu *vcpu)
 {
if ((vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PER) ||
@@ -82,6 +87,12 @@ static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu,
case KVM_S390_SIGP_SET_PREFIX:
case KVM_S390_RESTART:
return 1;
+   case KVM_S390_MCHK:
+   if (psw_mchk_disabled(vcpu))
+   return 0;
+   if (vcpu->arch.sie_block->gcr[14] & inti->mchk.cr14)
+   return 1;
+   return 0;
default:
if (is_ioint(inti->type)) {
if (psw_ioint_disabled(vcpu))
@@ -119,6 +130,7 @@ static void __reset_intercept_indicators(struct kvm_vcpu *vcpu)
CPUSTAT_IO_INT | CPUSTAT_EXT_INT | CPUSTAT_STOP_INT,
&vcpu->arch.sie_block->cpuflags);
vcpu->arch.sie_block->lctl = 0x0000;
+   vcpu->arch.sie_block->ictl &= ~ICTL_LPSW;
 }
 
 static void __set_cpuflag(struct kvm_vcpu *vcpu, u32 flag)
@@ -142,6 +154,12 @@ static void 
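
The deliverability rule that the interrupt.c hunk adds for KVM_S390_MCHK can be read in isolation as follows. This is only a sketch: struct mchk_info stands in for the patch's struct kvm_s390_mchk_info, mchk_is_deliverable() is a hypothetical name, and the PSW_MASK_MCHECK value is an assumption (PSW bit 13) rather than something quoted in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the PSW machine-check mask bit (PSW bit 13). */
#define PSW_MASK_MCHECK 0x0004000000000000ULL

/* Stand-in for the patch's struct kvm_s390_mchk_info. */
struct mchk_info {
    uint64_t cr14;  /* CR14 subclass bits this machine check needs */
    uint64_t mcic;  /* machine check interruption code */
};

/*
 * Returns 1 if the pending machine check may be delivered now:
 * the guest PSW must have machine checks enabled and the guest's
 * CR14 must have at least one of the required subclass bits set.
 */
static int mchk_is_deliverable(uint64_t psw_mask, uint64_t guest_cr14,
                               const struct mchk_info *mchk)
{
    assert(mchk);
    if (!(psw_mask & PSW_MASK_MCHECK))
        return 0;
    return (guest_cr14 & mchk->cr14) != 0;
}
```

Both conditions can change under the guest's control, which is why the patch has to intercept PSW-changing instructions and control register loads to notice when a pending machine check becomes deliverable.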

[PATCH 3/3] s390: Add new channel I/O based virtio transport.

2012-10-31 Thread Cornelia Huck
Add a new virtio transport that uses channel commands to perform
virtio operations.

Add a new machine type s390-ccw that uses this virtio-ccw transport
and make it the default machine for s390.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390-virtio.c   | 282 +++
 hw/s390x/Makefile.objs |   1 +
 hw/s390x/virtio-ccw.c  | 904 +
 hw/s390x/virtio-ccw.h  |  81 +
 4 files changed, 1195 insertions(+), 73 deletions(-)
 create mode 100644 hw/s390x/virtio-ccw.c
 create mode 100644 hw/s390x/virtio-ccw.h

diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
index 685cb54..0d95a2b 100644
--- a/hw/s390-virtio.c
+++ b/hw/s390-virtio.c
@@ -30,9 +30,13 @@
 #include "hw/sysbus.h"
 #include "kvm.h"
 #include "exec-memory.h"
+#include "qemu-thread.h"
 
 #include "hw/s390-virtio-bus.h"
 #include "hw/s390x/sclp.h"
+#include "hw/s390x/css.h"
+#include "hw/s390x/virtio-ccw.h"
+#include "hw/virtio-serial.h"
 
 //#define DEBUG_S390
 
@@ -47,6 +51,7 @@
 #define KVM_S390_VIRTIO_NOTIFY  0
 #define KVM_S390_VIRTIO_RESET   1
 #define KVM_S390_VIRTIO_SET_STATUS  2
+#define KVM_S390_VIRTIO_CCW_NOTIFY  3
 
 #define KERN_IMAGE_START    0x010000UL
 #define KERN_PARM_AREA  0x010480UL
@@ -63,6 +68,7 @@
 
 static VirtIOS390Bus *s390_bus;
 static S390CPU **ipi_states;
+VirtioCcwBus *ccw_bus;
 
 S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
 {
@@ -76,15 +82,21 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
 int s390_virtio_hypercall(CPUS390XState *env, uint64_t mem, uint64_t hypercall)
 {
 int r = 0, i;
+int cssid, ssid, schid, m;
+SubchDev *sch;
 
 dprintf("KVM hypercall: %ld\n", hypercall);
 switch (hypercall) {
 case KVM_S390_VIRTIO_NOTIFY:
 if (mem > ram_size) {
-VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus,
-   mem, &i);
-if (dev) {
-virtio_queue_notify(dev->vdev, i);
+if (s390_bus) {
+VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus,
+   mem, &i);
+if (dev) {
+virtio_queue_notify(dev->vdev, i);
+} else {
+r = -EINVAL;
+}
 } else {
 r = -EINVAL;
 }
@@ -93,28 +105,49 @@ int s390_virtio_hypercall(CPUS390XState *env, uint64_t mem, uint64_t hypercall)
 }
 break;
 case KVM_S390_VIRTIO_RESET:
-{
-VirtIOS390Device *dev;
-
-dev = s390_virtio_bus_find_mem(s390_bus, mem);
-virtio_reset(dev->vdev);
-stb_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS, 0);
-s390_virtio_device_sync(dev);
-s390_virtio_reset_idx(dev);
+if (s390_bus) {
+VirtIOS390Device *dev;
+
+dev = s390_virtio_bus_find_mem(s390_bus, mem);
+virtio_reset(dev->vdev);
+stb_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS, 0);
+s390_virtio_device_sync(dev);
+s390_virtio_reset_idx(dev);
+} else {
+r = -EINVAL;
+}
 break;
-}
 case KVM_S390_VIRTIO_SET_STATUS:
-{
-VirtIOS390Device *dev;
+if (s390_bus) {
+VirtIOS390Device *dev;
 
-dev = s390_virtio_bus_find_mem(s390_bus, mem);
-if (dev) {
-s390_virtio_device_update_status(dev);
+dev = s390_virtio_bus_find_mem(s390_bus, mem);
+if (dev) {
+s390_virtio_device_update_status(dev);
+} else {
+r = -EINVAL;
+}
 } else {
 r = -EINVAL;
 }
 break;
-}
+case KVM_S390_VIRTIO_CCW_NOTIFY:
+if (ccw_bus) {
+if (ioinst_disassemble_sch_ident(env->regs[2], &m, &cssid, &ssid,
+ &schid)) {
+r = -EINVAL;
+} else {
+sch = css_find_subch(m, cssid, ssid, schid);
+if (sch && css_subch_visible(sch)) {
+virtio_queue_notify(virtio_ccw_get_vdev(sch), env->regs[3]);
+} else {
+r = -EINVAL;
+}
+}
+ } else {
+ r = -EINVAL;
+ }
+ break;
 default:
 r = -EINVAL;
 break;
@@ -151,60 +184,12 @@ unsigned s390_del_running_cpu(CPUS390XState *env)
 return s390_running_cpus;
 }
 
-/* PC hardware initialisation */
-static void s390_init(QEMUMachineInitArgs *args)
+static CPUS390XState *s390_init_cpus(const char *cpu_model,
+ uint8_t *storage_keys)
 {
-ram_addr_t my_ram_size = args->ram_size;
-ram_addr_t ram_size = args->ram_size;
-const char *cpu_model = args->cpu_model;
-const char *kernel_filename = args-kernel_filename;
-const char 
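
The hypercall hunk above gates each hypercall on whether the matching transport exists. A simplified sketch of that gate, assuming nothing beyond the hypercall numbers shown in the patch: struct transports and hypercall_allowed() are hypothetical, and the sketch deliberately ignores the device lookups and the path where KVM_S390_VIRTIO_NOTIFY is called with mem inside guest RAM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypercall numbers as defined in hw/s390-virtio.c by the patch. */
#define KVM_S390_VIRTIO_NOTIFY      0
#define KVM_S390_VIRTIO_RESET       1
#define KVM_S390_VIRTIO_SET_STATUS  2
#define KVM_S390_VIRTIO_CCW_NOTIFY  3

/* Hypothetical model of which virtio transports the machine set up. */
struct transports {
    bool legacy_bus;  /* stands in for s390_bus != NULL */
    bool ccw_bus;     /* stands in for ccw_bus != NULL */
};

/* Returns 0 if the hypercall may proceed, -EINVAL otherwise. */
static int hypercall_allowed(const struct transports *t, uint64_t hypercall)
{
    assert(t);
    switch (hypercall) {
    case KVM_S390_VIRTIO_NOTIFY:
    case KVM_S390_VIRTIO_RESET:
    case KVM_S390_VIRTIO_SET_STATUS:
        /* Old-style virtio hypercalls need the s390 virtio bus. */
        return t->legacy_bus ? 0 : -EINVAL;
    case KVM_S390_VIRTIO_CCW_NOTIFY:
        /* The new notify hypercall needs the virtio-ccw bus. */
        return t->ccw_bus ? 0 : -EINVAL;
    default:
        return -EINVAL;
    }
}
```

On a machine that only has the virtio-ccw bus, the three legacy hypercalls are rejected instead of dereferencing a missing bus.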

[RFC PATCH v3 0/3] s390: channel I/O support in qemu.

2012-10-31 Thread Cornelia Huck
Hi,

here's the latest version of my patchset introducing virtio-ccw.

This has been reworked with the changed kernel interface: qemu
will now handle all channel I/O requests (except the I/O interrupt
related ones that are handled in-kernel in the kvm case). This
avoids duplicating code in qemu and in kvm.

There are some misc fixes as well (mainly related to virtio-ccw).
Use of mutexes has hopefully been exorcised for now.

Unfortunately, patch 2 is now rather large - but I couldn't think
of a good way to split it up.

I still know of various things that need looking into (memory
accesses, for one), but I'd like some feedback about the new
interface first.

Cornelia Huck (3):
  Update linux headers.
  s390: Virtual channel subsystem support.
  s390: Add new channel I/O based virtio transport.

 hw/s390-virtio.c |  282 ++--
 hw/s390x/Makefile.objs   |2 +
 hw/s390x/css.c   | 1209 ++
 hw/s390x/css.h   |   90 +++
 hw/s390x/virtio-ccw.c|  904 +
 hw/s390x/virtio-ccw.h|   81 +++
 linux-headers/asm-generic/kvm_para.h |5 +
 linux-headers/asm-powerpc/kvm.h  |   59 ++
 linux-headers/asm-powerpc/kvm_para.h |7 +-
 linux-headers/asm-x86/kvm.h  |   17 +
 linux-headers/linux/kvm.h|   61 +-
 target-s390x/Makefile.objs   |2 +-
 target-s390x/cpu.h   |  232 +++
 target-s390x/helper.c|  146 
 target-s390x/ioinst.c|  737 +
 target-s390x/ioinst.h|  213 ++
 target-s390x/kvm.c   |  251 ++-
 target-s390x/misc_helper.c   |6 +-
 18 files changed, 4204 insertions(+), 100 deletions(-)
 create mode 100644 hw/s390x/css.c
 create mode 100644 hw/s390x/css.h
 create mode 100644 hw/s390x/virtio-ccw.c
 create mode 100644 hw/s390x/virtio-ccw.h
 create mode 100644 linux-headers/asm-generic/kvm_para.h
 create mode 100644 target-s390x/ioinst.c
 create mode 100644 target-s390x/ioinst.h

-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case

2012-10-31 Thread Raghavendra K T

On 10/31/2012 07:11 PM, Avi Kivity wrote:

On 10/31/2012 03:15 PM, Raghavendra K T wrote:

On 10/31/2012 06:11 PM, Raghavendra K T wrote:

On 10/31/2012 06:08 PM, Avi Kivity wrote:

On 10/29/2012 04:07 PM, Raghavendra K T wrote:

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Also we do not update last boosted vcpu in failure cases.

   #endif
+
   void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   {
   struct kvm *kvm = me->kvm;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   continue;
   if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
   continue;
-if (kvm_vcpu_yield_to(vcpu)) {
+
+yielded = kvm_vcpu_yield_to(vcpu);
+if (yielded > 0)
   kvm->last_boosted_vcpu = i;
-yielded = 1;
+if (yielded)
   break;
-}
   }


If yielded == -ESRCH, should we not try to yield to another vcpu?



   Yes, the plan is to abort the iteration, since it means we are most
likely undercommitted.


Sorry if it was ambiguous. I wanted to say that we do not want to
continue yielding to another vcpu.




Why not?  We found that this particular vcpu is running and therefore
likely not a lock holder.  That says nothing about other vcpus.  The
next in line might be runnable-but-not-running on another runqueue.


Agreed that the next in line might be runnable-but-not-running.  But
here we are optimistic that this is not the case, and we save time by
returning instead of iterating, on the assumption that we are mostly in
the undercommitted case and each vcpu has a dedicated cpu.

Probably an alternative we have here is to look for, say, 2-3 successive
failures before breaking out?
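
The alternative floated above (give up only after a few successive failures) could look roughly like the sketch below. It is a user-space model, not the kernel loop: results[] stands in for the return values of kvm_vcpu_yield_to() on each candidate, yield_with_failure_budget() and max_failures are hypothetical, and treating a 0 return as "keep looking and reset the count" is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

/*
 * results[i] models the outcome of trying to yield to candidate i:
 *   > 0  the yield succeeded
 *   == 0 the yield failed without hinting at undercommit
 *   < 0  (e.g. -ESRCH) the target is already running, which hints
 *        that we may be undercommitted
 * Returns 1 and stores the candidate index in *boosted on success;
 * returns 0 once max_failures successive negative results were seen
 * or the candidates ran out. *boosted is left alone on failure.
 */
static int yield_with_failure_budget(const int *results, int n,
                                     int max_failures, int *boosted)
{
    int failures = 0;

    assert(results && boosted);
    for (int i = 0; i < n; i++) {
        int yielded = results[i];

        if (yielded > 0) {
            *boosted = i;   /* only updated on success */
            return 1;
        }
        if (yielded < 0) {
            if (++failures >= max_failures)
                break;      /* probably undercommitted: stop iterating */
        } else {
            failures = 0;   /* not an undercommit hint: start over */
        }
    }
    return 0;
}
```

With max_failures set to 1 this reduces to the behaviour of the posted patch, which breaks out on the first negative return.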



[Autotest][PATCH 1/4] virt: Adds support for cancel migration in multi-host tests.

2012-10-31 Thread Jiří Župka
1) Start migration with stressed VM.
2) Wait cancel_delay and then cancel migration.
3) Check if machine is alive on source host.
4) Stop stress of VM.
5) Migrate machine again.
6) Check if machine is alive on destination host.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 virttest/utils_test.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virttest/utils_test.py b/virttest/utils_test.py
index 13225d0..eda2099 100644
--- a/virttest/utils_test.py
+++ b/virttest/utils_test.py
@@ -759,7 +759,7 @@ class MultihostMigration(object):
 mig_error = True
 raise
 finally:
-if not mig_error:
+if not mig_error and cancel_delay is None:
 self._hosts_barrier(self.hosts,
 mig_data.mig_id,
 'test_finihed',
-- 
1.7.7.6


