date:20121004

Re: [PATCH 1/3] kvm tools: Fix powerpc build errors caused by recent changes

2012-10-04 Thread Michael Ellerman

On Fri, 2012-10-05 at 09:30 +0300, Pekka Enberg wrote:
> Applied all three patches, thanks Michael!

Thanks Pekka.

cheers



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PULL] virtio

2012-10-04 Thread Rusty Russell

The following changes since commit 925a6f0bf8bd122d5d2429af7f0ca0fecf4ae71f:

  Merge tag 'hwspinlock-3.6-fix' of 
git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock (2012-09-18 
11:58:54 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git virtio-next

for you to fetch changes up to ca16f580a5db7e60bfafe59a50bb133bd3347491:

  lguest: fix occasional crash in example launcher. (2012-10-04 12:12:59 +0930)


New workflow: same git trees pulled by linux-next get sent straight to Linus.
Git is awkward at shuffling patches compared with quilt or mq, but that doesn't
happen often once things get into my -next branch.


Alexey Khoroshilov (1):
  virtio: console: fix error handling in init() function

Asias He (3):
  virtio-blk: Add bio-based IO path for virtio-blk
  virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path
  virtio-blk: Disable callback in virtblk_done()

Brian Foley (2):
  virtio_mmio: fix off by one error allocating queue
  virtio_mmio: Don't attempt to create empty virtqueues

Dan Carpenter (1):
  virtio-blk: fix NULL checking in virtblk_alloc_req()

Jason Wang (2):
  virtio-ring: move queue_index to vring_virtqueue
  virtio: introduce an API to set affinity for a virtqueue

Masami Hiramatsu (5):
  virtio/console: Add splice_write support
  virtio/console: Add a failback for unstealable pipe buffer
  virtio/console: Wait until the port is ready on splice
  ftrace: Allow stealing pages from pipe buffer
  virtio/console: Allocate scatterlist according to the current pipe size

Michael S. Tsirkin (3):
  virtio-balloon: dependency fix
  virtio: support reserved vqs
  virtio: don't crash when device is buggy

Peter Senna Tschudin (1):
  drivers/virtio/virtio_pci.c: fix error return code

Rusty Russell (4):
  virtio_balloon: not EXPERIMENTAL any more.
  virtio: add help to CONFIG_VIRTIO option.
  virtio: remove CONFIG_VIRTIO_RING
  lguest: fix occasional crash in example launcher.

Yoshihiro YUNOMAE (2):
  tools: Add guest trace agent as a user tool
  tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace

 arch/s390/Kconfig   |1 -
 arch/x86/lguest/Kconfig |1 -
 drivers/block/virtio_blk.c  |  306 +++
 drivers/char/virtio_console.c   |  210 --
 drivers/lguest/lguest_device.c  |5 +-
 drivers/remoteproc/remoteproc_virtio.c  |5 +-
 drivers/rpmsg/Kconfig   |1 -
 drivers/s390/kvm/kvm_virtio.c   |5 +-
 drivers/virtio/Kconfig  |   17 +-
 drivers/virtio/Makefile |3 +-
 drivers/virtio/virtio.c |2 +-
 drivers/virtio/virtio_mmio.c|   29 ++-
 drivers/virtio/virtio_pci.c |   68 +-
 drivers/virtio/virtio_ring.c|   14 +-
 include/linux/virtio.h  |2 +
 include/linux/virtio_config.h   |   23 ++
 include/linux/virtio_ring.h |3 +-
 kernel/trace/trace.c|8 +-
 tools/lguest/lguest.c   |1 +
 tools/virtio/virtio-trace/Makefile  |   13 ++
 tools/virtio/virtio-trace/README|  118 +++
 tools/virtio/virtio-trace/trace-agent-ctl.c |  137 
 tools/virtio/virtio-trace/trace-agent-rw.c  |  192 +
 tools/virtio/virtio-trace/trace-agent.c |  270 +++
 tools/virtio/virtio-trace/trace-agent.h |   75 +++
 25 files changed, 1402 insertions(+), 107 deletions(-)
 create mode 100644 tools/virtio/virtio-trace/Makefile
 create mode 100644 tools/virtio/virtio-trace/README
 create mode 100644 tools/virtio/virtio-trace/trace-agent-ctl.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent-rw.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.c
 create mode 100644 tools/virtio/virtio-trace/trace-agent.h
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it

2012-10-04 Thread Alexander Graf

On 05.10.2012, at 04:17, Anthony Liguori wrote:

> Alexander Graf  writes:
> 
>> On 03.10.2012, at 22:26, Peter Maydell wrote:
>> 
>>> On 3 October 2012 21:01, Blue Swirl  wrote:
 On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori  
 wrote:
> Jan Kiszka  writes:
>> +/* The default accelerator depends on the availability of KVM. 
>> */
>> +p = kvm_configured ? "kvm" : "tcg";
>>}
>>> 
> Blue/Aurelien, any objections?

 No, maybe a message could be printed that says that the default has
 changed, for a few releases.
>>> 
>>> I've lost track of the conversation, are we currently proposing
>>> the accelerator default to be "kvm" (as per the original patch
>>> you quote here) or "kvm:tcg" ?
>>> 
>>> I'm not entirely sure which I prefer from an ARM perspective
>>> For some time to come and for a lot of targets (ie any target
>>> CPU except A15), having a default of "kvm" is going to cause
>>> existing working commandlines to stop working. [I expect that
>>> ARM-host qemu binaries will be built with CONFIG_KVM once ARM
>>> KVM support lands, but the same binary will be run on hosts
>>> without virtualization extensions.] On the other hand, perhaps
>>> there just aren't really very many people who run QEMU on
>>> ARM hosts, and so we can ignore them :-)
>> 
>> We get similar problems on PPC. Take the following example:
>> 
>>  $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic
> 
> But do you really expect people to do this?  I have to believe that
> people running on PPC hardware and running qemu-system-ppc most likely
> want to do KVM...

Sure. But we wouldn't be able to even tell them what went wrong, as we don't 
have a negotiation mechanism right now that could tell user space "hey, the CPU 
you selected is unknown to me".

However, if during cpu init we could add such a check and then fall back to tcg 
mode if accel=kvm:tcg with a warning, that'd be nice user experience.

We could do the same for ARM. If you do -M beagle on an A15 KVM enabled 
machine, you would still be able to do so, but KVM tells you it can't emulate 
an A8 right now. And if in the future KVM learns how to expose an A8 on A15, we 
could just not bail out and things would magically work.

Apart from that, I like the idea of kvm:tcg with a warning as the default for 
qemu. We should still have a qemu-kvm binary in the distro that does accel=kvm 
so people don't accidentally fall back to tcg mode.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH v2 06/14] KVM: ARM: Memory virtualization setup

2012-10-04 Thread Min-gyu Kim



> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Christoffer Dall
> Sent: Monday, October 01, 2012 6:11 PM
> To: kvm@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> kvm...@lists.cs.columbia.edu
> Cc: Marc Zyngier
> Subject: [PATCH v2 06/14] KVM: ARM: Memory virtualization setup
> 
> +static void stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache
> *cache,
> +phys_addr_t addr, const pte_t *new_pte) {
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte, old_pte;
> +
> + /* Create 2nd stage page table mapping - Level 1 */
> + pgd = kvm->arch.pgd + pgd_index(addr);
> + pud = pud_offset(pgd, addr);
> + if (pud_none(*pud)) {
> + if (!cache)
> + return; /* ignore calls from kvm_set_spte_hva */
> + pmd = mmu_memory_cache_alloc(cache);
> + pud_populate(NULL, pud, pmd);
> + pmd += pmd_index(addr);
> + get_page(virt_to_page(pud));
> + } else
> + pmd = pmd_offset(pud, addr);
> +
> + /* Create 2nd stage page table mapping - Level 2 */
> + if (pmd_none(*pmd)) {
> + if (!cache)
> + return; /* ignore calls from kvm_set_spte_hva */
> + pte = mmu_memory_cache_alloc(cache);
> + clean_pte_table(pte);
> + pmd_populate_kernel(NULL, pmd, pte);
> + pte += pte_index(addr);
> + get_page(virt_to_page(pmd));
> + } else
> + pte = pte_offset_kernel(pmd, addr);
> +
> + /* Create 2nd stage page table mapping - Level 3 */
> + old_pte = *pte;
> + set_pte_ext(pte, *new_pte, 0);
> + if (pte_present(old_pte))
> + __kvm_tlb_flush_vmid(kvm);
> + else
> + get_page(virt_to_page(pte));
> +}


I'm not sure about the 3-level page table, but isn't it necessary to
clean the page table for 2nd level?
There are two mmu_memory_cache_alloc calls. One has following clean_pte_table
and the other doesn't have. 

And why do you ignore calls from kvm_set_spte_hva? It is supposed to happen when
host moves the page, right? Then you ignore the case because it can be handled
later when fault actually happens? Is there any other reason that I miss?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it

2012-10-04 Thread Anthony Liguori

Alexander Graf  writes:

> On 03.10.2012, at 22:26, Peter Maydell wrote:
>
>> On 3 October 2012 21:01, Blue Swirl  wrote:
>>> On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori  
>>> wrote:
 Jan Kiszka  writes:
> +/* The default accelerator depends on the availability of KVM. */
> +p = kvm_configured ? "kvm" : "tcg";
> }
>> 
 Blue/Aurelien, any objections?
>>> 
>>> No, maybe a message could be printed that says that the default has
>>> changed, for a few releases.
>> 
>> I've lost track of the conversation, are we currently proposing
>> the accelerator default to be "kvm" (as per the original patch
>> you quote here) or "kvm:tcg" ?
>> 
>> I'm not entirely sure which I prefer from an ARM perspective
>> For some time to come and for a lot of targets (ie any target
>> CPU except A15), having a default of "kvm" is going to cause
>> existing working commandlines to stop working. [I expect that
>> ARM-host qemu binaries will be built with CONFIG_KVM once ARM
>> KVM support lands, but the same binary will be run on hosts
>> without virtualization extensions.] On the other hand, perhaps
>> there just aren't really very many people who run QEMU on
>> ARM hosts, and so we can ignore them :-)
>
> We get similar problems on PPC. Take the following example:
>
>   $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic

But do you really expect people to do this?  I have to believe that
people running on PPC hardware and running qemu-system-ppc most likely
want to do KVM...

Kernel development seems unlikely...  The folks that I know doing PPC
kernel development with QEMU still qemu-system-ppc on x86.

Regards,

Anthony Liguori

>
> That command line would work just fine if you run it on a G5 today. However, 
> if we switch to accel=kvm:tcg, it will stop working, because kvm on the G5 
> can not virtualize an e500 CPU.
>
> I don't know any good way around that one either the way the code is layered 
> today. We only know the type of CPU we want to create after we made the 
> decision whether we do KVM or not.
>
>
> Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it

2012-10-04 Thread Alexander Graf


On 03.10.2012, at 22:26, Peter Maydell wrote:

> On 3 October 2012 21:01, Blue Swirl  wrote:
>> On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori  
>> wrote:
>>> Jan Kiszka  writes:
 +/* The default accelerator depends on the availability of KVM. */
 +p = kvm_configured ? "kvm" : "tcg";
 }
> 
>>> Blue/Aurelien, any objections?
>> 
>> No, maybe a message could be printed that says that the default has
>> changed, for a few releases.
> 
> I've lost track of the conversation, are we currently proposing
> the accelerator default to be "kvm" (as per the original patch
> you quote here) or "kvm:tcg" ?
> 
> I'm not entirely sure which I prefer from an ARM perspective
> For some time to come and for a lot of targets (ie any target
> CPU except A15), having a default of "kvm" is going to cause
> existing working commandlines to stop working. [I expect that
> ARM-host qemu binaries will be built with CONFIG_KVM once ARM
> KVM support lands, but the same binary will be run on hosts
> without virtualization extensions.] On the other hand, perhaps
> there just aren't really very many people who run QEMU on
> ARM hosts, and so we can ignore them :-)

We get similar problems on PPC. Take the following example:

  $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic

That command line would work just fine if you run it on a G5 today. However, if 
we switch to accel=kvm:tcg, it will stop working, because kvm on the G5 can not 
virtualize an e500 CPU.

I don't know any good way around that one either the way the code is layered 
today. We only know the type of CPU we want to create after we made the 
decision whether we do KVM or not.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] kvm tools: Do setup_fdt() later, get powerpc to boot again

2012-10-04 Thread Michael Ellerman

In commit e3d3ced "kernel load/firmware cleanup", the call to
kvm__arch_setup_firmware() was moved. Previously more or less at the end
of the init sequence, but that commit moved it into kvm__init() which
is a core_init() call and so runs quite early.

This broke booting powerpc guests, as setup_fdt() needs to be called
later in the setup sequence. In particular it looks at kvm->nrcpus,
which is uninitialised at that point.

In general setup_fdt() needs to run late in the sequence, as it encodes
the setup of the machine into the device tree.

So move setup_fdt() out of kvm__arch_setup_firmware() and make it a
firmware_init() call of its own.

With this patch I am able to boot guests again on HV KVM.

Signed-off-by: Michael Ellerman 
---
 tools/kvm/powerpc/kvm.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index e4f5315..d675265 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -286,7 +286,7 @@ static void generate_segment_page_sizes(struct 
kvm_ppc_smmu_info *info, struct f
  * and whilst most PPC targets will require CPU/memory nodes, others like RTAS
  * should eventually be added separately.
  */
-static void setup_fdt(struct kvm *kvm)
+static int setup_fdt(struct kvm *kvm)
 {
uint64_tmem_reg_property[] = { 0, cpu_to_be64(kvm->ram_size) };
int smp_cpus = kvm->nrcpus;
@@ -488,7 +488,10 @@ static void setup_fdt(struct kvm *kvm)
_FDT(fdt_pack(fdt_dest));
 
free(segment_page_sizes.value);
+
+   return 0;
 }
+firmware_init(setup_fdt);
 
 /**
  * kvm__arch_setup_firmware
@@ -517,9 +520,6 @@ int kvm__arch_setup_firmware(struct kvm *kvm)
 
/* Load SLOF */
 
-   /* Init FDT */
-   setup_fdt(kvm);
-
return 0;
 }
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] kvm tools: Fix segfault on powerpc in xics_register()

2012-10-04 Thread Michael Ellerman

In commit 06e6648 "move kvm_cpus into struct kvm", kvm_cpu__init() became
kvm_cpu__arch_init() called from a new kvm_cpu__init(), and the call was moved
from the end of the init sequence to much earlier, and in particular prior to
irq__init().

This leads to a segfault on powerpc, because kvm_cpu__arch_init() calls into
xics_cpu_register(), which dereferences vcpu->kvm.icp which is uninitialised
until irq__init().

Later in commit a48488d "use init/exit where possible", irq__init() was pulled
out of the init sequence and made a dev_base_init() routine, on x86. On powerpc
the call to irq__init() was dropped entirely.

Finally, we now have a circular dependency between kvm_cpu__init() (which needs
kvm->arch.icp), and irq__init() (which needs kvm->nrcpus). This is caused by
the combination of commit 89f40a7 "move nrcpus into struct kvm_config",
which moved the global nrcpus into kvm->cfg, and commit 06e6648 "move kvm_cpus
into struct kvm", which moved the setup of kvm->nrcpus from kvm->cfg into
kvm_cpu__init().

To fix it we drop irq__init() entirely, if we ever have a non xics irq option
we can bring it back. We turn xics_system_init() into xics_init(), and have it
do the allocation and setup of the icp/ics, including the per-vcpu setup,
removing the dependency from kvm_cpu__init() (via kvm_cpu__arch_init()).

xics_init() is a base_init() routine, it can't be core, which should be early
enough, fingers crossed.

Finally drop irq__exit(), it does nothing and is never called.

Signed-off-by: Michael Ellerman 
---
 tools/kvm/powerpc/irq.c |   19 ---
 tools/kvm/powerpc/kvm-cpu.c |3 ---
 tools/kvm/powerpc/xics.c|   38 +++---
 tools/kvm/powerpc/xics.h|5 -
 4 files changed, 23 insertions(+), 42 deletions(-)

diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index 6d134c5..e89fa3b 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -26,8 +26,6 @@
 #include "xics.h"
 #include "spapr_pci.h"
 
-#define XICS_IRQS   1024
-
 /*
  * FIXME: The code in this file assumes an SPAPR guest, using XICS.  Make
  * generic & cope with multiple PPC platform types.
@@ -51,23 +49,6 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
return 0;
 }
 
-int irq__init(struct kvm *kvm)
-{
-   /*
-* kvm->nr_cpus is now valid; for /now/, pass
-* this to xics_system_init(), which assumes servers
-* are numbered 0..nrcpus.  This may not really be true,
-* but it is OK currently.
-*/
-   kvm->arch.icp = xics_system_init(XICS_IRQS, kvm->nrcpus);
-   return 0;
-}
-
-int irq__exit(struct kvm *kvm)
-{
-   return 0;
-}
-
 int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg)
 {
die(__FUNCTION__);
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 6aaf424..8fce121 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -93,9 +93,6 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned 
long cpu_id)
 */
vcpu->is_running = true;
 
-   /* Register with IRQ controller (FIXME, assumes XICS) */
-   xics_cpu_register(vcpu);
-
return vcpu;
 }
 
diff --git a/tools/kvm/powerpc/xics.c b/tools/kvm/powerpc/xics.c
index 1cf9558..d4b5caa 100644
--- a/tools/kvm/powerpc/xics.c
+++ b/tools/kvm/powerpc/xics.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 
+#define XICS_NUM_IRQS  1024
+
 
 /* #define DEBUG_XICS yes */
 #ifdef DEBUG_XICS
@@ -441,26 +443,19 @@ static void rtas_int_on(struct kvm_cpu *vcpu, uint32_t 
token,
rtas_st(vcpu->kvm, rets, 0, 0); /* Success */
 }
 
-void xics_cpu_register(struct kvm_cpu *vcpu)
-{
-   if (vcpu->cpu_id < vcpu->kvm->arch.icp->nr_servers)
-   vcpu->kvm->arch.icp->ss[vcpu->cpu_id].cpu = vcpu;
-   else
-   die("Setting invalid server for cpuid %ld\n", vcpu->cpu_id);
-}
-
-struct icp_state *xics_system_init(unsigned int nr_irqs, unsigned int nr_cpus)
+static int xics_init(struct kvm *kvm)
 {
int max_server_num;
unsigned int i;
struct icp_state *icp;
struct ics_state *ics;
+   int j;
 
-   max_server_num = nr_cpus;
+   max_server_num = kvm->nrcpus;
 
icp = malloc(sizeof(*icp));
icp->nr_servers = max_server_num + 1;
-   icp->ss = malloc(icp->nr_servers*sizeof(struct icp_server_state));
+   icp->ss = malloc(icp->nr_servers * sizeof(struct icp_server_state));
 
for (i = 0; i < icp->nr_servers; i++) {
icp->ss[i].xirr = 0;
@@ -475,14 +470,14 @@ struct icp_state *xics_system_init(unsigned int nr_irqs, 
unsigned int nr_cpus)
 */
 
ics = malloc(sizeof(*ics));
-   ics->nr_irqs = nr_irqs;
+   ics->nr_irqs = XICS_NUM_IRQS;
ics->offset = XICS_IRQ_OFFSET;
-   ics->irqs = malloc(nr_irqs * sizeof(struct ics_irq_state));
+   ics->irqs = malloc(ics->nr_irqs * sizeof(struct ics_irq_state));
 
icp

[PATCH 1/3] kvm tools: Fix powerpc build errors caused by recent changes

2012-10-04 Thread Michael Ellerman

Several caused by commit 8074303 "remove global kvm object",
ioport__setup_arch(), term_getc_iov() & term_getc() in the
spapr_hvcons.c code, and kvm_cpu__reboot() in rtas_power_off().

Commit 221b584 "move active_console into struct kvm_config" added
checks in h_put_term_char() & h_get_term_char() of
kvm->cfg.active_console but needs to be vcpu->kvm->cfg.active_console.

That commit also missed updates to term_putc() & term_getc() in
spapr_rtas.c, and I'm guessing that we need similar checks of
active_console in rtas_put_term_char() & rtas_get_term_char().

Signed-off-by: Michael Ellerman 
---
 tools/kvm/powerpc/ioport.c   |2 +-
 tools/kvm/powerpc/spapr_hvcons.c |6 +++---
 tools/kvm/powerpc/spapr_rtas.c   |   14 +-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/kvm/powerpc/ioport.c b/tools/kvm/powerpc/ioport.c
index a8e4dc3..264fb7e 100644
--- a/tools/kvm/powerpc/ioport.c
+++ b/tools/kvm/powerpc/ioport.c
@@ -12,7 +12,7 @@
 
 #include 
 
-void ioport__setup_arch(void)
+void ioport__setup_arch(struct kvm *kvm)
 {
/* PPC has no legacy ioports to set up */
 }
diff --git a/tools/kvm/powerpc/spapr_hvcons.c b/tools/kvm/powerpc/spapr_hvcons.c
index 1fe4bdb..0bdf75b 100644
--- a/tools/kvm/powerpc/spapr_hvcons.c
+++ b/tools/kvm/powerpc/spapr_hvcons.c
@@ -50,7 +50,7 @@ static unsigned long h_put_term_char(struct kvm_cpu *vcpu, 
unsigned long opcode,
do {
int ret;
 
-   if (kvm->cfg.active_console == CONSOLE_HV)
+   if (vcpu->kvm->cfg.active_console == CONSOLE_HV)
ret = term_putc_iov(&iov, 1, 0);
else
ret = 0;
@@ -74,14 +74,14 @@ static unsigned long h_get_term_char(struct kvm_cpu *vcpu, 
unsigned long opcode,
union hv_chario data;
struct iovec iov;
 
-   if (kvm->cfg.active_console != CONSOLE_HV)
+   if (vcpu->kvm->cfg.active_console != CONSOLE_HV)
return H_SUCCESS;
 
if (term_readable(0)) {
iov.iov_base = data.buf;
iov.iov_len = 16;
 
-   *len = term_getc_iov(&iov, 1, 0);
+   *len = term_getc_iov(vcpu->kvm, &iov, 1, 0);
*char0_7 = be64_to_cpu(data.a.char0_7);
*char8_15 = be64_to_cpu(data.a.char8_15);
} else {
diff --git a/tools/kvm/powerpc/spapr_rtas.c b/tools/kvm/powerpc/spapr_rtas.c
index 14a3462..c81d82b 100644
--- a/tools/kvm/powerpc/spapr_rtas.c
+++ b/tools/kvm/powerpc/spapr_rtas.c
@@ -41,7 +41,7 @@ static void rtas_display_character(struct kvm_cpu *vcpu,
uint32_t nret, target_ulong rets)
 {
char c = rtas_ld(vcpu->kvm, args, 0);
-   term_putc(CONSOLE_HV, &c, 1, 0);
+   term_putc(&c, 1, 0);
rtas_st(vcpu->kvm, rets, 0, 0);
 }
 
@@ -52,7 +52,10 @@ static void rtas_put_term_char(struct kvm_cpu *vcpu,
   uint32_t nret, target_ulong rets)
 {
char c = rtas_ld(vcpu->kvm, args, 0);
-   term_putc(CONSOLE_HV, &c, 1, 0);
+
+   if (vcpu->kvm->cfg.active_console == CONSOLE_HV)
+   term_putc(&c, 1, 0);
+
rtas_st(vcpu->kvm, rets, 0, 0);
 }
 
@@ -62,8 +65,9 @@ static void rtas_get_term_char(struct kvm_cpu *vcpu,
   uint32_t nret, target_ulong rets)
 {
int c;
-   if (term_readable(CONSOLE_HV, 0) &&
-   (c = term_getc(CONSOLE_HV, 0)) >= 0) {
+
+   if (vcpu->kvm->cfg.active_console == CONSOLE_HV && term_readable(0) &&
+   (c = term_getc(vcpu->kvm, 0)) >= 0) {
rtas_st(vcpu->kvm, rets, 0, 0);
rtas_st(vcpu->kvm, rets, 1, c);
} else {
@@ -115,7 +119,7 @@ static void rtas_power_off(struct kvm_cpu *vcpu,
rtas_st(vcpu->kvm, rets, 0, -3);
return;
}
-   kvm_cpu__reboot();
+   kvm_cpu__reboot(vcpu->kvm);
 }
 
 static void rtas_query_cpu_stopped_state(struct kvm_cpu *vcpu,
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 10/15] i386: kvm: x2apic is not supported without in-kernel irqchip

2012-10-04 Thread Eduardo Habkost

This is necessary so that x2apic is not improperly enabled when the
in-kernel irqchip is disabled.

This won't generate a warning with "-cpu ...,check" because the current
check/enforce code is broken (it checks the host CPU data directly,
instead of using kvm_arch_get_supported_cpuid()), but it will be
eventually fixed to properly report the missing x2apic flag.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 704ad33..590f4d5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -209,6 +209,13 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
 }
+
+/* x2apic is reported by GET_SUPPORTED_CPUID, but it can't be enabled
+ * without the in-kernel irqchip
+ */
+if (!kvm_irqchip_in_kernel()) {
+ret &= ~CPUID_EXT_X2APIC;
+}
 } else if (function == 0x8001 && reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/15] i386: kvm: mask cpuid_ext4_features bits earlier

2012-10-04 Thread Eduardo Habkost

This way all the filtering by GET_SUPPORTED_CPUID is being done at the
same place in the code.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3f4bee5..335b3e7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -431,6 +431,9 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 env->cpuid_kvm_features &=
 kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
 
+env->cpuid_ext4_features &= kvm_arch_get_supported_cpuid(s, 0xC001,
+ 0, R_EDX);
+
 cpuid_i = 0;
 
 /* Paravirtualization CPUIDs */
@@ -572,8 +575,6 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 /* Call Centaur's CPUID instructions they are supported. */
 if (env->cpuid_xlevel2 > 0) {
-env->cpuid_ext4_features &=
-kvm_arch_get_supported_cpuid(s, 0xC001, 0, R_EDX);
 cpu_x86_cpuid(env, 0xC000, 0, &limit, &unused, &unused, &unused);
 
 for (i = 0xC000; i <= limit; i++) {
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 09/15] i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on kvm_arch_get_supported_cpuid()

2012-10-04 Thread Eduardo Habkost

This moves the CPUID_EXT_TSC_DEADLINE_TIMER CPUID flag hacking from
kvm_arch_init_vcpu() to kvm_arch_get_supported_cpuid().

Full git grep for kvm_arch_get_supported_cpuid:

   kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
   target-i386/cpu.c:x86_cpu_def->cpuid_7_0_ebx_features = 
kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EDX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EDX);
   target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
   target-i386/kvm.c:cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 
0, R_EDX);
   target-i386/kvm.c:env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 
1, 0, R_EDX);
 * target-i386/kvm.c:env->cpuid_ext_features &= 
kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
   target-i386/kvm.c:env->cpuid_ext2_features &= 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env->cpuid_ext3_features &= 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env->cpuid_svm_features  &= 
kvm_arch_get_supported_cpuid(s, 0x800A,
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 
KVM_CPUID_FEATURES, 0, R_EAX);
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 0xC001, 0, 
R_EDX);

Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_TSC_DEADLINE_TIMER, so we
can simply make kvm_arch_get_supported_cpuid() set it, to let the rest
of the code know the flag can be safely set by QEMU.

One thing I was worrying about when doing this is that now
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(). But
the 'kvm_kernel_irqchip' global variable is initialized during
kvm_init(), that is called very early, and kvm_init() is already a
requirement to run the GET_SUPPORTED_CPUID ioctl() (as kvm_init() is the
function that initializes the 'kvm_state' global variable).

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1f451e3..704ad33 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -201,6 +201,14 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
  * GET_SUPPORTED_CPUID
  */
 ret |= CPUID_EXT_HYPERVISOR;
+/* tsc-deadline flag is not returned by GET_SUPPORTED_CPUID, but it
+ * can be enabled if the kernel has KVM_CAP_TSC_DEADLINE_TIMER,
+ * and the irqchip is in the kernel.
+ */
+if (kvm_irqchip_in_kernel() &&
+kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
+ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
+}
 } else if (function == 0x8001 && reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
@@ -404,12 +412,7 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 
-j = env->cpuid_ext_features & CPUID_EXT_TSC_DEADLINE_TIMER;
 env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-if (j && kvm_irqchip_in_kernel() &&
-kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
-env->cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
-}
 
 env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x8001,
  0, R_EDX);
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 11/15] i386: kvm: mask cpuid_kvm_features earlier

2012-10-04 Thread Eduardo Habkost

Instead of masking the KVM feature bits very late (while building the
KVM_SET_CPUID2 data), mask it out on env->cpuid_kvm_features, at the
same point where the other feature words are masked out.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 590f4d5..3f4bee5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -428,6 +428,9 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x800A,
  0, R_EDX);
 
+env->cpuid_kvm_features &=
+kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+
 cpuid_i = 0;
 
 /* Paravirtualization CPUIDs */
@@ -448,8 +451,7 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 c = &cpuid_data.entries[cpuid_i++];
 memset(c, 0, sizeof(*c));
 c->function = KVM_CPUID_FEATURES;
-c->eax = env->cpuid_kvm_features &
-kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+c->eax = env->cpuid_kvm_features;
 
 if (hyperv_enabled()) {
 memcpy(signature, "Hv#1\0\0\0\0\0\0\0\0", 12);
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/15] i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()

2012-10-04 Thread Eduardo Habkost

Full grep for kvm_arch_get_supported_cpuid:

   kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
   target-i386/cpu.c:x86_cpu_def->cpuid_7_0_ebx_features = 
kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EDX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EDX);
   target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
   target-i386/kvm.c:cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 
0, R_EDX);
   target-i386/kvm.c:env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 
1, 0, R_EDX);
 * target-i386/kvm.c:env->cpuid_ext_features &= 
kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
   target-i386/kvm.c:env->cpuid_ext2_features &= 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env->cpuid_ext3_features &= 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env->cpuid_svm_features  &= 
kvm_arch_get_supported_cpuid(s, 0x800A,
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 
KVM_CPUID_FEATURES, 0, R_EAX);
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 0xC001, 0, 
R_EDX);

Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_HYPERVISOR, so we can
simply make kvm_arch_get_supported_cpuid() set it, to let the rest of
the code automatically know that the flag can be safely set by QEMU.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 084b40c..1f451e3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -196,6 +196,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 if (function == 1 && reg == R_EDX) {
 /* KVM before 2.6.30 misreports the following features */
 ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+} else if (function == 1 && reg == R_ECX) {
+/* We can set the hypervisor flag, even if KVM does not return it on
+ * GET_SUPPORTED_CPUID
+ */
+ret |= CPUID_EXT_HYPERVISOR;
 } else if (function == 0x8001 && reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
@@ -399,10 +404,8 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 
-i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
 j = env->cpuid_ext_features & CPUID_EXT_TSC_DEADLINE_TIMER;
 env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-env->cpuid_ext_features |= i;
 if (j && kvm_irqchip_in_kernel() &&
 kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 env->cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/15] i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with single 'if'

2012-10-04 Thread Eduardo Habkost

Additional fixups will be added, and making them a single 'if/else if'
chain makes it clearer than two nested switch statements.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7115921..084b40c 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -193,20 +193,15 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 
 /* Fixups for the data returned by KVM, below */
 
-if (reg == R_EDX) {
-switch (function) {
-case 1:
-/* KVM before 2.6.30 misreports the following features */
-ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
-break;
-case 0x8001:
-/* On Intel, kvm returns cpuid according to the Intel spec,
- * so add missing bits according to the AMD spec:
- */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
-break;
-}
+if (function == 1 && reg == R_EDX) {
+/* KVM before 2.6.30 misreports the following features */
+ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+} else if (function == 0x8001 && reg == R_EDX) {
+/* On Intel, kvm returns cpuid according to the Intel spec,
+ * so add missing bits according to the AMD spec:
+ */
+cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
 }
 
 g_free(cpuid);
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/15] i386: kvm: reformat filter_features_for_kvm() code

2012-10-04 Thread Eduardo Habkost

Cosmetic, but it will also help to make futher patches easier to review.

Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index a5c1628..7921b1b 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1366,22 +1366,20 @@ static void filter_features_for_kvm(X86CPU *cpu)
 CPUX86State *env = &cpu->env;
 KVMState *s = kvm_state;
 
-env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-
-env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-
-env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_EDX);
-env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_ECX);
-env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x800A,
- 0, R_EDX);
-
+env->cpuid_features &=
+kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+env->cpuid_ext_features &=
+kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
+env->cpuid_ext2_features &=
+kvm_arch_get_supported_cpuid(s, 0x8001, 0, R_EDX);
+env->cpuid_ext3_features &=
+kvm_arch_get_supported_cpuid(s, 0x8001, 0, R_ECX);
+env->cpuid_svm_features  &=
+kvm_arch_get_supported_cpuid(s, 0x800A, 0, R_EDX);
 env->cpuid_kvm_features &=
-kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
-
-env->cpuid_ext4_features &= kvm_arch_get_supported_cpuid(s, 0xC001,
- 0, R_EDX);
+kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+env->cpuid_ext4_features &=
+kvm_arch_get_supported_cpuid(s, 0xC001, 0, R_EDX);
 
 }
 #endif
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 05/15] i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function

2012-10-04 Thread Eduardo Habkost

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ae51573..3519028 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -145,11 +145,28 @@ static uint32_t cpuid_entry_get_reg(struct 
kvm_cpuid_entry2 *entry, int reg)
 return ret;
 }
 
+/* Find matching entry for function/index on kvm_cpuid2 struct
+ */
+static struct kvm_cpuid_entry2 *cpuid_find_entry(struct kvm_cpuid2 *cpuid,
+ uint32_t function,
+ uint32_t index)
+{
+int i;
+for (i = 0; i < cpuid->nent; ++i) {
+if (cpuid->entries[i].function == function &&
+cpuid->entries[i].index == index) {
+return &cpuid->entries[i];
+}
+}
+/* not found: */
+return NULL;
+}
+
 uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
   uint32_t index, int reg)
 {
 struct kvm_cpuid2 *cpuid;
-int i, max;
+int max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
 bool found = false;
@@ -159,13 +176,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 max *= 2;
 }
 
-for (i = 0; i < cpuid->nent; ++i) {
-if (cpuid->entries[i].function == function &&
-cpuid->entries[i].index == index) {
-struct kvm_cpuid_entry2 *entry = &cpuid->entries[i];
-found = true;
-ret = cpuid_entry_get_reg(entry, reg);
-}
+struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
+if (entry) {
+found = true;
+ret = cpuid_entry_get_reg(entry, reg);
 }
 
 /* Fixups for the data returned by KVM, below */
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 13/15] i386: kvm: filter CPUID feature words earlier, on cpu.c

2012-10-04 Thread Eduardo Habkost

cpu.c contains the code that will check if all requested CPU features
are available, so the filtering of KVM features must be there, so we can
implement "check" and "enforce" properly.

The only point where kvm_arch_init_vcpu() is called on i386 is:

- cpu_x86_init()
  - x86_cpu_realize() (after cpu_x86_register() is called)
- qemu_init_vcpu()
  - qemu_kvm_start_vcpu()
- qemu_kvm_thread_fn() (on a new thread)
  - kvm_init_vcpu()
- kvm_arch_init_vcpu()

With this patch, the filtering will be done earlier, at:
- cpu_x86_init()
  - cpu_x86_register() (before x86_cpu_realize() is called)

Also, the KVM CPUID filtering will now be done at the same place where
the TCG CPUID feature filtering is done. Later, the code can be changed
to use the same filtering code for the "check" and "enforce" modes, as
now the cpu.c code knows exactly which CPU features are going to be
exposed to the guest (and much earlier).

One thing I was worrying about when doing this is that
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(), and
maybe the 'kvm_kernel_irqchip' global variable wasn't initialized yet at
CPU creation time. But kvm_kernel_irqchip is initialized during
kvm_init(), that is called very early (much earlier than the machine
init function), and kvm_init() is already a requirement to run the
GET_SUPPORTED_CPUID ioctl() (as kvm_init() initializes the kvm_state
global variable).

Side note: it would be nice to keep KVM-specific code inside kvm.c. The
problem is that properly implementing -cpu check/enforce code (that's
inside cpu.c) depends directly on the feature bit filtering done using
kvm_arch_get_supported_cpuid(). Currently -cpu check/enforce is broken
because it simply uses the host CPU feature bits instead of
GET_SUPPORTED_CPUID, and we need to fix that.

Signed-off-by: Eduardo Habkost 
---
 kvm.h |  1 +
 target-i386/cpu.c | 30 ++
 target-i386/kvm.c | 18 --
 3 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/kvm.h b/kvm.h
index dea2998..150f95d 100644
--- a/kvm.h
+++ b/kvm.h
@@ -20,6 +20,7 @@
 
 #ifdef CONFIG_KVM
 #include 
+#include 
 #endif
 
 extern int kvm_allowed;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index bb1e44e..a5c1628 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1360,6 +1360,32 @@ CpuDefinitionInfoList *arch_query_cpu_definitions(Error 
**errp)
 return cpu_list;
 }
 
+#ifdef CONFIG_KVM
+static void filter_features_for_kvm(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+KVMState *s = kvm_state;
+
+env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+
+env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
+
+env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x8001,
+ 0, R_EDX);
+env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x8001,
+ 0, R_ECX);
+env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x800A,
+ 0, R_EDX);
+
+env->cpuid_kvm_features &=
+kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+
+env->cpuid_ext4_features &= kvm_arch_get_supported_cpuid(s, 0xC001,
+ 0, R_EDX);
+
+}
+#endif
+
 int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
 {
 CPUX86State *env = &cpu->env;
@@ -1417,6 +1443,10 @@ int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
 );
 env->cpuid_ext3_features &= TCG_EXT3_FEATURES;
 env->cpuid_svm_features &= TCG_SVM_FEATURES;
+} else {
+#ifdef CONFIG_KVM
+filter_features_for_kvm(cpu);
+#endif
 }
 object_property_set_str(OBJECT(cpu), def->model_id, "model-id", &error);
 if (error_is_set(&error)) {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 335b3e7..0257083 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -410,30 +410,12 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 struct kvm_cpuid2 cpuid;
 struct kvm_cpuid_entry2 entries[100];
 } QEMU_PACKED cpuid_data;
-KVMState *s = env->kvm_state;
 uint32_t limit, i, j, cpuid_i;
 uint32_t unused;
 struct kvm_cpuid_entry2 *c;
 uint32_t signature[3];
 int r;
 
-env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-
-env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-
-env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_EDX);
-env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_ECX);
-env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(s, 0x800A,
-

[PATCH 04/15] i386: kvm: extract register switch to cpuid_entry_get_reg() function

2012-10-04 Thread Eduardo Habkost

No behavior change: just code movement.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 22e8564..ae51573 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -123,6 +123,28 @@ static int get_para_features(KVMState *s)
 }
 
 
+/* Returns the value for a specific register on the cpuid entry
+ */
+static uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg)
+{
+uint32_t ret = 0;
+switch (reg) {
+case R_EAX:
+ret = entry->eax;
+break;
+case R_EBX:
+ret = entry->ebx;
+break;
+case R_ECX:
+ret = entry->ecx;
+break;
+case R_EDX:
+ret = entry->edx;
+break;
+}
+return ret;
+}
+
 uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
   uint32_t index, int reg)
 {
@@ -142,20 +164,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 cpuid->entries[i].index == index) {
 struct kvm_cpuid_entry2 *entry = &cpuid->entries[i];
 found = true;
-switch (reg) {
-case R_EAX:
-ret = entry->eax;
-break;
-case R_EBX:
-ret = entry->ebx;
-break;
-case R_ECX:
-ret = entry->ecx;
-break;
-case R_EDX:
-ret = entry->edx;
-break;
-}
+ret = cpuid_entry_get_reg(entry, reg);
 }
 }
 
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/15] i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable

2012-10-04 Thread Eduardo Habkost

The reg switch will be moved to a separate function, so store the entry
pointer in a variable.

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9ebde66..22e8564 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -140,19 +140,20 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 for (i = 0; i < cpuid->nent; ++i) {
 if (cpuid->entries[i].function == function &&
 cpuid->entries[i].index == index) {
+struct kvm_cpuid_entry2 *entry = &cpuid->entries[i];
 found = true;
 switch (reg) {
 case R_EAX:
-ret = cpuid->entries[i].eax;
+ret = entry->eax;
 break;
 case R_EBX:
-ret = cpuid->entries[i].ebx;
+ret = entry->ebx;
 break;
 case R_ECX:
-ret = cpuid->entries[i].ecx;
+ret = entry->ecx;
 break;
 case R_EDX:
-ret = cpuid->entries[i].edx;
+ret = entry->edx;
 break;
 }
 }
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 06/15] i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid() function

2012-10-04 Thread Eduardo Habkost

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3519028..7115921 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -98,6 +98,19 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
 return cpuid;
 }
 
+/* Run KVM_GET_SUPPORTED_CPUID ioctl(), allocating a buffer large enough
+ * for all entries.
+ */
+static struct kvm_cpuid2 *get_supported_cpuid(KVMState *s)
+{
+struct kvm_cpuid2 *cpuid;
+int max = 1;
+while ((cpuid = try_get_cpuid(s, max)) == NULL) {
+max *= 2;
+}
+return cpuid;
+}
+
 struct kvm_para_features {
 int cap;
 int feature;
@@ -166,15 +179,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
   uint32_t index, int reg)
 {
 struct kvm_cpuid2 *cpuid;
-int max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
 bool found = false;
 
-max = 1;
-while ((cpuid = try_get_cpuid(s, max)) == NULL) {
-max *= 2;
-}
+cpuid = get_supported_cpuid(s);
 
 struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
 if (entry) {
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 15/15] i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too

2012-10-04 Thread Eduardo Habkost

Now that CPUID leaf 7 features can be enabled/disabled on the
command-line, we need to filter them properly using GET_SUPPORTED_CPUID,
at the same place where other features are filtered out.

Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 7921b1b..d8cd40a 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1376,6 +1376,8 @@ static void filter_features_for_kvm(X86CPU *cpu)
 kvm_arch_get_supported_cpuid(s, 0x8001, 0, R_ECX);
 env->cpuid_svm_features  &=
 kvm_arch_get_supported_cpuid(s, 0x800A, 0, R_EDX);
+env->cpuid_7_0_ebx_features &=
+kvm_arch_get_supported_cpuid(s, 7, 0, R_EBX);
 env->cpuid_kvm_features &=
 kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
 env->cpuid_ext4_features &=
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 02/15] i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features check

2012-10-04 Thread Eduardo Habkost

Instead of a function-specific has_kvm_features variable, simply use a
"found" variable that will be checked in case we have to use the legacy
get_para_features() interface.

No behavior change, just code cleanup.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8b4ab34..9ebde66 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -130,7 +130,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
 int i, max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
-int has_kvm_features = 0;
+bool found = false;
 
 max = 1;
 while ((cpuid = try_get_cpuid(s, max)) == NULL) {
@@ -140,9 +140,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
 for (i = 0; i < cpuid->nent; ++i) {
 if (cpuid->entries[i].function == function &&
 cpuid->entries[i].index == index) {
-if (cpuid->entries[i].function == KVM_CPUID_FEATURES) {
-has_kvm_features = 1;
-}
+found = true;
 switch (reg) {
 case R_EAX:
 ret = cpuid->entries[i].eax;
@@ -181,7 +179,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
 g_free(cpuid);
 
 /* fallback for older kernels */
-if (!has_kvm_features && (function == KVM_CPUID_FEATURES)) {
+if ((function == KVM_CPUID_FEATURES) && !found) {
 ret = get_para_features(s);
 }
 
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 01/15] i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of for loop

2012-10-04 Thread Eduardo Habkost

The for loop will become a separate function, so clean it up so it can
become independent from the bit hacking for R_EDX.

No behavior change[1], just code movement.

[1] Well, only if the kernel returned CPUID leafs 1 or 0x8001 as
unsupported, but there's no kernel version that does that.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5b18383..8b4ab34 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -155,24 +155,29 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 break;
 case R_EDX:
 ret = cpuid->entries[i].edx;
-switch (function) {
-case 1:
-/* KVM before 2.6.30 misreports the following features */
-ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
-break;
-case 0x8001:
-/* On Intel, kvm returns cpuid according to the Intel spec,
- * so add missing bits according to the AMD spec:
- */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
-break;
-}
 break;
 }
 }
 }
 
+/* Fixups for the data returned by KVM, below */
+
+if (reg == R_EDX) {
+switch (function) {
+case 1:
+/* KVM before 2.6.30 misreports the following features */
+ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+break;
+case 0x8001:
+/* On Intel, kvm returns cpuid according to the Intel spec,
+ * so add missing bits according to the AMD spec:
+ */
+cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+break;
+}
+}
+
 g_free(cpuid);
 
 /* fallback for older kernels */
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 00/15] QEMU KVM_GET_SUPPORTED_CPUID cleanups and fixes

2012-10-04 Thread Eduardo Habkost

Most of this series are just cleanups that will help when making -cpu
check/enforce work properly, with some fixes.

In addition to code movements, the main changes are:
 - x2apic won't be enabled if in-kernel irqchip is disabled
   (patch 10)
 - CPUID feature bit filtering is done much earlier, and inside 
target-i386/cpu.c
   (patch 13)
 - CPUID leaf 7 feature bits are now filterd based on GET_SUPPORTED_CPUID too
   (patch 15)

Eduardo Habkost (15):
  i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of
for loop
  i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features
check
  i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable
  i386: kvm: extract register switch to cpuid_entry_get_reg() function
  i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function
  i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid()
function
  i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with
single 'if'
  i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()
  i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on
kvm_arch_get_supported_cpuid()
  i386: kvm: x2apic is not supported without in-kernel irqchip
  i386: kvm: mask cpuid_kvm_features earlier
  i386: kvm: mask cpuid_ext4_features bits earlier
  i386: kvm: filter CPUID feature words earlier, on cpu.c
  i386: kvm: reformat filter_features_for_kvm() code
  i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too

 kvm.h |   1 +
 target-i386/cpu.c |  30 +++
 target-i386/kvm.c | 153 --
 3 files changed, 122 insertions(+), 62 deletions(-)

-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Proposal for virtio standardization.

2012-10-04 Thread Anthony Liguori

Rusty Russell  writes:

> Hi all,
>
>   I've had several requests for a more formal approach to the
> virtio draft spec, and (after some soul-searching) I'd like to try that.
>
>   The proposal is to use OASIS as the standards body, as it's
> fairly light-weight as these things go.  For me this means paperwork and
> setting up a Working Group and getting the right people involved as
> Voting members starting with the current contributors; for most of you
> it just means a new mailing list, though I'll be cross-posting any
> drafts and major changes here anyway.
>
>   I believe that a documented standard (aka virtio 1.0) will
> increase visibility and adoption in areas outside our normal linux/kvm
> universe.  There's been some of that already, but this is the clearest
> path to accelerate it.  Not the easiest path, but I believe that a solid
> I/O standard is a Good Thing for everyone.
>
>   Yet I also want to decouple new and experimental development
> from the standards effort; running code comes first.  New feature bits
> and new device numbers should be reservable without requiring a full
> spec change.
>
> So the essence of my proposal is:
> 1) I start a Working Group within OASIS where we can aim for virtio spec
>1.0.
>
> 2) The current spec is textually reordered so the core is clearly
>bus-independent, with PCI, mmio, etc appendices.
>
> 3) Various clarifications, formalizations and cleanups to the spec text,
>and possibly elimination of old deprecated features.
>
> 4) The only significant change to the spec is that we use PCI
>capabilities, so we can have infinite feature bits.
>(see 
> http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)

We discussed this on IRC last night.  I don't think PCI capabilites are
a good mechanism to use...

PCI capabilities are there to organize how the PCI config space is
allocated to allow vendor extensions to co-exist with future PCI
extensions.

But we've never used the PCI config space within virtio-pci.  We do
everything in BAR0.  I don't think there's any real advantage of using
the config space vs. a BAR for virtio-pci.

The nice thing about using a BAR is that the region is MMIO or PIO.
This maps really nicely to non-PCI transports too.  But extending the
PCI config space (especially dealing with capability allocation) is
pretty gnarly and there isn't an obvious equivalent outside of PCI.

There are very devices that we emulate today that make use of extended
PCI device registers outside the platform devices (that have no BARs).

Regards,

Anthony Liguori

>
> 5) Changes to the ring layout and other such things are deferred to a
>future virtio version; whether this is done within OASIS or
>externally depends on how well this works for the 1.0 release.
>
> Thoughts?
> Rusty.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Anthony Liguori

Marcelo Tosatti  writes:

> On Thu, Oct 04, 2012 at 04:30:26PM +0200, Jan Kiszka wrote:
>> On 2012-10-04 16:21, Anthony Liguori wrote:
>> > -no-kvm should be included too.
>> 
>> Reminds me that we still need to agree on the final default accel strategy.
>
> Default accel=kvm for x86_64 targets, no fallback.

Ack.

BTW, if you want to override this, you can just do:

[machine]
accel=kvm:tcg

In /etc/qemu/target-i386.conf

So the distros can do whatever they like in their packages.

We should instruct users how to do this in an error message too in case
they want to permanently disable kvm because they don't have appropriate
hardware suport.

Regards,

Anthony Liguori

> Markus's argument is pretty convincing to me:
>
> "Friendly perhaps, generating an infinite series of questions "why is my
> guest slow as molasses?" certainly.
>
> And for each instance of the question, there's an unknown number of
> users who give QEMU a quick try, screw up KVM unknowingly, observe the
> glacial speed, and conclude it's crap."
>
>> > I just ran across a user that was injecting '-no-kvm-irqchip' in their
>> > libvirt XML via a custom attribute.  It turned out it was to work around
>> > broken MSI support in their funky guest they were running.  It was the
>> > wrong solution to the problem but they were doing it regardless.
>> > 
>> > The point is, there are users in the wild using these options.  There's
>> > no reason to remove them if they are trivial to maintain (and they are
>> > in their current form).
>> 
>> So let's define a consistent policy for them all:
>>  - warn on the command line on use
>>  - avoid adding them to the help or other user documentation
>>  - keep them until we rework the whole command line
>> 
>> Jan
>> 
>> -- 
>> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
>> Corporate Competence Center Embedded Linux
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm: remove "boot=on|off" drive parameter compatibility

2012-10-04 Thread Jan Kiszka

On 2012-10-04 19:21, Lucas Meneghel Rodrigues wrote:
> On 10/04/2012 09:27 AM, Jan Kiszka wrote:
>> On 2012-10-04 14:10, Lucas Meneghel Rodrigues wrote:
>>> On 10/04/2012 07:48 AM, Jan Kiszka wrote:
 On 2012-10-03 15:19, Paolo Bonzini wrote:
> Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:
>> Yep, I did send patches with the testdev device present on qemu-kvm.git
>> to qemu.git a while ago, but there were many comments on the review, I
>> ended up not implementing everything that was asked and the patches were
>> archived.
>>
>> If nobody wants to step up to port it, I'll re-read the original thread
>> and will spin up new patches (and try to go through the end with it).
>> Executing the KVM unittests is something that we can't afford to lose,
>> so I'd say it's important on this last mile effort to get rid of 
>> qemu-kvm.
>
> Absolutely, IIRC the problem was that testdev did a little bit of
> everything... let's see what's the functionality of testdev:
>
> - write (port 0xf1), can be replaced in autotest with:
> -device isa-debugcon,iobase=0xf1,chardev=...
>
> - exit code (port 0xf4), see this series:
> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html
>
> - ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
> is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:
>
>   u32 rs = ((inb_cmos(0x34) << 16) | (inb_cmos(0x35) << 24));
>   if (rs)
>   rs += 16 * 1024 * 1024;
>   else
>   rs = (((inb_cmos(0x30) << 10) | (inb_cmos(0x31) << 18))
> + 1 * 1024 * 1024);
>
> The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.

 IIRC, one of the biggest problem with testdev was its hack to inject
 interrupts.
>>>
>>> Jan, I assume this commit helps to fix this, right?
>>>
>>> commit b334ec567f1de9a60349991e7b75083d569ddb0a
>>> Author: Jan Kiszka 
>>> Date:   Fri Mar 2 10:30:47 2012 +0100
>>>
>>>   qemu-kvm: Use upstream kvm-i8259
>>>
>>>   Drop the qemu-kvm version in favor of the equivalent upstream
>>>   implementation. This allows to move the i8259 back into the hwlib.
>>>
>>>   Note that this also drops the testdev hack and restores proper
>>>   isa_get_irq. If testdev scripts exist that inject > IRQ15, they need
>>>   fixing. Testing for these interrupts on the PIIX3 makes no practical
>>>   sense anyway as those lines are unused.
>>>
>>>   Signed-off-by: Jan Kiszka 
>>>   Signed-off-by: Avi Kivity 
>>
>> Yes, this improved it a lot as we no longer depend on additional
>> changes. I'm not sure if there was resistance beyond that.
>>
>> When cleaning up the code: register_ioport_read must be replaced with
>> the memory API.
> 
> I did look at the MemoryRegionOps/memory_region_init_io and still did 
> not figure out how to port things. I'll send a v2 addressing all the 
> comments made so far but this one, just to see if people are OK with the 
> direction of the full patch, then if you could give me some pointers of 
> how to do this conversion, it'd be great.
> 

See e.g. http://thread.gmane.org/gmane.comp.emulators.qemu/171217

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm: remove "boot=on|off" drive parameter compatibility

2012-10-04 Thread Lucas Meneghel Rodrigues


On 10/04/2012 09:27 AM, Jan Kiszka wrote:

On 2012-10-04 14:10, Lucas Meneghel Rodrigues wrote:

On 10/04/2012 07:48 AM, Jan Kiszka wrote:

On 2012-10-03 15:19, Paolo Bonzini wrote:

Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:

Yep, I did send patches with the testdev device present on qemu-kvm.git
to qemu.git a while ago, but there were many comments on the review, I
ended up not implementing everything that was asked and the patches were
archived.

If nobody wants to step up to port it, I'll re-read the original thread
and will spin up new patches (and try to go through the end with it).
Executing the KVM unittests is something that we can't afford to lose,
so I'd say it's important on this last mile effort to get rid of qemu-kvm.


Absolutely, IIRC the problem was that testdev did a little bit of
everything... let's see what's the functionality of testdev:

- write (port 0xf1), can be replaced in autotest with:
-device isa-debugcon,iobase=0xf1,chardev=...

- exit code (port 0xf4), see this series:
http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

- ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:

  u32 rs = ((inb_cmos(0x34) << 16) | (inb_cmos(0x35) << 24));
  if (rs)
  rs += 16 * 1024 * 1024;
  else
  rs = (((inb_cmos(0x30) << 10) | (inb_cmos(0x31) << 18))
+ 1 * 1024 * 1024);

The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.


IIRC, one of the biggest problem with testdev was its hack to inject
interrupts.


Jan, I assume this commit helps to fix this, right?

commit b334ec567f1de9a60349991e7b75083d569ddb0a
Author: Jan Kiszka 
Date:   Fri Mar 2 10:30:47 2012 +0100

  qemu-kvm: Use upstream kvm-i8259

  Drop the qemu-kvm version in favor of the equivalent upstream
  implementation. This allows to move the i8259 back into the hwlib.

  Note that this also drops the testdev hack and restores proper
  isa_get_irq. If testdev scripts exist that inject > IRQ15, they need
  fixing. Testing for these interrupts on the PIIX3 makes no practical
  sense anyway as those lines are unused.

  Signed-off-by: Jan Kiszka 
  Signed-off-by: Avi Kivity 


Yes, this improved it a lot as we no longer depend on additional
changes. I'm not sure if there was resistance beyond that.

When cleaning up the code: register_ioport_read must be replaced with
the memory API.


I did look at the MemoryRegionOps/memory_region_init_io and still did 
not figure out how to port things. I'll send a v2 addressing all the 
comments made so far but this one, just to see if people are OK with the 
direction of the full patch, then if you could give me some pointers of 
how to do this conversion, it'd be great.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Marcelo Tosatti

On Thu, Oct 04, 2012 at 04:30:26PM +0200, Jan Kiszka wrote:
> On 2012-10-04 16:21, Anthony Liguori wrote:
> > -no-kvm should be included too.
> 
> Reminds me that we still need to agree on the final default accel strategy.

Default accel=kvm for x86_64 targets, no fallback.

Markus's argument is pretty convincing to me:

"Friendly perhaps, generating an infinite series of questions "why is my
guest slow as molasses?" certainly.

And for each instance of the question, there's an unknown number of
users who give QEMU a quick try, screw up KVM unknowingly, observe the
glacial speed, and conclude it's crap."

> > I just ran across a user that was injecting '-no-kvm-irqchip' in their
> > libvirt XML via a custom attribute.  It turned out it was to work around
> > broken MSI support in their funky guest they were running.  It was the
> > wrong solution to the problem but they were doing it regardless.
> > 
> > The point is, there are users in the wild using these options.  There's
> > no reason to remove them if they are trivial to maintain (and they are
> > in their current form).
> 
> So let's define a consistent policy for them all:
>  - warn on the command line on use
>  - avoid adding them to the help or other user documentation
>  - keep them until we rework the whole command line
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Marcelo Tosatti

On Thu, Oct 04, 2012 at 05:36:38PM +0200, Andreas Färber wrote:
> Am 04.10.2012 16:30, schrieb Jan Kiszka:
> > On 2012-10-04 16:21, Anthony Liguori wrote:
> >> -no-kvm should be included too.
> > 
> > Reminds me that we still need to agree on the final default accel strategy.
> > 
> >>
> >> I just ran across a user that was injecting '-no-kvm-irqchip' in their
> >> libvirt XML via a custom attribute.  It turned out it was to work around
> >> broken MSI support in their funky guest they were running.  It was the
> >> wrong solution to the problem but they were doing it regardless.
> >>
> >> The point is, there are users in the wild using these options.  There's
> >> no reason to remove them if they are trivial to maintain (and they are
> >> in their current form).
> > 
> > So let's define a consistent policy for them all:
> >  - warn on the command line on use
> 
> >  - avoid adding them to the help or other user documentation
> 
> That's dangerous - at some point someone will notice and propose a patch
> documenting them and the reviewers may have forgotten by then why it was
> not documented in the first place. Better clearly document them in help
> output as "DEPRECATED, to be removed in future versions" or so.

Adding the policy to docs/qemukvm-compat-commands-policy.txt should workaround
that problem.

And a friendly text on the announce e-mail.

> Andreas
> 
> >  - keep them until we rework the whole command line
> > 
> > Jan
> > 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] vfio: Fix PCI INTx disable consistency

2012-10-04 Thread Alex Williamson

The virq_disabled flag tracks the userspace view of INTx masking
across interrupt mode changes, but we're not consistently applying
this to the interrupt and masking handler notion of the device.
Currently if the user sets DisINTx while in MSI or MSIX mode, then
returns to INTx mode (ex. rebooting a qemu guest), the hardware has
DisINTx+, but the management of INTx thinks it's enabled, making it
impossible to actually clear DisINTx.  Fix this by updating the
handler state when INTx is re-enabled.

Cc: sta...@vger.kernel.org
Signed-off-by: Alex Williamson 
---

 drivers/vfio/pci/vfio_pci_intrs.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index c8139a5..3639371 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -366,6 +366,17 @@ static int vfio_intx_enable(struct vfio_pci_device *vdev)
return -ENOMEM;
 
vdev->num_ctx = 1;
+
+   /*
+* If the virtual interrupt is masked, restore it.  Devices
+* supporting DisINTx can be masked at the hardware level
+* here, non-PCI-2.3 devices will have to wait until the
+* interrupt is enabled.
+*/
+   vdev->ctx[0].masked = vdev->virq_disabled;
+   if (vdev->pci_2_3)
+   pci_intx(vdev->pdev, !vdev->ctx[0].masked);
+
vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
return 0;
@@ -419,7 +430,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device 
*vdev, int fd)
 * disable_irq won't.
 */
spin_lock_irqsave(&vdev->irqlock, flags);
-   if (!vdev->pci_2_3 && (vdev->ctx[0].masked || vdev->virq_disabled))
+   if (!vdev->pci_2_3 && vdev->ctx[0].masked)
disable_irq_nosync(pdev->irq);
spin_unlock_irqrestore(&vdev->irqlock, flags);
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] vfio: Move PCI INTx eventfd setting earlier

2012-10-04 Thread Alex Williamson

We need to be ready to recieve an interrupt as soon as we call
request_irq, so our eventfd context setting needs to be moved
earlier.  Without this, an interrupt from our device or one
sharing the interrupt line can pass a NULL into eventfd_signal
and oops.

Cc: sta...@vger.kernel.org
Signed-off-by: Alex Williamson 
---

 drivers/vfio/pci/vfio_pci_intrs.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index d8dedc7..c8139a5 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -400,19 +400,20 @@ static int vfio_intx_set_signal(struct vfio_pci_device 
*vdev, int fd)
return PTR_ERR(trigger);
}
 
+   vdev->ctx[0].trigger = trigger;
+
if (!vdev->pci_2_3)
irqflags = 0;
 
ret = request_irq(pdev->irq, vfio_intx_handler,
  irqflags, vdev->ctx[0].name, vdev);
if (ret) {
+   vdev->ctx[0].trigger = NULL;
kfree(vdev->ctx[0].name);
eventfd_ctx_put(trigger);
return ret;
}
 
-   vdev->ctx[0].trigger = trigger;
-
/*
 * INTx disable will stick across the new irq setup,
 * disable_irq won't.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] VFIO PCI INTx fixes

2012-10-04 Thread Alex Williamson

These patches are now available in my next tree to fix a couple
issues with PCI INTx.  Thanks,

Alex

---

Alex Williamson (2):
  vfio: Fix PCI INTx disable consistency
  vfio: Move PCI INTx eventfd setting earlier


 drivers/vfio/pci/vfio_pci_intrs.c |   18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-04 18:13, Andreas Färber wrote:
> Am 04.10.2012 18:04, schrieb Jan Kiszka:
>> On 2012-10-04 17:36, Andreas Färber wrote:
>>> Am 04.10.2012 16:30, schrieb Jan Kiszka:
 On 2012-10-04 16:21, Anthony Liguori wrote:
  - avoid adding them to the help or other user documentation
>>>
>>> That's dangerous - at some point someone will notice and propose a patch
>>> documenting them and the reviewers may have forgotten by then why it was
>>> not documented in the first place. Better clearly document them in help
>>> output as "DEPRECATED, to be removed in future versions" or so.
>>
>> -M is marked as deprecated in the file that you would have to mess up.
> 
> If it's not within +/- 3 lines and goes through qemu-trivial then that
> point is moot. Don't see how deprecation of -no-kvm-irqchip et al. is
> related to -M being deprecated either way.

I'm only citing an example that these switches should follow:

[qemu-options.hx]
HXCOMM Deprecated by -machine
DEF("M", HAS_ARG, QEMU_OPTION_M, "", QEMU_ARCH_ALL)

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Alex Williamson

On Thu, 2012-10-04 at 18:14 +0200, Roedel, Joerg wrote:
> On Thu, Oct 04, 2012 at 09:26:39AM -0600, Alex Williamson wrote:
> > On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
> > > Hello Alex,
> > > 
> > > I just tested vfio as part of linux 3.6 and detected, that it doesn't
> > > work because of the following missing patch:
> > > 
> > > http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pci&article=16422
> > > 
> > > Could you please get it applied for the next stable release?
> > 
> > Hi Andreas,
> > 
> > This patch needs to go through Bjorn's PCI tree, but as noted in the
> > comments I have some outstanding questions that need to be answered,
> > probably by Joerg.  Thanks,
> 
> What are the open questions? I have confirmation from the hardware
> people that the south bridges are peer-2-peer safe. Other questions?

Hi Joerg,

There are a couple questions in the link above.  Since the devices don't
expose a PCIe capability, we probably need to add a check to look at the
upstream device and verify we're not on a legacy bus where ACS can't be
enforced.  Then there's the general question of whether the confirmation
of no peer-to-peer applies to every case where we might see this device
(some of them seem to have history that pre-dates this specific package
implementation) or do we need to try to identify specific package
properties in addition to just a device ID?  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Roedel, Joerg

On Thu, Oct 04, 2012 at 09:26:39AM -0600, Alex Williamson wrote:
> On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
> > Hello Alex,
> > 
> > I just tested vfio as part of linux 3.6 and detected, that it doesn't
> > work because of the following missing patch:
> > 
> > http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pci&article=16422
> > 
> > Could you please get it applied for the next stable release?
> 
> Hi Andreas,
> 
> This patch needs to go through Bjorn's PCI tree, but as noted in the
> comments I have some outstanding questions that need to be answered,
> probably by Joerg.  Thanks,

What are the open questions? I have confirmation from the hardware
people that the south bridges are peer-2-peer safe. Other questions?


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Andreas Färber

Am 04.10.2012 18:04, schrieb Jan Kiszka:
> On 2012-10-04 17:36, Andreas Färber wrote:
>> Am 04.10.2012 16:30, schrieb Jan Kiszka:
>>> On 2012-10-04 16:21, Anthony Liguori wrote:
>>>  - avoid adding them to the help or other user documentation
>>
>> That's dangerous - at some point someone will notice and propose a patch
>> documenting them and the reviewers may have forgotten by then why it was
>> not documented in the first place. Better clearly document them in help
>> output as "DEPRECATED, to be removed in future versions" or so.
> 
> -M is marked as deprecated in the file that you would have to mess up.

If it's not within +/- 3 lines and goes through qemu-trivial then that
point is moot. Don't see how deprecation of -no-kvm-irqchip et al. is
related to -M being deprecated either way.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH] KVM: x86: Skip request checking branches in vcpu_enter_guest() more effectively

2012-10-04 Thread Takuya Yoshikawa

On Mon, 24 Sep 2012 09:16:12 +0200
Gleb Natapov  wrote:

> Yes, for guests that do not enable steal time KVM_REQ_STEAL_UPDATE
> should be never set, but currently it is. The patch (not tested) should
> fix this.

Thinking a bit more about KVM_REQ_STEAL_UPDATE...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 901ad00..01572f5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1544,6 +1544,8 @@ static void accumulate_steal_time(struct kvm_vcpu *vcpu)
>   delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
>   vcpu->arch.st.last_steal = current->sched_info.run_delay;
>   vcpu->arch.st.accum_steal = delta;
> +
> + kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
>  }

Even without this flag, we know when we should record the steal time:
that's when vcpu->arch.st.accum_steal != 0.

>  
>  static void record_steal_time(struct kvm_vcpu *vcpu)

So we can live without KVM_REQ_STEAL_UPDATE and make record_steal_time()
do the job when vcpu->arch.st.accum_steal != 0 so that it can be called
unconditionally in vcpu_enter_guest().

Why didn't we do that?  The only reason I can think of is we wanted to
eliminate that check for the zero-requests case.

But does such a micro-optimization really help us?

Thanks,
Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-04 17:36, Andreas Färber wrote:
> Am 04.10.2012 16:30, schrieb Jan Kiszka:
>> On 2012-10-04 16:21, Anthony Liguori wrote:
>>> -no-kvm should be included too.
>>
>> Reminds me that we still need to agree on the final default accel strategy.
>>
>>>
>>> I just ran across a user that was injecting '-no-kvm-irqchip' in their
>>> libvirt XML via a custom attribute.  It turned out it was to work around
>>> broken MSI support in their funky guest they were running.  It was the
>>> wrong solution to the problem but they were doing it regardless.
>>>
>>> The point is, there are users in the wild using these options.  There's
>>> no reason to remove them if they are trivial to maintain (and they are
>>> in their current form).
>>
>> So let's define a consistent policy for them all:
>>  - warn on the command line on use
> 
>>  - avoid adding them to the help or other user documentation
> 
> That's dangerous - at some point someone will notice and propose a patch
> documenting them and the reviewers may have forgotten by then why it was
> not documented in the first place. Better clearly document them in help
> output as "DEPRECATED, to be removed in future versions" or so.

-M is marked as deprecated in the file that you would have to mess up.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Andreas Hartmann

Alex Williamson wrote:
> On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
>> Hello Alex,
>> 
>> I just tested vfio as part of linux 3.6 and detected, that it
>> doesn't work because of the following missing patch:
>> 
>> http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pci&article=16422
>>
>>
>> 
Could you please get it applied for the next stable release?
> 
> Hi Andreas,
> 
> This patch needs to go through Bjorn's PCI tree, but as noted in
> the comments I have some outstanding questions that need to be
> answered, probably by Joerg.  Thanks,

Sorry Alex,
I missed your additional questions because this patch just works
pretty fine here with kernel 3.4.x.

But you're right, your questions should be resolved before freeing it
to the wild.

@Joerg:
Please, liberate me from compiling and patching each kernel myself to
get it working and give all the other users out there a chance to use
this pretty code and the IOMMU support of AMD's hardware :-) by
answering Alex' questions.

Thanks,
kind regards,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Alexander Graf


On 04.10.2012, at 17:19, Bhushan Bharat-R65777 wrote:

> 
> 
>>> -static int emulation_exit(struct kvm_run *run, struct kvm_vcpu
>>> *vcpu)
>>> +static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>> + int exit_nr)
>>> {
>>> enum emulation_result er;
>>> 
>>> +   if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) &&
>>> +vcpu->arch.last_inst == KVMPPC_INST_GUEST_GDB) {
>> 
>> This belongs into the normal emulation code path, behind the same
>> switch() that everything else goes through.
> 
> I am not sure I understood correctly. Below is the reason why I
> placed this
 code here.
> Instruction where software breakpoint is to be set is replaced by "ehpriv"
 instruction. On e500v2, this is not a valid instruction can causes
 program interrupt. On e500mc, "ehpriv" is a valid instruction. Both
 the exit path calls emulation_exit(), so we have placed the code in this
>> function.
> Do you want this code to be moved in program interrupt exit path for
> e500v2
 and BOOKE_INTERRUPT_HV_PRIV for e500mc?
 
 Ok, in this patch you do (basically):
 
 int emulation_exit()
 {
   if (inst == DEBUG_INST) {
   debug_stuff();
   return;
   }
 
   switch (inst) {
   case INST_A:
   foo();
   
   }
 }
>>> 
>>> Are not we doing something like this:
>>> int emulation_exit()
>>> {
>>>if (inst == DEBUG_INST) {
>>>debug_stuff();
>>>return;
>>>}
>>> 
>>>status = kvmppc_emulate_instruction()
>>>switch (status) {
>>>case FAIL:
>>>foo();
>>>case DONE:
>>> foo1();
>>>
>>>}
>>> }
>>> 
>>> Do you want something like this:
>>> 
>>> int emulation_exit()
>>> {
>>> 
>>>status = kvmppc_emulate_instruction()
>>>switch (status) {
>>>case FAIL:
>>> if (inst == DEBUG_INST) {
>>> debug_stuff();
>>>   return;
>>> }
>>>foo();
>>> 
>>>case DONE:
>>> foo1();
>>>
>>>}
>>> }
>> 
>> No, I want the DEBUG_INST be handled the same as any other instruction we
>> emulate.
> 
> I would like to understand what you are thinking:
> What I derived is , add the instruction in kvmppc_emulate_instruction() (or 
> its child function) which, 
> 1) fill the relevant information in run-> , kvmppc_account_exit(vcpu, 
> DEBUG_EXITS); and returns EMULATION_DONE
> And in emulation_exit()
> status = kvmppc_emulate_instruction()
> switch (status) {
>   case EMULATION_DONE:
>   if (inst == DEBUG)
>   return RESUME_HOST;
> }
> Or
> 2) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns EMULATION_DONE;
> And in emulation_exit()
> status = kvmppc_emulate_instruction()
> switch (status) {
>   case EMULATION_DONE:
>   if (inst == DEBUG) {
>   fill run-> 
>   return RESUME_HOST;
>   }
> }
> 
> Or
> 3) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns a new status type 
> (EMULATION_DEBUG_INST)
> And in emulation_exit()
> status = kvmppc_emulate_instruction()
> switch (status) {
>   case EMULATION_DEBUG_INST:
>   fill run-> 
>   return RESUME_HOST;
> }

This one :).

> 
>> 
 
 what I want is:
 
 int emulation_exit()
 {
   switch (inst) {
   case INST_A:
   foo(); break;
   case DEBUG_INST:
   debug_stuff(); break;
   
   }
 }
 
> 
> 
>> 
>>> +   run->exit_reason = KVM_EXIT_DEBUG;
>>> +   run->debug.arch.pc = vcpu->arch.pc;
>>> +   run->debug.arch.exception = exit_nr;
>>> +   run->debug.arch.status = 0;
>>> +   kvmppc_account_exit(vcpu, DEBUG_EXITS);
>>> +   return RESUME_HOST;
>>> +   }
>>> +
>>> er = kvmppc_emulate_instruction(run, vcpu);
>>> switch (er) {
>>> case EMULATE_DONE:
>>> @@ -697,6 +711,44 @@ static int emulation_exit(struct kvm_run
>>> *run, struct
>> kvm_vcpu *vcpu)
>>> default:
>>> BUG();
>>> }
>>> +
>>> +   if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_ENABLE) &&
>>> +   (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
>> 
>> I don't understand how this is supposed to work. When we enable
>> singlestep, why would we end up in emulation_exit()?
> 
> When singlestep is enabled then we set DBCR0[ICMP] and the debug
> handler
 should be able to handle this. I think you are right.
> 
>> 
>>> +   run->exit_reason = KVM_EXIT_DEBUG;
>>> +   return RES

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Andreas Färber

Am 04.10.2012 16:30, schrieb Jan Kiszka:
> On 2012-10-04 16:21, Anthony Liguori wrote:
>> -no-kvm should be included too.
> 
> Reminds me that we still need to agree on the final default accel strategy.
> 
>>
>> I just ran across a user that was injecting '-no-kvm-irqchip' in their
>> libvirt XML via a custom attribute.  It turned out it was to work around
>> broken MSI support in their funky guest they were running.  It was the
>> wrong solution to the problem but they were doing it regardless.
>>
>> The point is, there are users in the wild using these options.  There's
>> no reason to remove them if they are trivial to maintain (and they are
>> in their current form).
> 
> So let's define a consistent policy for them all:
>  - warn on the command line on use

>  - avoid adding them to the help or other user documentation

That's dangerous - at some point someone will notice and propose a patch
documenting them and the reviewers may have forgotten by then why it was
not documented in the first place. Better clearly document them in help
output as "DEPRECATED, to be removed in future versions" or so.

Andreas

>  - keep them until we rework the whole command line
> 
> Jan
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Alex Williamson

On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
> Hello Alex,
> 
> I just tested vfio as part of linux 3.6 and detected, that it doesn't
> work because of the following missing patch:
> 
> http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pci&article=16422
> 
> Could you please get it applied for the next stable release?

Hi Andreas,

This patch needs to go through Bjorn's PCI tree, but as noted in the
comments I have some outstanding questions that need to be answered,
probably by Joerg.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

vfio: missing patch in linux 3.6

2012-10-04 Thread Andreas Hartmann

Hello Alex,

I just tested vfio as part of linux 3.6 and detected, that it doesn't
work because of the following missing patch:

http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pci&article=16422

Could you please get it applied for the next stable release?


Thanks,
kind regards,
Andreas Hartmann
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Bhushan Bharat-R65777



> > -static int emulation_exit(struct kvm_run *run, struct kvm_vcpu
> > *vcpu)
> > +static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> > + int exit_nr)
> > {
> > enum emulation_result er;
> >
> > +   if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) &&
> > +vcpu->arch.last_inst == KVMPPC_INST_GUEST_GDB) {
> 
>  This belongs into the normal emulation code path, behind the same
>  switch() that everything else goes through.
> >>>
> >>> I am not sure I understood correctly. Below is the reason why I
> >>> placed this
> >> code here.
> >>> Instruction where software breakpoint is to be set is replaced by "ehpriv"
> >> instruction. On e500v2, this is not a valid instruction can causes
> >> program interrupt. On e500mc, "ehpriv" is a valid instruction. Both
> >> the exit path calls emulation_exit(), so we have placed the code in this
> function.
> >>> Do you want this code to be moved in program interrupt exit path for
> >>> e500v2
> >> and BOOKE_INTERRUPT_HV_PRIV for e500mc?
> >>
> >> Ok, in this patch you do (basically):
> >>
> >> int emulation_exit()
> >> {
> >>if (inst == DEBUG_INST) {
> >>debug_stuff();
> >>return;
> >>}
> >>
> >>switch (inst) {
> >>case INST_A:
> >>foo();
> >>
> >>}
> >> }
> >
> > Are not we doing something like this:
> > int emulation_exit()
> > {
> > if (inst == DEBUG_INST) {
> > debug_stuff();
> > return;
> > }
> >
> > status = kvmppc_emulate_instruction()
> > switch (status) {
> > case FAIL:
> > foo();
> > case DONE:
> > foo1();
> > 
> > }
> > }
> >
> > Do you want something like this:
> >
> > int emulation_exit()
> > {
> >
> > status = kvmppc_emulate_instruction()
> > switch (status) {
> > case FAIL:
> > if (inst == DEBUG_INST) {
> > debug_stuff();
> >   return;
> > }
> > foo();
> >
> > case DONE:
> > foo1();
> > 
> > }
> > }
> 
> No, I want the DEBUG_INST be handled the same as any other instruction we
> emulate.

I would like to understand what you are thinking:
What I derived is , add the instruction in kvmppc_emulate_instruction() (or its 
child function) which, 
1) fill the relevant information in run-> , kvmppc_account_exit(vcpu, 
DEBUG_EXITS); and returns EMULATION_DONE
 And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
case EMULATION_DONE:
if (inst == DEBUG)
return RESUME_HOST;
 }
 Or
2) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns EMULATION_DONE;
And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
case EMULATION_DONE:
if (inst == DEBUG) {
fill run-> 
return RESUME_HOST;
}
 }

Or
3) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns a new status type 
(EMULATION_DEBUG_INST)
And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
case EMULATION_DEBUG_INST:
fill run-> 
return RESUME_HOST;
 }

> 
> >>
> >> what I want is:
> >>
> >> int emulation_exit()
> >> {
> >>switch (inst) {
> >>case INST_A:
> >>foo(); break;
> >>case DEBUG_INST:
> >>debug_stuff(); break;
> >>
> >>}
> >> }
> >>
> >>>
> >>>
> 
> > +   run->exit_reason = KVM_EXIT_DEBUG;
> > +   run->debug.arch.pc = vcpu->arch.pc;
> > +   run->debug.arch.exception = exit_nr;
> > +   run->debug.arch.status = 0;
> > +   kvmppc_account_exit(vcpu, DEBUG_EXITS);
> > +   return RESUME_HOST;
> > +   }
> > +
> > er = kvmppc_emulate_instruction(run, vcpu);
> > switch (er) {
> > case EMULATE_DONE:
> > @@ -697,6 +711,44 @@ static int emulation_exit(struct kvm_run
> > *run, struct
>  kvm_vcpu *vcpu)
> > default:
> > BUG();
> > }
> > +
> > +   if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_ENABLE) &&
> > +   (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> 
>  I don't understand how this is supposed to work. When we enable
>  singlestep, why would we end up in emulation_exit()?
> >>>
> >>> When singlestep is enabled then we set DBCR0[ICMP] and the debug
> >>> handler
> >> should be able to handle this. I think you are right.
> >>>
> 
> > +   run->exit_reason = KVM_EXIT_DEBUG;
> > +   return RESUME_HOST;
> > +   }
> > +}
> > +
> > +static int kvmppc_handle_debug(st

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Avi Kivity

On 10/04/2012 03:07 PM, Peter Zijlstra wrote:
> On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
>> 
>> Again the numbers are ridiculously high for arch_local_irq_restore.
>> Maybe there's a bad perf/kvm interaction when we're injecting an
>> interrupt, I can't believe we're spending 84% of the time running the
>> popf instruction. 
> 
> Smells like a software fallback that doesn't do NMI, hrtimer based
> sampling typically hits popf where we re-enable interrupts.

Good nose, that's probably it.  Raghavendra, can you ensure that the PMU
is properly exposed?  'dmesg' in the guest will tell.  If it isn't, -cpu
host will expose it (and a good idea anyway to get best performance).

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Andrew Theurer

On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
> On 10/04/2012 12:49 PM, Raghavendra K T wrote:
> > On 10/03/2012 10:35 PM, Avi Kivity wrote:
> >> On 10/03/2012 02:22 PM, Raghavendra K T wrote:
>  So I think it's worth trying again with ple_window of 2-4.
> 
> >>>
> >>> Hi Avi,
> >>>
> >>> I ran different benchmarks increasing ple_window, and results does not
> >>> seem to be encouraging for increasing ple_window.
> >>
> >> Thanks for testing! Comments below.
> >>
> >>> Results:
> >>> 16 core PLE machine with 16 vcpu guest.
> >>>
> >>> base kernel = 3.6-rc5 + ple handler optimization patch
> >>> base_pleopt_8k = base kernel + ple window = 8k
> >>> base_pleopt_16k = base kernel + ple window = 16k
> >>> base_pleopt_32k = base kernel + ple window = 32k
> >>>
> >>>
> >>> Percentage improvements of benchmarks w.r.t base_pleopt with
> >>> ple_window = 4096
> >>>
> >>> base_pleopt_8kbase_pleopt_16kbase_pleopt_32k
> >>> - 
> >>>   
> >>>
> >>> kernbench_1x-5.54915-15.94529-44.31562
> >>> kernbench_2x-7.89399-17.75039-37.73498
> >>
> >> So, 44% degradation even with no overcommit?  That's surprising.
> > 
> > Yes. Kernbench was run with #threads = #vcpu * 2 as usual. Is it
> > spending 8 times the original ple_window cycles for 16 vcpus
> > significant?
> 
> A PLE exit when not overcommitted cannot do any good, it is better to
> spin in the guest rather that look for candidates on the host.  In fact
> when we benchmark we often disable PLE completely.

Agreed.  However, I really do not understand why the kernbench regressed
with bigger ple_window.  It should stay the same or improve.  Raghu, do
you have perf data for the kernbench runs?
> 
> > 
> >>
> >>> I also got perf top output to analyse the difference. Difference comes
> >>> because of flushtlb (and also spinlock).
> >>
> >> That's in the guest, yes?
> > 
> > Yes. Perf is in guest.
> > 
> >>
> >>>
> >>> Ebizzy run for 4k ple_window
> >>> -  87.20%  [kernel]  [k] arch_local_irq_restore
> >>> - arch_local_irq_restore
> >>>- 100.00% _raw_spin_unlock_irqrestore
> >>>   + 52.89% release_pages
> >>>   + 47.10% pagevec_lru_move_fn
> >>> -   5.71%  [kernel]  [k] arch_local_irq_restore
> >>> - arch_local_irq_restore
> >>>+ 86.03% default_send_IPI_mask_allbutself_phys
> >>>+ 13.96% default_send_IPI_mask_sequence_phys
> >>> -   3.10%  [kernel]  [k] smp_call_function_many
> >>>   smp_call_function_many
> >>>
> >>>
> >>> Ebizzy run for 32k ple_window
> >>>
> >>> -  91.40%  [kernel]  [k] arch_local_irq_restore
> >>> - arch_local_irq_restore
> >>>- 100.00% _raw_spin_unlock_irqrestore
> >>>   + 53.13% release_pages
> >>>   + 46.86% pagevec_lru_move_fn
> >>> -   4.38%  [kernel]  [k] smp_call_function_many
> >>>   smp_call_function_many
> >>> -   2.51%  [kernel]  [k] arch_local_irq_restore
> >>> - arch_local_irq_restore
> >>>+ 90.76% default_send_IPI_mask_allbutself_phys
> >>>+ 9.24% default_send_IPI_mask_sequence_phys
> >>>
> >>
> >> Both the 4k and the 32k results are crazy.  Why is
> >> arch_local_irq_restore() so prominent?  Do you have a very high
> >> interrupt rate in the guest?
> > 
> > How to measure if I have high interrupt rate in guest?
> > From /proc/interrupt numbers I am not able to judge :(
> 
> 'vmstat 1'
> 
> > 
> > I went back and got the results on a 32 core machine with 32 vcpu guest.
> > Strangely, I got result supporting the claim that increasing ple_window
> > helps for non-overcommitted scenario.
> > 
> > 32 core 32 vcpu guest 1x scenarios.
> > 
> > ple_gap = 0
> > kernbench: Elapsed Time 38.61
> > ebizzy: 7463 records/s
> > 
> > ple_window = 4k
> > kernbench: Elapsed Time 43.5067
> > ebizzy:2528 records/s
> > 
> > ple_window = 32k
> > kernebench : Elapsed Time 39.4133
> > ebizzy: 7196 records/s
> 
> So maybe something was wrong with the first measurement.

OK, this is more in line with what I expected for kernbench.  FWIW, in
order to show an improvement for a larger ple_window, we really need a
workload which we know has a longer lock holding time (without factoring
in LHP).  We have noticed this on IO based locks mostly.  We saw it with
a massive disk IO test (qla2xxx lock), and also with a large web serving
test (some vfs related lock, but I forget what exactly it was).
> 
> > 
> > 
> > perf top for ebizzy for above:
> > ple_gap = 0
> > -  84.74%  [kernel]  [k] arch_local_irq_restore
> >- arch_local_irq_restore
> >   - 100.00% _raw_spin_unlock_irqrestore
> >  + 50.96% release_pages
> >  + 49.02% pagevec_lru_move_fn
> > -   6.57%  [kernel]  [k] arch_local_irq_restore
> >- arch_local_irq_restore
> >   + 92.54% default_send_IPI_mask_allbutself_phys
> >   + 7.46% default_send_IPI_mask_sequence_phys
> > -   1.54%  [kernel]  [k] smp_call_function_many
> >  smp_call_function_man

Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Alexander Graf


On 04.10.2012, at 16:22, Bhushan Bharat-R65777 wrote:

> 
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Thursday, October 04, 2012 4:56 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
>> Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
>> 
>> 
>> On 04.10.2012, at 13:06, Bhushan Bharat-R65777 wrote:
>> 
>>> 
>>> 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:50 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan
 Bharat-R65777
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 21.08.2012, at 15:52, Bharat Bhushan wrote:
 
> This patch adds the debug stub support on booke/bookehv.
> Now QEMU debug stub can use hw breakpoint, watchpoint and software
> breakpoint to debug guest.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> arch/powerpc/include/asm/kvm.h|   29 ++-
> arch/powerpc/include/asm/kvm_host.h   |5 +
> arch/powerpc/kernel/asm-offsets.c |   26 ++
> arch/powerpc/kvm/booke.c  |  144 
> +--
>> --
> arch/powerpc/kvm/booke_interrupts.S   |  110 +
> arch/powerpc/kvm/bookehv_interrupts.S |  141
>> +++-
> arch/powerpc/kvm/e500mc.c |3 +-
> 7 files changed, 435 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm.h
> b/arch/powerpc/include/asm/kvm.h index 61b197e..53479ea 100644
> --- a/arch/powerpc/include/asm/kvm.h
> +++ b/arch/powerpc/include/asm/kvm.h
> @@ -25,6 +25,7 @@
> /* Select powerpc specific features in  */ #define
> __KVM_HAVE_SPAPR_TCE #define __KVM_HAVE_PPC_SMT
> +#define __KVM_HAVE_GUEST_DEBUG
> 
> struct kvm_regs {
>   __u64 pc;
> @@ -264,7 +265,31 @@ struct kvm_fpu {
>   __u64 fpr[32];
> };
> 
> +
> +/*
> + * Defines for h/w breakpoint, watchpoint (read, write or both) and
> + * software breakpoint.
> + * These are used as "type" in KVM_SET_GUEST_DEBUG ioctl and "status"
> + * for KVM_DEBUG_EXIT.
> + */
> +#define KVMPPC_DEBUG_NONE0x0
> +#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
> +#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
> +#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
> struct kvm_debug_exit_arch {
> + __u64 pc;
> + /*
> +  * exception -> returns the exception number. If the KVM_DEBUG_EXIT
> +  * exit is not handled (say not h/w breakpoint or software breakpoint
> +  * set for this address) by qemu then it is supposed to inject this
> +  * exception to guest.
> +  */
> + __u32 exception;
> + /*
> +  * exiting to userspace because of h/w breakpoint, watchpoint
> +  * (read, write or both) and software breakpoint.
> +  */
> + __u32 status;
> };
> 
> /* for KVM_SET_GUEST_DEBUG */
> @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
>* Type denotes h/w breakpoint, read watchpoint, write
>* watchpoint or watchpoint (both read and write).
>*/
> -#define KVMPPC_DEBUG_NOTYPE  0x0
> -#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
> -#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
> -#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
>   __u32 type;
>   __u32 pad1;
>   __u64 pad2;
> diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
> index c7219c1..3ba465a 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
>   u32 mmucfg;
>   u32 epr;
>   u32 crit_save;
> + /* guest debug registers*/
>   struct kvmppc_booke_debug_reg dbg_reg;
> + /* shadow debug registers */
> + struct kvmppc_booke_debug_reg shadow_dbg_reg;
> + /* host debug registers*/
> + struct kvmppc_booke_debug_reg host_dbg_reg;
> #endif
>   gpa_t paddr_accessed;
>   gva_t vaddr_accessed;
> diff --git a/arch/powerpc/kernel/asm-offsets.c
> b/arch/powerpc/kernel/asm-
 offsets.c
> index 555448e..6987821 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -564,6 +564,32 @@ int main(void)
>   DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
>   DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
>   DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
> + DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
> + DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
> + DEFINE(VCPU_HOST_DBG, offs

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-04 16:21, Anthony Liguori wrote:
> -no-kvm should be included too.

Reminds me that we still need to agree on the final default accel strategy.

> 
> I just ran across a user that was injecting '-no-kvm-irqchip' in their
> libvirt XML via a custom attribute.  It turned out it was to work around
> broken MSI support in their funky guest they were running.  It was the
> wrong solution to the problem but they were doing it regardless.
> 
> The point is, there are users in the wild using these options.  There's
> no reason to remove them if they are trivial to maintain (and they are
> in their current form).

So let's define a consistent policy for them all:
 - warn on the command line on use
 - avoid adding them to the help or other user documentation
 - keep them until we rework the whole command line

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio problems

2012-10-04 Thread Alex Williamson

On Thu, 2012-10-04 at 11:40 +0200, Dominic Eschweiler wrote:
> Hi,
> 
> I just recently started to play with vfio, since the new Kernel 3.6
> comes directly with an integrated vfio-stack. My problem currently is,
> that I'm not able to bind the vfio-pci to the (unused) smbus controller
> in my laptop. 
> 
> Here are the steps I did and the related results:
> 
> 
> # modprobe vfio-pci
> 
> [ 1609.065705] VFIO - User Level meta-driver version: 0.3
> 
> # lspci
> 
> ...
> 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family
> SMBus Controller (rev 05)
> ...
> 
> # lspci -n -s :00:1f.3
> 
> 00:1f.3 0c05: 8086:1c22 (rev 05)
> 
> # echo 8086 1c22 > /sys/bus/pci/drivers/vfio-pci/new_id
> 
> [ 4485.759148] vfio-pci: probe of :00:1f.3 failed with error -22
> 
> 
> I omitted the unbind step, which is described in the documentation,
> since the device is not claimed by any driver at all. Also the "error
> -22" statement isn't really helping in this case. 
> 
> Any ideas?
> 

What does this report?

readlink /sys/bus/pci/devices/\:00\:1f.3/iommu_group

The likely cause of an EINVAL for an endpoint device is that it's not
part of an IOMMU group, which may mean you don't have an IOMMU enabled.
You can also look in /sys/kernel/iommu_groups to see the groups.
Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[GIT PULL] KVM updates for the 3.7 merge window

2012-10-04 Thread Avi Kivity

Linus, please pull from the repo and tag at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.7-1

to receive the KVM updates for the 3.7 merge windows.  Highlights of the
changes for this release include support for vfio level triggered interrupts,
improved big real mode support on older Intels, a streamlines guest page table
walker, guest APIC speedups, PIO optimizations, better overcommit handling, and
read-only memory.

Shortlog and diffstat follow.


KVM updates for the 3.7 merge window


Alex Williamson (1):
  KVM: Add resampling irqfds for level triggered interrupts

Alexander Graf (1):
  KVM: Add ppc hypercall documentation

Avi Kivity (36):
  Merge branch 'queue' into next
  KVM: Don't update PPR on any APIC read
  KVM: Remove internal timer abstraction
  KVM: Simplify kvm_timer
  KVM: Simplify kvm_pit_timer
  KVM: fold kvm_pit_timer into kvm_kpit_state
  Merge remote-tracking branch 'upstream' into next
  KVM: VMX: Advertize RDTSC exiting to nested guests
  KVM: x86 emulator: access GPRs on demand
  KVM: VMX: Separate saving pre-realmode state from setting segments
  KVM: VMX: Fix incorrect lookup of segment S flag in fix_pmode_dataseg()
  KVM: VMX: Use kvm_segment to save protected-mode segments when entering 
realmode
  KVM: VMX: Retain limit and attributes when entering protected mode
  KVM: VMX: Allow real mode emulation using vm86 with dpl=0
  KVM: VMX: Allow vm86 virtualization of big real mode
  KVM: x86 emulator: Leave segment limit and attributs alone in real mode
  KVM: x86 emulator: Check segment limits in real mode too
  KVM: x86 emulator: Fix #GP error code during linearization
  KVM: VMX: Return real real-mode segment data even if 
emulate_invalid_guest_state=1
  KVM: VMX: Preserve segment limit and access rights in real mode
  KVM: VMX: Save all segment data in real mode
  KVM: VMX: Ignore segment G and D bits when considering whether we can 
virtualize
  KVM: VMX: Make lto-friendly
  KVM: VMX: Make use of asm.h
  KVM: SVM: Make use of asm.h
  KVM: MMU: Push clean gpte write protection out of gpte_access()
  KVM: MMU: Optimize gpte_access() slightly
  KVM: MMU: Move gpte_access() out of paging_tmpl.h
  KVM: MMU: Update accessed and dirty bits after guest pagetable walk
  KVM: MMU: Optimize pte permission checks
  KVM: MMU: Simplify walk_addr_generic() loop
  KVM: MMU: Optimize is_last_gpte()
  KVM: MMU: Eliminate eperm temporary
  KVM: MMU: Avoid access/dirty update loop if all is well
  KVM: MMU: Eliminate pointless temporary 'ac'
  Merge branch 'queue' into next

Christian Borntraeger (1):
  KVM: s390: Fix vcpu_load handling in interrupt code

Christoffer Dall (1):
  KVM: Move KVM_IRQ_LINE to arch-generic code

Cornelia Huck (3):
  s390/dis: Instruction decoding interface
  KVM: s390: Add architectural trace events
  KVM: s390: Add implementation-specific trace events

Florian Westphal (1):
  KVM guest: disable stealtime on reboot to avoid mem corruption

Gavin Shan (1):
  KVM: PPC: book3s: fix build error caused by gfn_to_hva_memslot()

Gleb Natapov (18):
  KVM: x86 emulator: drop unneeded call to get_segment()
  KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct value
  KVM: clean up kvm_(set|get)_apic_base
  KVM: use kvm_lapic_set_base() to change apic_base
  KVM: mark apic enabled on start up
  jump_label: Export jump_label_rate_limit()
  KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE 
MSR
  KVM: use jump label to optimize checking for SW enabled apic in spurious 
interrupt register
  KVM: use jump label to optimize checking for in kernel local apic presence
  KVM: inline kvm_apic_present() and kvm_lapic_enabled()
  KVM: correctly detect APIC SW state in kvm_apic_post_state_restore()
  KVM: VMX: restore MSR_IA32_DEBUGCTLMSR after VMEXIT
  KVM: cleanup pic reset
  KVM: Provide userspace IO exit completion callback
  KVM: emulator: make x86 emulation modes enum instead of defines
  KVM: emulator: string_addr_inc() cleanup
  KVM: emulator: optimize "rep ins" handling
  KVM: optimize apic interrupt delivery

Guo Chao (7):
  KVM: VMX: Fix typos
  KVM: SVM: Fix typos
  KVM: x86: Fix typos in x86.c
  KVM: x86: Fix typos in emulate.c
  KVM: x86: Fix typos in cpuid.c
  KVM: x86: Fix typos in lapic.c
  KVM: x86: Fix typos in pmu.c

Jan Kiszka (2):
  KVM: Improve wording of KVM_SET_USER_MEMORY_REGION documentation
  KVM: x86: Fix guest debug across vcpu INIT reset

Liu, Jinsong (1):
  KVM: Depend on HIGH_RES_TIMERS

Marcelo Tosatti (7):
  KVM: x86: fix pvclock guest stopped flag reporting
  x86: KVM guest: merge CONFIG_KVM_CLOCK int

RE: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Bhushan Bharat-R65777



> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, October 04, 2012 4:56 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
> 
> 
> On 04.10.2012, at 13:06, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Monday, September 24, 2012 9:50 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan
> >> Bharat-R65777
> >> Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
> >>
> >>
> >> On 21.08.2012, at 15:52, Bharat Bhushan wrote:
> >>
> >>> This patch adds the debug stub support on booke/bookehv.
> >>> Now QEMU debug stub can use hw breakpoint, watchpoint and software
> >>> breakpoint to debug guest.
> >>>
> >>> Signed-off-by: Bharat Bhushan 
> >>> ---
> >>> arch/powerpc/include/asm/kvm.h|   29 ++-
> >>> arch/powerpc/include/asm/kvm_host.h   |5 +
> >>> arch/powerpc/kernel/asm-offsets.c |   26 ++
> >>> arch/powerpc/kvm/booke.c  |  144 
> >>> +--
> --
> >>> arch/powerpc/kvm/booke_interrupts.S   |  110 +
> >>> arch/powerpc/kvm/bookehv_interrupts.S |  141
> +++-
> >>> arch/powerpc/kvm/e500mc.c |3 +-
> >>> 7 files changed, 435 insertions(+), 23 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/kvm.h
> >>> b/arch/powerpc/include/asm/kvm.h index 61b197e..53479ea 100644
> >>> --- a/arch/powerpc/include/asm/kvm.h
> >>> +++ b/arch/powerpc/include/asm/kvm.h
> >>> @@ -25,6 +25,7 @@
> >>> /* Select powerpc specific features in  */ #define
> >>> __KVM_HAVE_SPAPR_TCE #define __KVM_HAVE_PPC_SMT
> >>> +#define __KVM_HAVE_GUEST_DEBUG
> >>>
> >>> struct kvm_regs {
> >>>   __u64 pc;
> >>> @@ -264,7 +265,31 @@ struct kvm_fpu {
> >>>   __u64 fpr[32];
> >>> };
> >>>
> >>> +
> >>> +/*
> >>> + * Defines for h/w breakpoint, watchpoint (read, write or both) and
> >>> + * software breakpoint.
> >>> + * These are used as "type" in KVM_SET_GUEST_DEBUG ioctl and "status"
> >>> + * for KVM_DEBUG_EXIT.
> >>> + */
> >>> +#define KVMPPC_DEBUG_NONE0x0
> >>> +#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
> >>> +#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
> >>> +#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
> >>> struct kvm_debug_exit_arch {
> >>> + __u64 pc;
> >>> + /*
> >>> +  * exception -> returns the exception number. If the KVM_DEBUG_EXIT
> >>> +  * exit is not handled (say not h/w breakpoint or software breakpoint
> >>> +  * set for this address) by qemu then it is supposed to inject this
> >>> +  * exception to guest.
> >>> +  */
> >>> + __u32 exception;
> >>> + /*
> >>> +  * exiting to userspace because of h/w breakpoint, watchpoint
> >>> +  * (read, write or both) and software breakpoint.
> >>> +  */
> >>> + __u32 status;
> >>> };
> >>>
> >>> /* for KVM_SET_GUEST_DEBUG */
> >>> @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
> >>>* Type denotes h/w breakpoint, read watchpoint, write
> >>>* watchpoint or watchpoint (both read and write).
> >>>*/
> >>> -#define KVMPPC_DEBUG_NOTYPE  0x0
> >>> -#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
> >>> -#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
> >>> -#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
> >>>   __u32 type;
> >>>   __u32 pad1;
> >>>   __u64 pad2;
> >>> diff --git a/arch/powerpc/include/asm/kvm_host.h
> >> b/arch/powerpc/include/asm/kvm_host.h
> >>> index c7219c1..3ba465a 100644
> >>> --- a/arch/powerpc/include/asm/kvm_host.h
> >>> +++ b/arch/powerpc/include/asm/kvm_host.h
> >>> @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
> >>>   u32 mmucfg;
> >>>   u32 epr;
> >>>   u32 crit_save;
> >>> + /* guest debug registers*/
> >>>   struct kvmppc_booke_debug_reg dbg_reg;
> >>> + /* shadow debug registers */
> >>> + struct kvmppc_booke_debug_reg shadow_dbg_reg;
> >>> + /* host debug registers*/
> >>> + struct kvmppc_booke_debug_reg host_dbg_reg;
> >>> #endif
> >>>   gpa_t paddr_accessed;
> >>>   gva_t vaddr_accessed;
> >>> diff --git a/arch/powerpc/kernel/asm-offsets.c
> >>> b/arch/powerpc/kernel/asm-
> >> offsets.c
> >>> index 555448e..6987821 100644
> >>> --- a/arch/powerpc/kernel/asm-offsets.c
> >>> +++ b/arch/powerpc/kernel/asm-offsets.c
> >>> @@ -564,6 +564,32 @@ int main(void)
> >>>   DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
> >>>   DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
> >>>   DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
> >>> + DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
> >>> + DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
> >>> + DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
> >>> + DEFINE(KVMPPC_DBG_DBCR0, offset

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Anthony Liguori

Jan Kiszka  writes:

> On 2012-10-03 20:26, Marcelo Tosatti wrote:
>> On Wed, Oct 03, 2012 at 07:24:48PM +0200, Jan Kiszka wrote:
>>> On 2012-10-03 19:16, Anthony Liguori wrote:
 Jan Kiszka  writes:

> On 2012-10-03 17:03, Marcelo Tosatti wrote:
>> On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:
>>> Marcelo Tosatti  writes:
>>>
 Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

 From: Jan Kiszka 
 
 Upstream is moving towards this mechanism, so start using it in 
 qemu-kvm
 already to configure the specific defaults: kvm enabled on, just like
 in-kernel irqchips.

 Signed-off-by: Marcelo Tosatti 
>>>
>>>
>>> Reviewed-by: Anthony Liguori 
>>>
>>> Although it's a little odd to have From: Jan without a SoB...
>>
>> Agree, Jan can you ACK?
>
> I wasn't able to join the call yesterday: Is there a removal schedule
> associated with those switches? Also, why pushing things upstream, even
> when only for one release, that have been loudly deprecated for a while
> in qemu-kvm? Some switches are lacking deprecated warnings on the
> console, and -no-kvm is missing completely. I tend to focus on patch 1 &
> 5, dropping the rest - based on relevance for production use.

 The distros need to keep these flags to do the switch.
>>>
>>> Why? Should be documented in commit log.
>>>
  I see no point
 in deprecating them since they're trivially easy to maintain.
>>>
>>> Given the level of cr** we already have in the command line, they are
>>> kind of noise, yes. But even then, these patches are not consistent as
>>> pointed out above.
>>>
>>> Also, they should not be documented to avoid being spread. That's what
>>> we did with other deprecated switches in QEMU.
>>>
>>> Jan
>> 
>> Jan,
>> 
>> You're comments to the patch are:
>> 
>> - No documentation.
>
> See e.g. how -M is handled in qemu-options.hx.
>
>> - Expiration date.
>
> Anthony said "forever", but I think we should remove all those that
> issue deprecation warnings after 1-2 years.
>
>> - Changelog explaining what?? (didnt get that). Perhaps better changelog
>>   in general?
>
> I'm still failing to understand who could depend on -no-kvm-irqchip or
> -no-kvm-pit. And I don't understand why -no-kvm was not included. Soe
> the reasons for include -X should be provided. Also check your patch
> subjects again, at least one was wrong.

-no-kvm should be included too.

I just ran across a user that was injecting '-no-kvm-irqchip' in their
libvirt XML via a custom attribute.  It turned out it was to work around
broken MSI support in their funky guest they were running.  It was the
wrong solution to the problem but they were doing it regardless.

The point is, there are users in the wild using these options.  There's
no reason to remove them if they are trivial to maintain (and they are
in their current form).

Regards,

Anthony Liguori

>
> Jan

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support

2012-10-04 Thread Avi Kivity

On 09/25/2012 05:20 PM, Will Deacon wrote:
>> +   case KVM_GET_REG_LIST: {
>> +   struct kvm_reg_list __user *user_list = argp;
>> +   struct kvm_reg_list reg_list;
>> +   unsigned n;
>> +
>> +   if (copy_from_user(®_list, user_list, sizeof reg_list))
>> +   return -EFAULT;
>> +   n = reg_list.n;
>> +   reg_list.n = kvm_arm_num_regs(vcpu);
>> +   if (copy_to_user(user_list, ®_list, sizeof reg_list))
>> +   return -EFAULT;
>> +   if (n < reg_list.n)
>> +   return -E2BIG;
>> +   return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
> 
> kvm_reg_list sounds like it could be done using a regset instead.

Wouldn't those regsets be userspace oriented?

For example, the GPRs returned here include all the shadowed interrupt
registers (or however they're called) while most user oriented APIs
would only include the user visible registers.

FWIW, we're trying to move to an architecture independent ABI for KVM
registers, but that's a lot of work since we need to make sure all the
weird x86 registers (and non-register state) fit into that.  Maybe that
ABI will be regset based, but I don't want to block the ARM port on this.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support

2012-10-04 Thread Christoffer Dall

On Thu, Oct 4, 2012 at 9:02 AM, Min-gyu Kim  wrote:
>
>
>> -Original Message-
>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
>> Behalf Of Christoffer Dall
>> Sent: Monday, October 01, 2012 4:22 AM
>> To: Will Deacon
>> Cc: kvm@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>> kvm...@lists.cs.columbia.edu; rusty.russ...@linaro.org; a...@redhat.com;
>> marc.zyng...@arm.com
>> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
>> support
>>
>> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon  wrote:
>> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> >> On 09/25/2012 11:20 AM, Will Deacon wrote:
>> >> >> +/* Multiprocessor Affinity Register */
>> >> >> +#define MPIDR_CPUID(0x3 << 0)
>> >> >
>> >> > I'm fairly sure we already have code under arch/arm/ for dealing
>> >> > with the mpidr. Let's re-use that rather than reinventing it here.
>> >> >
>> >>
>> >> I see some defines in topology.c - do you want some of these factored
>> >> out into a header file that we can then also use from kvm? If so,
> where?
>> >
>> > I guess either in topology.h or a new header (topology-bits.h).
>> >
>> >> >> +#define EXCEPTION_NONE  0
>> >> >> +#define EXCEPTION_RESET 0x80
>> >> >> +#define EXCEPTION_UNDEFINED 0x40
>> >> >> +#define EXCEPTION_SOFTWARE  0x20
>> >> >> +#define EXCEPTION_PREFETCH  0x10
>> >> >> +#define EXCEPTION_DATA  0x08
>> >> >> +#define EXCEPTION_IMPRECISE 0x04
>> >> >> +#define EXCEPTION_IRQ   0x02
>> >> >> +#define EXCEPTION_FIQ   0x01
>> >> >
>> >> > Why the noise?
>> >> >
>> >>
>> >> these are simply cruft from a previous life of KVM/ARM.
>> >
>> > Ok, then please get rid of them.
>> >
>> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
>> >> >> +   u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> >> >> +   BUG_ON(mode == 0xf);
>> >> >> +   return mode;
>> >> >> +}
>> >> >
>> >> > I noticed that you have a fair few BUG_ONs throughout the series.
>> >> > Fair enough, but for hyp code is that really the right thing to do?
>> >> > Killing the guest could make more sense, perhaps?
>> >>
>> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
>> >> catch explicitly on the host. We have had a pass over the code to
>> >> change all the BUG_ONs that can be provoked by the guest and inject
>> >> the proper exceptions into the guest in this case. If you find places
>> >> where this is not the case, it should be changed, and do let me know.
>> >
>> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
>> > with one guest may not necessarily terminate the other guests? BUG_ONs
>> > in the host seem like a bad idea if the host is able to continue with
>> > a subset of guests.
>> >
>>
>> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
>> should be nowhere where a guest can cause a BUG_ON to occur in the host,
>> because that would be a bug.
>>
>> We basically never kill a guest unless really extreme things happen (like
>> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
>> does something really weird, that guest will receive the appropriate
>> exception (undefined, prefetch abort, ...)
>>
>
> I agree with Will. It seems to be overreacting to kill the entire system.
>
> From the code above, BUG_ON case clearly points out that there happened a
> serious bug case. However, killing the corresponding VM may not cause any
> further problem.
> Then leave some logs for debugging and killing the VM seems to be enough.
>
> Let's assume KVM for ARM is distributed with a critical bug.
> If the case is defended by BUG_ON, it will cause host to shutdown.
> If the case is defended by killing VM, it will cause VM to shutdown.
> In my opinion, latter case seems to be better.
>
> I looked for a guide on BUG_ON and found this:
>  http://yarchive.net/comp/linux/BUG.html
>
>

I completely agree with all this, no further argument is needed. The
point of a BUG_ON is to explicitly state the reason for a bug that
will anyhow cause the host kernel to overall malfunction. The above
bug_on statement is long gone (see new patches), and if you see other
cases like this, the code should have been tested and we can remove
the BUG_ON.

>> >> >
>> >> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
>> >> >> +   return vcpu_reg(vcpu, 15); }
>> >> >
>> >> > If you stick a struct pt_regs into struct kvm_regs, you could reuse
>> >> > ARM_pc here etc.
>> >> >
>> >>
>> >> I prefer not to, because we'd have those registers presumably for usr
>> >> mode and then we only define the others explicit. I think it's much
>> >> clearer to look at kvm_regs today.
>> >
>> > I disagree and think that you should reuse as much of the arch/arm/
>> > code as possible. Not only does it make it easier to read by people
>> > who are familiar with that code (and in turn get you more reviewers)
>> > but it also means that we limit the amount of dup

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 14:51, Rusty Russell ha scritto:
> Paolo Bonzini  writes:
> 
>> Il 04/10/2012 02:11, Rusty Russell ha scritto:
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.

 However, this bug affects (almost) all implementations and (almost) all
 devices.  It even makes sense to reserve a transport feature bit for it
 instead of a device feature bit.
>>>
>>> Perhaps, but we have to fix the bugs first!
>>
>> Yes. :)  Isn't that what mst's patch does?
>>
>>> As I said, my torture patch broke qemu immediately.  Since noone has
>>> leapt onto fixing that, I'll take a look now...
>>
>> I can look at virtio-scsi.
> 
> Actually, you can't, see my reply to Anthony...
> 
> Message-ID: <87lifm1y1n@rustcorp.com.au>

struct virtio_scsi_req_cmd {
// Read-only
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
char cdb[cdb_size];
char dataout[];
// Write-only part
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
u8 sense[sense_size];
char datain[];
};

where cdb_size and sense_size come from configuration space.  The device
right now expects everything before dataout/datain to be in a single
descriptor, but that's in no way part of the spec.  Am I missing
something egregious?

Paolo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Peter Zijlstra

On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
> 
> Again the numbers are ridiculously high for arch_local_irq_restore.
> Maybe there's a bad perf/kvm interaction when we're injecting an
> interrupt, I can't believe we're spending 84% of the time running the
> popf instruction. 

Smells like a software fallback that doesn't do NMI, hrtimer based
sampling typically hits popf where we re-enable interrupts.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support

2012-10-04 Thread Min-gyu Kim



> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Christoffer Dall
> Sent: Monday, October 01, 2012 4:22 AM
> To: Will Deacon
> Cc: kvm@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> kvm...@lists.cs.columbia.edu; rusty.russ...@linaro.org; a...@redhat.com;
> marc.zyng...@arm.com
> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
> support
> 
> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon  wrote:
> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
> >> On 09/25/2012 11:20 AM, Will Deacon wrote:
> >> >> +/* Multiprocessor Affinity Register */
> >> >> +#define MPIDR_CPUID(0x3 << 0)
> >> >
> >> > I'm fairly sure we already have code under arch/arm/ for dealing
> >> > with the mpidr. Let's re-use that rather than reinventing it here.
> >> >
> >>
> >> I see some defines in topology.c - do you want some of these factored
> >> out into a header file that we can then also use from kvm? If so,
where?
> >
> > I guess either in topology.h or a new header (topology-bits.h).
> >
> >> >> +#define EXCEPTION_NONE  0
> >> >> +#define EXCEPTION_RESET 0x80
> >> >> +#define EXCEPTION_UNDEFINED 0x40
> >> >> +#define EXCEPTION_SOFTWARE  0x20
> >> >> +#define EXCEPTION_PREFETCH  0x10
> >> >> +#define EXCEPTION_DATA  0x08
> >> >> +#define EXCEPTION_IMPRECISE 0x04
> >> >> +#define EXCEPTION_IRQ   0x02
> >> >> +#define EXCEPTION_FIQ   0x01
> >> >
> >> > Why the noise?
> >> >
> >>
> >> these are simply cruft from a previous life of KVM/ARM.
> >
> > Ok, then please get rid of them.
> >
> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
> >> >> +   u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
> >> >> +   BUG_ON(mode == 0xf);
> >> >> +   return mode;
> >> >> +}
> >> >
> >> > I noticed that you have a fair few BUG_ONs throughout the series.
> >> > Fair enough, but for hyp code is that really the right thing to do?
> >> > Killing the guest could make more sense, perhaps?
> >>
> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
> >> catch explicitly on the host. We have had a pass over the code to
> >> change all the BUG_ONs that can be provoked by the guest and inject
> >> the proper exceptions into the guest in this case. If you find places
> >> where this is not the case, it should be changed, and do let me know.
> >
> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
> > with one guest may not necessarily terminate the other guests? BUG_ONs
> > in the host seem like a bad idea if the host is able to continue with
> > a subset of guests.
> >
> 
> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
> should be nowhere where a guest can cause a BUG_ON to occur in the host,
> because that would be a bug.
> 
> We basically never kill a guest unless really extreme things happen (like
> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
> does something really weird, that guest will receive the appropriate
> exception (undefined, prefetch abort, ...)
> 

I agree with Will. It seems to be overreacting to kill the entire system.

>From the code above, BUG_ON case clearly points out that there happened a
serious bug case. However, killing the corresponding VM may not cause any
further problem.
Then leave some logs for debugging and killing the VM seems to be enough.

Let's assume KVM for ARM is distributed with a critical bug.
If the case is defended by BUG_ON, it will cause host to shutdown.
If the case is defended by killing VM, it will cause VM to shutdown.
In my opinion, latter case seems to be better.

I looked for a guide on BUG_ON and found this:
 http://yarchive.net/comp/linux/BUG.html


> >> >
> >> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
> >> >> +   return vcpu_reg(vcpu, 15); }
> >> >
> >> > If you stick a struct pt_regs into struct kvm_regs, you could reuse
> >> > ARM_pc here etc.
> >> >
> >>
> >> I prefer not to, because we'd have those registers presumably for usr
> >> mode and then we only define the others explicit. I think it's much
> >> clearer to look at kvm_regs today.
> >
> > I disagree and think that you should reuse as much of the arch/arm/
> > code as possible. Not only does it make it easier to read by people
> > who are familiar with that code (and in turn get you more reviewers)
> > but it also means that we limit the amount of duplication that we have.
> 
> Reusing a struct just for the sake of reusing is not necessarily an
> improvement. Some times it complicates things, and some times it's
> misleading. To me, pt_regs carry the meaning that these are the registers
> for a user space process that traps into the kernel - in KVM we emulate a
> virtual CPU and that current definition is quite clear.
> 
> The argument that more people will review the code if the struct contains
> a pt_regs field rather than a usr_regs field is completely invalid,
>

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Paolo Bonzini  writes:

> Il 04/10/2012 02:11, Rusty Russell ha scritto:
>> > > There's a reason I haven't done this.  I really, really dislike "my
>> > > implemention isn't broken" feature bits.  We could have an infinite
>> > > number of them, for each bug in each device.
>> >
>> > However, this bug affects (almost) all implementations and (almost) all
>> > devices.  It even makes sense to reserve a transport feature bit for it
>> > instead of a device feature bit.
>>
>> Perhaps, but we have to fix the bugs first!
>
> Yes. :)  Isn't that what mst's patch does?
>
>> As I said, my torture patch broke qemu immediately.  Since noone has
>> leapt onto fixing that, I'll take a look now...
>
> I can look at virtio-scsi.

Actually, you can't, see my reply to Anthony...

Message-ID: <87lifm1y1n@rustcorp.com.au>

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-10-04 Thread Avi Kivity

On 10/04/2012 12:56 PM, Raghavendra K T wrote:
> On 10/03/2012 10:55 PM, Avi Kivity wrote:
>> On 10/03/2012 04:29 PM, Raghavendra K T wrote:
>>> * Avi Kivity  [2012-09-27 14:03:59]:
>>>
 On 09/27/2012 01:23 PM, Raghavendra K T wrote:
>>
>>> [...]
> 2) looking at the result (comparing A & C) , I do feel we have
> significant in iterating over vcpus (when compared to even vmexit)
> so We still would need undercommit fix sugested by PeterZ
> (improving by
> 140%). ?

 Looking only at the current runqueue?  My worry is that it misses a lot
 of cases.  Maybe try the current runqueue first and then others.

>>>
>>> Okay. Do you mean we can have something like
>>>
>>> +   if (rq->nr_running == 1 && p_rq->nr_running == 1) {
>>> +   yielded = -ESRCH;
>>> +   goto out_irq;
>>> +   }
>>>
>>> in the Peter's patch ?
>>>
>>> ( I thought lot about && or || . Both seem to have their own cons ).
>>> But that should be only when we have short term imbalance, as PeterZ
>>> told.
>>
>> I'm missing the context.  What is p_rq?
> 
> p_rq is the run queue of target vcpu.
> What I was trying below was to address Rik concern. Suppose
> rq of source vcpu has one task, but target probably has two task,
> with a eligible vcpu waiting to be scheduled.
> 
>>
>> What I mean was:
>>
>>if can_yield_to_process_in_current_rq
>>   do that
>>else if can_yield_to_process_in_other_rq
>>   do that
>>else
>>   return -ESRCH
> 
> I think you are saying we have to check the run queue of the
> source vcpu, if we have a vcpu belonging to same VM and try yield to
> that? ignoring whatever the target vcpu we received for yield_to.
> 
> Or is it that kvm_vcpu_yield_to should now check the vcpus of same vm
> belonging to same run queue first. If we don't succeed, go again for
> a vcpu in different runqueue.

Right.  Prioritize vcpus that are cheap to yield to.  But may return bad
results if all vcpus on the current runqueue are spinners, so probably
not a good idea.

> Does it add more overhead especially in <= 1x scenario?

The current runqueue should have just our vcpu in that case, so low
overhead.  But it's a bad idea due to the above scenario.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Avi Kivity

On 10/04/2012 12:49 PM, Raghavendra K T wrote:
> On 10/03/2012 10:35 PM, Avi Kivity wrote:
>> On 10/03/2012 02:22 PM, Raghavendra K T wrote:
 So I think it's worth trying again with ple_window of 2-4.

>>>
>>> Hi Avi,
>>>
>>> I ran different benchmarks increasing ple_window, and results does not
>>> seem to be encouraging for increasing ple_window.
>>
>> Thanks for testing! Comments below.
>>
>>> Results:
>>> 16 core PLE machine with 16 vcpu guest.
>>>
>>> base kernel = 3.6-rc5 + ple handler optimization patch
>>> base_pleopt_8k = base kernel + ple window = 8k
>>> base_pleopt_16k = base kernel + ple window = 16k
>>> base_pleopt_32k = base kernel + ple window = 32k
>>>
>>>
>>> Percentage improvements of benchmarks w.r.t base_pleopt with
>>> ple_window = 4096
>>>
>>> base_pleopt_8kbase_pleopt_16kbase_pleopt_32k
>>> -   
>>>
>>> kernbench_1x-5.54915-15.94529-44.31562
>>> kernbench_2x-7.89399-17.75039-37.73498
>>
>> So, 44% degradation even with no overcommit?  That's surprising.
> 
> Yes. Kernbench was run with #threads = #vcpu * 2 as usual. Is it
> spending 8 times the original ple_window cycles for 16 vcpus
> significant?

A PLE exit when not overcommitted cannot do any good, it is better to
spin in the guest rather that look for candidates on the host.  In fact
when we benchmark we often disable PLE completely.

> 
>>
>>> I also got perf top output to analyse the difference. Difference comes
>>> because of flushtlb (and also spinlock).
>>
>> That's in the guest, yes?
> 
> Yes. Perf is in guest.
> 
>>
>>>
>>> Ebizzy run for 4k ple_window
>>> -  87.20%  [kernel]  [k] arch_local_irq_restore
>>> - arch_local_irq_restore
>>>- 100.00% _raw_spin_unlock_irqrestore
>>>   + 52.89% release_pages
>>>   + 47.10% pagevec_lru_move_fn
>>> -   5.71%  [kernel]  [k] arch_local_irq_restore
>>> - arch_local_irq_restore
>>>+ 86.03% default_send_IPI_mask_allbutself_phys
>>>+ 13.96% default_send_IPI_mask_sequence_phys
>>> -   3.10%  [kernel]  [k] smp_call_function_many
>>>   smp_call_function_many
>>>
>>>
>>> Ebizzy run for 32k ple_window
>>>
>>> -  91.40%  [kernel]  [k] arch_local_irq_restore
>>> - arch_local_irq_restore
>>>- 100.00% _raw_spin_unlock_irqrestore
>>>   + 53.13% release_pages
>>>   + 46.86% pagevec_lru_move_fn
>>> -   4.38%  [kernel]  [k] smp_call_function_many
>>>   smp_call_function_many
>>> -   2.51%  [kernel]  [k] arch_local_irq_restore
>>> - arch_local_irq_restore
>>>+ 90.76% default_send_IPI_mask_allbutself_phys
>>>+ 9.24% default_send_IPI_mask_sequence_phys
>>>
>>
>> Both the 4k and the 32k results are crazy.  Why is
>> arch_local_irq_restore() so prominent?  Do you have a very high
>> interrupt rate in the guest?
> 
> How to measure if I have high interrupt rate in guest?
> From /proc/interrupt numbers I am not able to judge :(

'vmstat 1'

> 
> I went back and got the results on a 32 core machine with 32 vcpu guest.
> Strangely, I got result supporting the claim that increasing ple_window
> helps for non-overcommitted scenario.
> 
> 32 core 32 vcpu guest 1x scenarios.
> 
> ple_gap = 0
> kernbench: Elapsed Time 38.61
> ebizzy: 7463 records/s
> 
> ple_window = 4k
> kernbench: Elapsed Time 43.5067
> ebizzy:2528 records/s
> 
> ple_window = 32k
> kernebench : Elapsed Time 39.4133
> ebizzy: 7196 records/s

So maybe something was wrong with the first measurement.

> 
> 
> perf top for ebizzy for above:
> ple_gap = 0
> -  84.74%  [kernel]  [k] arch_local_irq_restore
>- arch_local_irq_restore
>   - 100.00% _raw_spin_unlock_irqrestore
>  + 50.96% release_pages
>  + 49.02% pagevec_lru_move_fn
> -   6.57%  [kernel]  [k] arch_local_irq_restore
>- arch_local_irq_restore
>   + 92.54% default_send_IPI_mask_allbutself_phys
>   + 7.46% default_send_IPI_mask_sequence_phys
> -   1.54%  [kernel]  [k] smp_call_function_many
>  smp_call_function_many

Again the numbers are ridiculously high for arch_local_irq_restore.
Maybe there's a bad perf/kvm interaction when we're injecting an
interrupt, I can't believe we're spending 84% of the time running the
popf instruction.

> 
> ple_window = 32k
> -  84.47%  [kernel]  [k] arch_local_irq_restore
>+ arch_local_irq_restore
> -   6.46%  [kernel]  [k] arch_local_irq_restore
>- arch_local_irq_restore
>   + 93.51% default_send_IPI_mask_allbutself_phys
>   + 6.49% default_send_IPI_mask_sequence_phys
> -   1.80%  [kernel]  [k] smp_call_function_many
>- smp_call_function_many
>   + 99.98% native_flush_tlb_others
> 
> 
> ple_window = 4k
> -  91.35%  [kernel]  [k] arch_local_irq_restore
>- arch_local_irq_restore
>   - 100.00% _raw_spin_unlock_irqrestore
>  + 53.19% release_pages
>  + 46.81% pagevec_lru_move_fn
> -   3.90%  [kernel]  [k] smp_call_function_many
>  smp_call_fu

Re: qemu-kvm: remove "boot=on|off" drive parameter compatibility

2012-10-04 Thread Jan Kiszka

On 2012-10-04 14:10, Lucas Meneghel Rodrigues wrote:
> On 10/04/2012 07:48 AM, Jan Kiszka wrote:
>> On 2012-10-03 15:19, Paolo Bonzini wrote:
>>> Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:
 Yep, I did send patches with the testdev device present on qemu-kvm.git
 to qemu.git a while ago, but there were many comments on the review, I
 ended up not implementing everything that was asked and the patches were
 archived.

 If nobody wants to step up to port it, I'll re-read the original thread
 and will spin up new patches (and try to go through the end with it).
 Executing the KVM unittests is something that we can't afford to lose,
 so I'd say it's important on this last mile effort to get rid of qemu-kvm.
>>>
>>> Absolutely, IIRC the problem was that testdev did a little bit of
>>> everything... let's see what's the functionality of testdev:
>>>
>>> - write (port 0xf1), can be replaced in autotest with:
>>> -device isa-debugcon,iobase=0xf1,chardev=...
>>>
>>> - exit code (port 0xf4), see this series:
>>> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html
>>>
>>> - ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
>>> is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:
>>>
>>>  u32 rs = ((inb_cmos(0x34) << 16) | (inb_cmos(0x35) << 24));
>>>  if (rs)
>>>  rs += 16 * 1024 * 1024;
>>>  else
>>>  rs = (((inb_cmos(0x30) << 10) | (inb_cmos(0x31) << 18))
>>>+ 1 * 1024 * 1024);
>>>
>>> The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.
>>
>> IIRC, one of the biggest problem with testdev was its hack to inject
>> interrupts.
> 
> Jan, I assume this commit helps to fix this, right?
> 
> commit b334ec567f1de9a60349991e7b75083d569ddb0a
> Author: Jan Kiszka 
> Date:   Fri Mar 2 10:30:47 2012 +0100
> 
>  qemu-kvm: Use upstream kvm-i8259
> 
>  Drop the qemu-kvm version in favor of the equivalent upstream
>  implementation. This allows to move the i8259 back into the hwlib.
> 
>  Note that this also drops the testdev hack and restores proper
>  isa_get_irq. If testdev scripts exist that inject > IRQ15, they need
>  fixing. Testing for these interrupts on the PIIX3 makes no practical
>  sense anyway as those lines are unused.
> 
>  Signed-off-by: Jan Kiszka 
>  Signed-off-by: Avi Kivity 

Yes, this improved it a lot as we no longer depend on additional
changes. I'm not sure if there was resistance beyond that.

When cleaning up the code: register_ioport_read must be replaced with
the memory API.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kvm-kmod 3.6 on linux 3.2.0

2012-10-04 Thread Peter Lieven

Hi,

kvm-kmod 3.6 fails to compile against a 3.2.0 kernel with the following error:

/usr/src/kvm-kmod-3.6/x86/x86.c: In function ‘get_msr_mce’:
/usr/src/kvm-kmod-3.6/x86/x86.c:1908:27: error: ‘kvm’ undeclared (first use in 
this function)
/usr/src/kvm-kmod-3.6/x86/x86.c:1908:27: note: each undeclared identifier is 
reported only once for each function it appears in

Any ideas?

Peter

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm: remove "boot=on|off" drive parameter compatibility

2012-10-04 Thread Lucas Meneghel Rodrigues


On 10/04/2012 07:48 AM, Jan Kiszka wrote:

On 2012-10-03 15:19, Paolo Bonzini wrote:

Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:

Yep, I did send patches with the testdev device present on qemu-kvm.git
to qemu.git a while ago, but there were many comments on the review, I
ended up not implementing everything that was asked and the patches were
archived.

If nobody wants to step up to port it, I'll re-read the original thread
and will spin up new patches (and try to go through the end with it).
Executing the KVM unittests is something that we can't afford to lose,
so I'd say it's important on this last mile effort to get rid of qemu-kvm.


Absolutely, IIRC the problem was that testdev did a little bit of
everything... let's see what's the functionality of testdev:

- write (port 0xf1), can be replaced in autotest with:
-device isa-debugcon,iobase=0xf1,chardev=...

- exit code (port 0xf4), see this series:
http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

- ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:

 u32 rs = ((inb_cmos(0x34) << 16) | (inb_cmos(0x35) << 24));
 if (rs)
 rs += 16 * 1024 * 1024;
 else
 rs = (((inb_cmos(0x30) << 10) | (inb_cmos(0x31) << 18))
   + 1 * 1024 * 1024);

The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.


IIRC, one of the biggest problem with testdev was its hack to inject
interrupts.


Jan, I assume this commit helps to fix this, right?

commit b334ec567f1de9a60349991e7b75083d569ddb0a
Author: Jan Kiszka 
Date:   Fri Mar 2 10:30:47 2012 +0100

qemu-kvm: Use upstream kvm-i8259

Drop the qemu-kvm version in favor of the equivalent upstream
implementation. This allows to move the i8259 back into the hwlib.

Note that this also drops the testdev hack and restores proper
isa_get_irq. If testdev scripts exist that inject > IRQ15, they need
fixing. Testing for these interrupts on the PIIX3 makes no practical
sense anyway as those lines are unused.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Alexander Graf


On 04.10.2012, at 13:06, Bhushan Bharat-R65777 wrote:

> 
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Monday, September 24, 2012 9:50 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
>> 
>> 
>> On 21.08.2012, at 15:52, Bharat Bhushan wrote:
>> 
>>> This patch adds the debug stub support on booke/bookehv.
>>> Now QEMU debug stub can use hw breakpoint, watchpoint and
>>> software breakpoint to debug guest.
>>> 
>>> Signed-off-by: Bharat Bhushan 
>>> ---
>>> arch/powerpc/include/asm/kvm.h|   29 ++-
>>> arch/powerpc/include/asm/kvm_host.h   |5 +
>>> arch/powerpc/kernel/asm-offsets.c |   26 ++
>>> arch/powerpc/kvm/booke.c  |  144 
>>> +
>>> arch/powerpc/kvm/booke_interrupts.S   |  110 +
>>> arch/powerpc/kvm/bookehv_interrupts.S |  141 
>>> +++-
>>> arch/powerpc/kvm/e500mc.c |3 +-
>>> 7 files changed, 435 insertions(+), 23 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
>>> index 61b197e..53479ea 100644
>>> --- a/arch/powerpc/include/asm/kvm.h
>>> +++ b/arch/powerpc/include/asm/kvm.h
>>> @@ -25,6 +25,7 @@
>>> /* Select powerpc specific features in  */
>>> #define __KVM_HAVE_SPAPR_TCE
>>> #define __KVM_HAVE_PPC_SMT
>>> +#define __KVM_HAVE_GUEST_DEBUG
>>> 
>>> struct kvm_regs {
>>> __u64 pc;
>>> @@ -264,7 +265,31 @@ struct kvm_fpu {
>>> __u64 fpr[32];
>>> };
>>> 
>>> +
>>> +/*
>>> + * Defines for h/w breakpoint, watchpoint (read, write or both) and
>>> + * software breakpoint.
>>> + * These are used as "type" in KVM_SET_GUEST_DEBUG ioctl and "status"
>>> + * for KVM_DEBUG_EXIT.
>>> + */
>>> +#define KVMPPC_DEBUG_NONE  0x0
>>> +#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
>>> +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
>>> +#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
>>> struct kvm_debug_exit_arch {
>>> +   __u64 pc;
>>> +   /*
>>> +* exception -> returns the exception number. If the KVM_DEBUG_EXIT
>>> +* exit is not handled (say not h/w breakpoint or software breakpoint
>>> +* set for this address) by qemu then it is supposed to inject this
>>> +* exception to guest.
>>> +*/
>>> +   __u32 exception;
>>> +   /*
>>> +* exiting to userspace because of h/w breakpoint, watchpoint
>>> +* (read, write or both) and software breakpoint.
>>> +*/
>>> +   __u32 status;
>>> };
>>> 
>>> /* for KVM_SET_GUEST_DEBUG */
>>> @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
>>>  * Type denotes h/w breakpoint, read watchpoint, write
>>>  * watchpoint or watchpoint (both read and write).
>>>  */
>>> -#define KVMPPC_DEBUG_NOTYPE0x0
>>> -#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
>>> -#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
>>> -#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
>>> __u32 type;
>>> __u32 pad1;
>>> __u64 pad2;
>>> diff --git a/arch/powerpc/include/asm/kvm_host.h
>> b/arch/powerpc/include/asm/kvm_host.h
>>> index c7219c1..3ba465a 100644
>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>> @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
>>> u32 mmucfg;
>>> u32 epr;
>>> u32 crit_save;
>>> +   /* guest debug registers*/
>>> struct kvmppc_booke_debug_reg dbg_reg;
>>> +   /* shadow debug registers */
>>> +   struct kvmppc_booke_debug_reg shadow_dbg_reg;
>>> +   /* host debug registers*/
>>> +   struct kvmppc_booke_debug_reg host_dbg_reg;
>>> #endif
>>> gpa_t paddr_accessed;
>>> gva_t vaddr_accessed;
>>> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-
>> offsets.c
>>> index 555448e..6987821 100644
>>> --- a/arch/powerpc/kernel/asm-offsets.c
>>> +++ b/arch/powerpc/kernel/asm-offsets.c
>>> @@ -564,6 +564,32 @@ int main(void)
>>> DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
>>> DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
>>> DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
>>> +   DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
>>> +   DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
>>> +   DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
>>> +   DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
>>> + dbcr0));
>>> +   DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
>>> + dbcr1));
>>> +   DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
>>> + dbcr2));
>>> +#ifdef CONFIG_KVM_E500MC
>>> +   DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvm

RE: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Bhushan Bharat-R65777



> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, September 24, 2012 9:50 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
> 
> 
> On 21.08.2012, at 15:52, Bharat Bhushan wrote:
> 
> > This patch adds the debug stub support on booke/bookehv.
> > Now QEMU debug stub can use hw breakpoint, watchpoint and
> > software breakpoint to debug guest.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > arch/powerpc/include/asm/kvm.h|   29 ++-
> > arch/powerpc/include/asm/kvm_host.h   |5 +
> > arch/powerpc/kernel/asm-offsets.c |   26 ++
> > arch/powerpc/kvm/booke.c  |  144 
> > +
> > arch/powerpc/kvm/booke_interrupts.S   |  110 +
> > arch/powerpc/kvm/bookehv_interrupts.S |  141 
> > +++-
> > arch/powerpc/kvm/e500mc.c |3 +-
> > 7 files changed, 435 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
> > index 61b197e..53479ea 100644
> > --- a/arch/powerpc/include/asm/kvm.h
> > +++ b/arch/powerpc/include/asm/kvm.h
> > @@ -25,6 +25,7 @@
> > /* Select powerpc specific features in  */
> > #define __KVM_HAVE_SPAPR_TCE
> > #define __KVM_HAVE_PPC_SMT
> > +#define __KVM_HAVE_GUEST_DEBUG
> >
> > struct kvm_regs {
> > __u64 pc;
> > @@ -264,7 +265,31 @@ struct kvm_fpu {
> > __u64 fpr[32];
> > };
> >
> > +
> > +/*
> > + * Defines for h/w breakpoint, watchpoint (read, write or both) and
> > + * software breakpoint.
> > + * These are used as "type" in KVM_SET_GUEST_DEBUG ioctl and "status"
> > + * for KVM_DEBUG_EXIT.
> > + */
> > +#define KVMPPC_DEBUG_NONE  0x0
> > +#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
> > +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
> > +#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
> > struct kvm_debug_exit_arch {
> > +   __u64 pc;
> > +   /*
> > +* exception -> returns the exception number. If the KVM_DEBUG_EXIT
> > +* exit is not handled (say not h/w breakpoint or software breakpoint
> > +* set for this address) by qemu then it is supposed to inject this
> > +* exception to guest.
> > +*/
> > +   __u32 exception;
> > +   /*
> > +* exiting to userspace because of h/w breakpoint, watchpoint
> > +* (read, write or both) and software breakpoint.
> > +*/
> > +   __u32 status;
> > };
> >
> > /* for KVM_SET_GUEST_DEBUG */
> > @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
> >  * Type denotes h/w breakpoint, read watchpoint, write
> >  * watchpoint or watchpoint (both read and write).
> >  */
> > -#define KVMPPC_DEBUG_NOTYPE0x0
> > -#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
> > -#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
> > -#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
> > __u32 type;
> > __u32 pad1;
> > __u64 pad2;
> > diff --git a/arch/powerpc/include/asm/kvm_host.h
> b/arch/powerpc/include/asm/kvm_host.h
> > index c7219c1..3ba465a 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
> > u32 mmucfg;
> > u32 epr;
> > u32 crit_save;
> > +   /* guest debug registers*/
> > struct kvmppc_booke_debug_reg dbg_reg;
> > +   /* shadow debug registers */
> > +   struct kvmppc_booke_debug_reg shadow_dbg_reg;
> > +   /* host debug registers*/
> > +   struct kvmppc_booke_debug_reg host_dbg_reg;
> > #endif
> > gpa_t paddr_accessed;
> > gva_t vaddr_accessed;
> > diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-
> offsets.c
> > index 555448e..6987821 100644
> > --- a/arch/powerpc/kernel/asm-offsets.c
> > +++ b/arch/powerpc/kernel/asm-offsets.c
> > @@ -564,6 +564,32 @@ int main(void)
> > DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
> > DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
> > DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
> > +   DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
> > +   DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
> > +   DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
> > +   DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
> > + dbcr0));
> > +   DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
> > + dbcr1));
> > +   DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
> > + dbcr2));
> > +#ifdef CONFIG_KVM_E500MC
> > +   DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvmppc_booke_debug_reg,
> > + dbcr4));
> > +

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-10-04 Thread Raghavendra K T


On 10/03/2012 10:55 PM, Avi Kivity wrote:

On 10/03/2012 04:29 PM, Raghavendra K T wrote:

* Avi Kivity  [2012-09-27 14:03:59]:


On 09/27/2012 01:23 PM, Raghavendra K T wrote:



[...]

2) looking at the result (comparing A & C) , I do feel we have
significant in iterating over vcpus (when compared to even vmexit)
so We still would need undercommit fix sugested by PeterZ (improving by
140%). ?


Looking only at the current runqueue?  My worry is that it misses a lot
of cases.  Maybe try the current runqueue first and then others.



Okay. Do you mean we can have something like

+   if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+   yielded = -ESRCH;
+   goto out_irq;
+   }

in the Peter's patch ?

( I thought lot about && or || . Both seem to have their own cons ).
But that should be only when we have short term imbalance, as PeterZ
told.


I'm missing the context.  What is p_rq?


p_rq is the run queue of target vcpu.
What I was trying below was to address Rik concern. Suppose
rq of source vcpu has one task, but target probably has two task,
with a eligible vcpu waiting to be scheduled.



What I mean was:

   if can_yield_to_process_in_current_rq
  do that
   else if can_yield_to_process_in_other_rq
  do that
   else
  return -ESRCH


I think you are saying we have to check the run queue of the
source vcpu, if we have a vcpu belonging to same VM and try yield to 
that? ignoring whatever the target vcpu we received for yield_to.


Or is it that kvm_vcpu_yield_to should now check the vcpus of same vm
belonging to same run queue first. If we don't succeed, go again for
a vcpu in different runqueue.
Does it add more overhead especially in <= 1x scenario?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Raghavendra K T


On 10/03/2012 10:35 PM, Avi Kivity wrote:

On 10/03/2012 02:22 PM, Raghavendra K T wrote:

So I think it's worth trying again with ple_window of 2-4.



Hi Avi,

I ran different benchmarks increasing ple_window, and results does not
seem to be encouraging for increasing ple_window.


Thanks for testing! Comments below.


Results:
16 core PLE machine with 16 vcpu guest.

base kernel = 3.6-rc5 + ple handler optimization patch
base_pleopt_8k = base kernel + ple window = 8k
base_pleopt_16k = base kernel + ple window = 16k
base_pleopt_32k = base kernel + ple window = 32k


Percentage improvements of benchmarks w.r.t base_pleopt with ple_window = 4096

base_pleopt_8k  base_pleopt_16k base_pleopt_32k
-   

kernbench_1x-5.54915-15.94529   -44.31562
kernbench_2x-7.89399-17.75039   -37.73498


So, 44% degradation even with no overcommit?  That's surprising.


Yes. Kernbench was run with #threads = #vcpu * 2 as usual. Is it
spending 8 times the original ple_window cycles for 16 vcpus
significant?




I also got perf top output to analyse the difference. Difference comes
because of flushtlb (and also spinlock).


That's in the guest, yes?


Yes. Perf is in guest.





Ebizzy run for 4k ple_window
-  87.20%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   - 100.00% _raw_spin_unlock_irqrestore
  + 52.89% release_pages
  + 47.10% pagevec_lru_move_fn
-   5.71%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   + 86.03% default_send_IPI_mask_allbutself_phys
   + 13.96% default_send_IPI_mask_sequence_phys
-   3.10%  [kernel]  [k] smp_call_function_many
  smp_call_function_many


Ebizzy run for 32k ple_window

-  91.40%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   - 100.00% _raw_spin_unlock_irqrestore
  + 53.13% release_pages
  + 46.86% pagevec_lru_move_fn
-   4.38%  [kernel]  [k] smp_call_function_many
  smp_call_function_many
-   2.51%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   + 90.76% default_send_IPI_mask_allbutself_phys
   + 9.24% default_send_IPI_mask_sequence_phys



Both the 4k and the 32k results are crazy.  Why is
arch_local_irq_restore() so prominent?  Do you have a very high
interrupt rate in the guest?


How to measure if I have high interrupt rate in guest?
From /proc/interrupt numbers I am not able to judge :(

I went back and got the results on a 32 core machine with 32 vcpu guest.
Strangely, I got result supporting the claim that increasing ple_window 
helps for non-overcommitted scenario.


32 core 32 vcpu guest 1x scenarios.

ple_gap = 0
kernbench: Elapsed Time 38.61
ebizzy: 7463 records/s

ple_window = 4k
kernbench: Elapsed Time 43.5067
ebizzy:2528 records/s

ple_window = 32k
kernebench : Elapsed Time 39.4133
ebizzy: 7196 records/s


perf top for ebizzy for above:
ple_gap = 0
-  84.74%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  - 100.00% _raw_spin_unlock_irqrestore
 + 50.96% release_pages
 + 49.02% pagevec_lru_move_fn
-   6.57%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  + 92.54% default_send_IPI_mask_allbutself_phys
  + 7.46% default_send_IPI_mask_sequence_phys
-   1.54%  [kernel]  [k] smp_call_function_many
 smp_call_function_many

ple_window = 32k
-  84.47%  [kernel]  [k] arch_local_irq_restore
   + arch_local_irq_restore
-   6.46%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  + 93.51% default_send_IPI_mask_allbutself_phys
  + 6.49% default_send_IPI_mask_sequence_phys
-   1.80%  [kernel]  [k] smp_call_function_many
   - smp_call_function_many
  + 99.98% native_flush_tlb_others


ple_window = 4k
-  91.35%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  - 100.00% _raw_spin_unlock_irqrestore
 + 53.19% release_pages
 + 46.81% pagevec_lru_move_fn
-   3.90%  [kernel]  [k] smp_call_function_many
 smp_call_function_many
-   2.94%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  + 93.12% default_send_IPI_mask_allbutself_phys
  + 6.88% default_send_IPI_mask_sequence_phys

Let me know if I can try something here..
/me confused :(

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm: remove "boot=on|off" drive parameter compatibility

2012-10-04 Thread Jan Kiszka

On 2012-10-03 15:19, Paolo Bonzini wrote:
> Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:
>> Yep, I did send patches with the testdev device present on qemu-kvm.git
>> to qemu.git a while ago, but there were many comments on the review, I
>> ended up not implementing everything that was asked and the patches were
>> archived.
>>
>> If nobody wants to step up to port it, I'll re-read the original thread
>> and will spin up new patches (and try to go through the end with it).
>> Executing the KVM unittests is something that we can't afford to lose,
>> so I'd say it's important on this last mile effort to get rid of qemu-kvm.
> 
> Absolutely, IIRC the problem was that testdev did a little bit of
> everything... let's see what's the functionality of testdev:
> 
> - write (port 0xf1), can be replaced in autotest with:
> -device isa-debugcon,iobase=0xf1,chardev=...
> 
> - exit code (port 0xf4), see this series:
> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html
> 
> - ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
> is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:
> 
> u32 rs = ((inb_cmos(0x34) << 16) | (inb_cmos(0x35) << 24));
> if (rs)
> rs += 16 * 1024 * 1024;
> else
> rs = (((inb_cmos(0x30) << 10) | (inb_cmos(0x31) << 18))
>   + 1 * 1024 * 1024);
> 
> The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.

IIRC, one of the biggest problem with testdev was its hack to inject
interrupts.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[virt][PATCH 4/4] virt: Adds multihost machine type test variants.

2012-10-04 Thread Jiří Župka

This patch groups multihost test into one group to which is
added machine types variant of test.

Signed-off-by: Jiří Župka 
---
 kvm/cfg/multi-host-tests.cfg.sample |6 +-
 shared/cfg/subtests.cfg.sample  |  164 ++-
 2 files changed, 107 insertions(+), 63 deletions(-)

diff --git a/kvm/cfg/multi-host-tests.cfg.sample 
b/kvm/cfg/multi-host-tests.cfg.sample
index bc0ce03..d47e756 100644
--- a/kvm/cfg/multi-host-tests.cfg.sample
+++ b/kvm/cfg/multi-host-tests.cfg.sample
@@ -21,7 +21,8 @@ variants:
 only no_pci_assignable
 only no_9p_export
 only smallpages
-only Fedora.15.64
+only pc
+only Fedora.17.64
 only migrate_multi_host
 
 # Runs qemu, f16 64 bit guest OS, install, boot, shutdown
@@ -37,7 +38,8 @@ variants:
 only no_pci_assignable
 only no_9p_export
 only smallpages
-only Fedora.15.64
+only pc
+only Fedora.17.64
 only cpuflags_multi_host
 
 only qemu_migrate_multi_host
diff --git a/shared/cfg/subtests.cfg.sample b/shared/cfg/subtests.cfg.sample
index ab62e12..957bb9c 100644
--- a/shared/cfg/subtests.cfg.sample
+++ b/shared/cfg/subtests.cfg.sample
@@ -960,52 +960,111 @@ variants:
 - monotonic_time:
 test_control_file = monotonic_time.control
 
-- migrate_multi_host: install setup image_copy unattended_install.cdrom
-type = migration_multi_host
-vms = "vm1"
-start_vm = no
-migration_test_command = help
-migration_bg_command = "cd /tmp; nohup tcpdump -q -i any -t ip host 
localhost"
-migration_bg_check_command = pgrep tcpdump
-migration_bg_kill_command = pkill tcpdump
-kill_vm_on_error = yes
-iterations = 2
-used_mem = 1024
-mig_timeout = 4800
-disk_prepare_timeout = 360
-comm_port = 13234
-regain_ip_cmd = killall dhclient; sleep 10; dhclient;
+- @multi_host:
 variants:
-#Migration protocol.
--tcp:
+- migrate_multi_host: install setup image_copy 
unattended_install.cdrom
+type = migration_multi_host
+vms = "vm1"
+start_vm = no
+migration_test_command = help
+migration_bg_command = "cd /tmp; nohup tcpdump -q -i any -t ip 
host localhost"
+migration_bg_check_command = pgrep tcpdump
+migration_bg_kill_command = pkill tcpdump
+kill_vm_on_error = yes
+iterations = 2
+used_mem = 1024
+mig_timeout = 4800
+disk_prepare_timeout = 360
+comm_port = 13234
+regain_ip_cmd = killall dhclient; sleep 10; dhclient;
 variants:
-- @default:
-type = migration_multi_host
-- measure_migration_speed:
-only Linux
-mig_speed = 1G
-type = migration_multi_host_with_speed_measurement
--fd:
-type = migration_multi_host_fd
-
-- migration_multi_host_with_file_transfer: install setup image_copy 
unattended_install.cdrom
-type = migration_multi_host_with_file_transfer
-vms = "vm1"
-start_vm = no
-kill_vm_on_error = yes
-used_mem = 1024
-mig_timeout = 4800
-disk_prepare_timeout = 360
-comm_port = 13234
-#path where file is stored on guest.
-guest_path = "/tmp/file"
-#size of generated file.
-file_size = 500
-transfer_timeout = 240
-#Transfer speed in Mb
-transfer_speed = 100
-#Count of migration during file transfer.
-migrate_count = 3
+#Migration protocol.
+-tcp:
+variants:
+- @default:
+type = migration_multi_host
+- measure_migration_speed:
+only Linux
+mig_speed = 1G
+type = 
migration_multi_host_with_speed_measurement
+-fd:
+type = migration_multi_host_fd
+
+- migration_multi_host_with_file_transfer: install setup 
image_copy unattended_install.cdrom
+type = migration_multi_host_with_file_transfer
+vms = "vm1"
+start_vm = no
+kill_vm_on_error = yes
+used_mem = 1024
+mig_timeout = 4800
+disk_prepare_timeout = 360
+comm_port = 13234
+#path where file is stored on guest.
+guest_path = "/tmp/file"
+#size of generated file.
+file_size = 500
+trans

[virt][PATCH 3/4] virt: Adds kvm,libvirt check of machine model

2012-10-04 Thread Jiří Župka

Signed-off-by: Jiří Župka 
---
 virttest/kvm_vm.py |   12 ++--
 virttest/libvirt_vm.py |   13 +++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/virttest/kvm_vm.py b/virttest/kvm_vm.py
index 9877d55..c51cd1b 100644
--- a/virttest/kvm_vm.py
+++ b/virttest/kvm_vm.py
@@ -830,7 +830,6 @@ class VM(virt_vm.BaseVM):
 cmd = ""
 return cmd
 
-
 def add_machine_type(hlp, machine_type):
 if has_option(hlp, "machine") or has_option(hlp, "M"):
 return " -M %s" % machine_type
@@ -916,6 +915,7 @@ class VM(virt_vm.BaseVM):
 self.qemu_binary = qemu_binary
 hlp = commands.getoutput("%s -help" % qemu_binary)
 support_cpu_model = commands.getoutput("%s -cpu ?list" % qemu_binary)
+support_machine_type = commands.getoutput("%s -M ?" % qemu_binary)
 
 device_help = ""
 if has_option(hlp, "device"):
@@ -1165,7 +1165,15 @@ class VM(virt_vm.BaseVM):
 
 machine_type = params.get("machine_type")
 if machine_type:
-qemu_cmd += add_machine_type(hlp, machine_type)
+m_types = []
+for m in support_machine_type.splitlines()[1:]:
+m_types.append(m.split()[0])
+
+if machine_type in m_types:
+qemu_cmd += add_machine_type(hlp, machine_type)
+else:
+raise error.TestNAError("Not supported machine type %s." %
+(machine_type))
 
 for cdrom in params.objects("cdroms"):
 cd_format = params.get("cd_format", "")
diff --git a/virttest/libvirt_vm.py b/virttest/libvirt_vm.py
index 4e03835..c46c139 100644
--- a/virttest/libvirt_vm.py
+++ b/virttest/libvirt_vm.py
@@ -8,7 +8,7 @@ import time, os, logging, fcntl, re, shutil, urlparse, tempfile
 from autotest.client.shared import error
 from autotest.client import utils, os_dep
 from xml.dom import minidom
-import utils_misc, virt_vm, storage, aexpect, remote, virsh
+import utils_misc, virt_vm, storage, aexpect, remote, virsh, libvirt_xml
 
 
 def libvirtd_restart():
@@ -521,6 +521,11 @@ class VM(virt_vm.BaseVM):
 
 help = utils.system_output("%s --help" % virt_install_binary)
 
+# Find all machine type. There isn't filtred any machine type even if
+# domain doesn't support this machine type. Filtering could be added.
+me = libvirt_xml.LibvirtXML().getroot().findall("./guest/arch/machine")
+support_machine_type = map(lambda m: m.text, me)
+
 # Start constructing the qemu command
 virt_install_cmd = ""
 # Set the X11 display parameter if requested
@@ -542,7 +547,11 @@ class VM(virt_vm.BaseVM):
 
 machine_type = params.get("machine_type")
 if machine_type:
-virt_install_cmd += add_machine_type(help, machine_type)
+if machine_type in support_machine_type:
+virt_install_cmd += add_machine_type(help, machine_type)
+else:
+raise error.TestNAError("Not supported machine type %s." %
+(machine_type))
 
 mem = params.get("mem")
 if mem:
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[virt][PATCH 2/4] virt: Cleanup unused import in libvirt_xml.py

2012-10-04 Thread Jiří Župka

Signed-off-by: Jiří Župka 
---
 virttest/libvirt_xml.py |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/virttest/libvirt_xml.py b/virttest/libvirt_xml.py
index bedc496..4376111 100644
--- a/virttest/libvirt_xml.py
+++ b/virttest/libvirt_xml.py
@@ -8,10 +8,8 @@
 xml_utils module documentation for more information on working with
 XMLTreeFile instances.
 """
-
-import logging, os.path
-from autotest.client.shared import error, xml_utils
-from virttest import libvirt_vm, virsh
+from autotest.client.shared import xml_utils
+from virttest import virsh
 
 
 class LibvirtXMLError(Exception):
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-03 20:26, Marcelo Tosatti wrote:
> On Wed, Oct 03, 2012 at 07:24:48PM +0200, Jan Kiszka wrote:
>> On 2012-10-03 19:16, Anthony Liguori wrote:
>>> Jan Kiszka  writes:
>>>
 On 2012-10-03 17:03, Marcelo Tosatti wrote:
> On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:
>> Marcelo Tosatti  writes:
>>
>>> Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.
>>>
>>> From: Jan Kiszka 
>>> 
>>> Upstream is moving towards this mechanism, so start using it in qemu-kvm
>>> already to configure the specific defaults: kvm enabled on, just like
>>> in-kernel irqchips.
>>>
>>> Signed-off-by: Marcelo Tosatti 
>>
>>
>> Reviewed-by: Anthony Liguori 
>>
>> Although it's a little odd to have From: Jan without a SoB...
>
> Agree, Jan can you ACK?

 I wasn't able to join the call yesterday: Is there a removal schedule
 associated with those switches? Also, why pushing things upstream, even
 when only for one release, that have been loudly deprecated for a while
 in qemu-kvm? Some switches are lacking deprecated warnings on the
 console, and -no-kvm is missing completely. I tend to focus on patch 1 &
 5, dropping the rest - based on relevance for production use.
>>>
>>> The distros need to keep these flags to do the switch.
>>
>> Why? Should be documented in commit log.
>>
>>>  I see no point
>>> in deprecating them since they're trivially easy to maintain.
>>
>> Given the level of cr** we already have in the command line, they are
>> kind of noise, yes. But even then, these patches are not consistent as
>> pointed out above.
>>
>> Also, they should not be documented to avoid being spread. That's what
>> we did with other deprecated switches in QEMU.
>>
>> Jan
> 
> Jan,
> 
> You're comments to the patch are:
> 
> - No documentation.

See e.g. how -M is handled in qemu-options.hx.

> - Expiration date.

Anthony said "forever", but I think we should remove all those that
issue deprecation warnings after 1-2 years.

> - Changelog explaining what?? (didnt get that). Perhaps better changelog
>   in general?

I'm still failing to understand who could depend on -no-kvm-irqchip or
-no-kvm-pit. And I don't understand why -no-kvm was not included. Soe
the reasons for include -X should be provided. Also check your patch
subjects again, at least one was wrong.

Jan




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] hw: Add test device for unittests execution

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 10:02, Peter Maydell ha scritto:
> On 4 October 2012 04:49, Lucas Meneghel Rodrigues  wrote:
>> Add a test device which supports the kvmctl ioports,
>> so one can run the KVM unittest suite [1].
>>
>> Usage:
>>
>> qemu -device testdev
>>
>> 1) Removed port 0xf1, since now kvm-unit-tests use
>>serial
>>
>> 2) Removed exit code port 0xf4, since that can be
>>replaced by
>>
>> -device isa-debugexit,iobase=0xf4,access-size=2
>>
>> 3) Removed ram size port 0xd1, since guest memory
>>size can be retrieved from firmware, there's a
>>patch for kvm-unit-tests including an API to
>>retrieve that value.
>>
>> [1] Preliminary versions of this patch were posted
>> to the mailing list about a year ago, I re-read the
>> comments of the thread, and had guidance from
>> Paolo about which ports to remove from the test
>> device.
> 
> General remark: there's no documentation anywhere in
> this patch. I don't necessarily mean user-facing docs,
> but at least a descriptive comment saying what the
> heck this device is and what it does would be helpful.
> 
> 
>>
>> CC: Paolo Bonzini 
>> Signed-off-by: Gerd Hoffmann 
>> Signed-off-by: Avi Kivity 
>> Signed-off-by: Marcelo Tosatti 
>> Signed-off-by: Lucas Meneghel Rodrigues 
>> ---
>>  hw/i386/Makefile.objs |   1 +
>>  hw/testdev.c  | 131 
>> ++
>>  2 files changed, 132 insertions(+)
>>  create mode 100644 hw/testdev.c
>>
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 8c764bb..64d2787 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -11,5 +11,6 @@ obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
>>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
>> xen_pt_msi.o
>>  obj-y += kvm/
>>  obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>> +obj-y += testdev.o
> 
> ...the device is useful even in non-KVM configs, then?
> 
>>  obj-y := $(addprefix ../,$(obj-y))
>> diff --git a/hw/testdev.c b/hw/testdev.c
>> new file mode 100644
>> index 000..44070f2
>> --- /dev/null
>> +++ b/hw/testdev.c
>> @@ -0,0 +1,131 @@
>> +#include 
> 
> This file needs a leading comment with the usual copyright/license/
> description of what the file does.
> 
>> +#include "hw.h"
>> +#include "qdev.h"
>> +#include "isa.h"
>> +
>> +struct testdev {
>> +ISADevice dev;
>> +MemoryRegion iomem;
>> +CharDriverState *chr;
>> +};
>> +
>> +#define TYPE_TESTDEV "testdev"
>> +#define TESTDEV(obj) \
>> + OBJECT_CHECK(struct testdev, (obj), TYPE_TESTDEV)
>> +
>> +static void test_device_irq_line(void *opaque, uint32_t addr, uint32_t data)
>> +{
>> +struct testdev *dev = opaque;
>> +
>> +qemu_set_irq(isa_get_irq(&dev->dev, addr - 0x2000), !!data);
>> +}
>> +
>> +static uint32 test_device_ioport_data;
>> +
>> +static void test_device_ioport_write(void *opaque, uint32_t addr, uint32_t 
>> data)
>> +{
>> +test_device_ioport_data = data;
>> +}
>> +
>> +static uint32_t test_device_ioport_read(void *opaque, uint32_t addr)
>> +{
>> +return test_device_ioport_data;
>> +}
>> +
>> +static void test_device_flush_page(void *opaque, uint32_t addr, uint32_t 
>> data)
>> +{
>> +target_phys_addr_t len = 4096;
>> +void *a = cpu_physical_memory_map(data & ~0xffful, &len, 0);
>> +
>> +mprotect(a, 4096, PROT_NONE);
>> +mprotect(a, 4096, PROT_READ|PROT_WRITE);
>> +cpu_physical_memory_unmap(a, len, 0, 0);
>> +}
>> +
>> +static char *iomem_buf;
>> +
>> +static uint32_t test_iomem_readb(void *opaque, target_phys_addr_t addr)
>> +{
>> +return iomem_buf[addr];
>> +}
>> +
>> +static uint32_t test_iomem_readw(void *opaque, target_phys_addr_t addr)
>> +{
>> +return *(uint16_t*)(iomem_buf + addr);
>> +}
>> +
>> +static uint32_t test_iomem_readl(void *opaque, target_phys_addr_t addr)
>> +{
>> +return *(uint32_t*)(iomem_buf + addr);
>> +}
>> +
>> +static void test_iomem_writeb(void *opaque, target_phys_addr_t addr, 
>> uint32_t val)
>> +{
>> +iomem_buf[addr] = val;
>> +}
>> +
>> +static void test_iomem_writew(void *opaque, target_phys_addr_t addr, 
>> uint32_t val)
>> +{
>> +*(uint16_t*)(iomem_buf + addr) = val;
>> +}
>> +
>> +static void test_iomem_writel(void *opaque, target_phys_addr_t addr, 
>> uint32_t val)
>> +{
>> +*(uint32_t*)(iomem_buf + addr) = val;
>> +}
>> +
>> +static const MemoryRegionOps test_iomem_ops = {
>> +.old_mmio = {
>> +.read = { test_iomem_readb, test_iomem_readw, test_iomem_readl, },
>> +.write = { test_iomem_writeb, test_iomem_writew, test_iomem_writel, 
>> },
>> +},
>> +.endianness = DEVICE_LITTLE_ENDIAN,
>> +};
>> +
>> +static int init_test_device(ISADevice *isa)
>> +{
>> +struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
>> +
>> +register_ioport_read(0xe0, 1, 1, test_device_ioport_read, dev);
>> +register_ioport_write(0xe0, 1, 1, test_device_ioport_write, dev);
>> +register_ioport_read(0xe0, 1, 2, test_device_ioport_read, dev);

Re: [Qemu-devel] [PATCH] hw: Add test device for unittests execution

2012-10-04 Thread Peter Maydell

On 4 October 2012 04:49, Lucas Meneghel Rodrigues  wrote:
> Add a test device which supports the kvmctl ioports,
> so one can run the KVM unittest suite [1].
>
> Usage:
>
> qemu -device testdev
>
> 1) Removed port 0xf1, since now kvm-unit-tests use
>serial
>
> 2) Removed exit code port 0xf4, since that can be
>replaced by
>
> -device isa-debugexit,iobase=0xf4,access-size=2
>
> 3) Removed ram size port 0xd1, since guest memory
>size can be retrieved from firmware, there's a
>patch for kvm-unit-tests including an API to
>retrieve that value.
>
> [1] Preliminary versions of this patch were posted
> to the mailing list about a year ago, I re-read the
> comments of the thread, and had guidance from
> Paolo about which ports to remove from the test
> device.

General remark: there's no documentation anywhere in
this patch. I don't necessarily mean user-facing docs,
but at least a descriptive comment saying what the
heck this device is and what it does would be helpful.


>
> CC: Paolo Bonzini 
> Signed-off-by: Gerd Hoffmann 
> Signed-off-by: Avi Kivity 
> Signed-off-by: Marcelo Tosatti 
> Signed-off-by: Lucas Meneghel Rodrigues 
> ---
>  hw/i386/Makefile.objs |   1 +
>  hw/testdev.c  | 131 
> ++
>  2 files changed, 132 insertions(+)
>  create mode 100644 hw/testdev.c
>
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 8c764bb..64d2787 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -11,5 +11,6 @@ obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
> xen_pt_msi.o
>  obj-y += kvm/
>  obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
> +obj-y += testdev.o

...the device is useful even in non-KVM configs, then?

>  obj-y := $(addprefix ../,$(obj-y))
> diff --git a/hw/testdev.c b/hw/testdev.c
> new file mode 100644
> index 000..44070f2
> --- /dev/null
> +++ b/hw/testdev.c
> @@ -0,0 +1,131 @@
> +#include 

This file needs a leading comment with the usual copyright/license/
description of what the file does.

> +#include "hw.h"
> +#include "qdev.h"
> +#include "isa.h"
> +
> +struct testdev {
> +ISADevice dev;
> +MemoryRegion iomem;
> +CharDriverState *chr;
> +};
> +
> +#define TYPE_TESTDEV "testdev"
> +#define TESTDEV(obj) \
> + OBJECT_CHECK(struct testdev, (obj), TYPE_TESTDEV)
> +
> +static void test_device_irq_line(void *opaque, uint32_t addr, uint32_t data)
> +{
> +struct testdev *dev = opaque;
> +
> +qemu_set_irq(isa_get_irq(&dev->dev, addr - 0x2000), !!data);
> +}
> +
> +static uint32 test_device_ioport_data;
> +
> +static void test_device_ioport_write(void *opaque, uint32_t addr, uint32_t 
> data)
> +{
> +test_device_ioport_data = data;
> +}
> +
> +static uint32_t test_device_ioport_read(void *opaque, uint32_t addr)
> +{
> +return test_device_ioport_data;
> +}
> +
> +static void test_device_flush_page(void *opaque, uint32_t addr, uint32_t 
> data)
> +{
> +target_phys_addr_t len = 4096;
> +void *a = cpu_physical_memory_map(data & ~0xffful, &len, 0);
> +
> +mprotect(a, 4096, PROT_NONE);
> +mprotect(a, 4096, PROT_READ|PROT_WRITE);
> +cpu_physical_memory_unmap(a, len, 0, 0);
> +}
> +
> +static char *iomem_buf;
> +
> +static uint32_t test_iomem_readb(void *opaque, target_phys_addr_t addr)
> +{
> +return iomem_buf[addr];
> +}
> +
> +static uint32_t test_iomem_readw(void *opaque, target_phys_addr_t addr)
> +{
> +return *(uint16_t*)(iomem_buf + addr);
> +}
> +
> +static uint32_t test_iomem_readl(void *opaque, target_phys_addr_t addr)
> +{
> +return *(uint32_t*)(iomem_buf + addr);
> +}
> +
> +static void test_iomem_writeb(void *opaque, target_phys_addr_t addr, 
> uint32_t val)
> +{
> +iomem_buf[addr] = val;
> +}
> +
> +static void test_iomem_writew(void *opaque, target_phys_addr_t addr, 
> uint32_t val)
> +{
> +*(uint16_t*)(iomem_buf + addr) = val;
> +}
> +
> +static void test_iomem_writel(void *opaque, target_phys_addr_t addr, 
> uint32_t val)
> +{
> +*(uint32_t*)(iomem_buf + addr) = val;
> +}
> +
> +static const MemoryRegionOps test_iomem_ops = {
> +.old_mmio = {
> +.read = { test_iomem_readb, test_iomem_readw, test_iomem_readl, },
> +.write = { test_iomem_writeb, test_iomem_writew, test_iomem_writel, 
> },
> +},
> +.endianness = DEVICE_LITTLE_ENDIAN,
> +};
> +
> +static int init_test_device(ISADevice *isa)
> +{
> +struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
> +
> +register_ioport_read(0xe0, 1, 1, test_device_ioport_read, dev);
> +register_ioport_write(0xe0, 1, 1, test_device_ioport_write, dev);
> +register_ioport_read(0xe0, 1, 2, test_device_ioport_read, dev);
> +register_ioport_write(0xe0, 1, 2, test_device_ioport_write, dev);
> +register_ioport_read(0xe0, 1, 4, test_device_ioport_read, dev);
> +register_ioport_write(0xe0, 1, 4, test_device_ioport_write, dev);
> +regi

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Anthony Liguori  writes:
>> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
>> ever going to be merged, so I'm not sure what its status is?  But I'm
>> determined to fix qemu, and hence my torture patch to make sure this
>> doesn't creep in again.
>
> There are even more implementations out there and I'd wager they all
> rely on framing.

Worse, both virtio_blk (for scsi commands) and virtio_scsi explicitly
and inescapably rely on framing.  The spec conflicts clearly with
itself.

Such layering violations are always a mistake, but I can't blame anyone
else for my lack of attention :(

Here's the spec change:
commit 7e74459bb966ccbaad9e4bf361d1178b7f400b79
Author: Rusty Russell 
Date:   Thu Oct 4 17:11:27 2012 +0930

No longer assume framing is independent of messages.  *sniff*

Signed-off-by: Rusty Russell 

--- virtio-spec.txt 2012-10-04 17:13:04.988259234 +0930
+++ virtio-spec.txt.new 2012-10-04 17:12:54.624258969 +0930
@@ -880,19 +880,19 @@
 
   Message Framing
 
-The descriptors used for a buffer should not effect the semantics
-of the message, except for the total length of the buffer. For
-example, a network buffer consists of a 10 byte header followed
-by the network packet. Whether this is presented in the ring
-descriptor chain as (say) a 10 byte buffer and a 1514 byte
-buffer, or a single 1524 byte buffer, or even three buffers,
-should have no effect.
+Unless stated otherwise, it is expected that headers within a
+message are contained within their own descriptors. For example,
+a network buffer consists of a 10 or 12 byte header followed by
+the network packet. An implementation should expect that this
+header will be within the first descriptor, and that the
+remainder of the data will begin on the second descriptor.
 
-In particular, no implementation should use the descriptor
-boundaries to determine the size of any header in a request.[footnote:
-The current qemu device implementations mistakenly insist that
-the first descriptor cover the header in these cases exactly, so
-a cautious driver should arrange it so.
+[footnote:
+It was previously asserted that framing should be independent of
+message contents, yet invariably drivers layed out messages in
+reliable ways and devices assumed it. In addition, the
+specifications for virtio_blk and virtio_scsi require intuiting
+field lengths from frame boundaries.
 ]
 
   Device Improvements

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined

2012-10-04 Thread Bhushan Bharat-R65777



> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, September 24, 2012 9:09 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined
> 
> 
> On 21.08.2012, at 15:51, Bharat Bhushan wrote:
> 
> > This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
> > ioctl support. Follow up patches will use this for setting up hardware
> > breakpoints, watchpoints and software breakpoints.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > arch/powerpc/include/asm/kvm.h |   33 +
> > arch/powerpc/kvm/book3s.c  |6 ++
> > arch/powerpc/kvm/booke.c   |6 ++
> > arch/powerpc/kvm/powerpc.c |6 --
> > 4 files changed, 45 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm.h
> > b/arch/powerpc/include/asm/kvm.h index 3c14202..61b197e 100644
> > --- a/arch/powerpc/include/asm/kvm.h
> > +++ b/arch/powerpc/include/asm/kvm.h
> > @@ -269,8 +269,41 @@ struct kvm_debug_exit_arch {
> >
> > /* for KVM_SET_GUEST_DEBUG */
> > struct kvm_guest_debug_arch {
> > +   struct {
> > +   /* H/W breakpoint/watchpoint address */
> > +   __u64 addr;
> > +   /*
> > +* Type denotes h/w breakpoint, read watchpoint, write
> > +* watchpoint or watchpoint (both read and write).
> > +*/
> > +#define KVMPPC_DEBUG_NOTYPE0x0
> > +#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
> > +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
> > +#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
> > +   __u32 type;
> > +   __u32 pad1;
> 
> Why the padding?
> 
> > +   __u64 pad2;
> > +   } bp[16];
> 
> Why 16?
> 
> > };
> >
> > +/* Debug related defines */
> > +/*
> > + * kvm_guest_debug->control is a 32 bit field. The lower 16 bits are
> > +generic
> > + * and upper 16 bits are architecture specific. Architecture specific
> > +defines
> > + * that ioctl is for setting hardware breakpoint or software breakpoint.
> > + */
> > +#define KVM_GUESTDBG_USE_SW_BP 0x0001
> > +#define KVM_GUESTDBG_USE_HW_BP 0x0002
> > +
> > +/* When setting software breakpoint, Change the software breakpoint
> > + * instruction to special trap instruction and set
> > +KVM_GUESTDBG_USE_SW_BP
> > + * flag in kvm_guest_debug->control. KVM does keep track of software
> > + * breakpoints. So when KVM_GUESTDBG_USE_SW_BP flag is set and
> > +special trap
> > + * instruction is executed by guest then exit to userspace.
> > + * NOTE: A Nice interface can be added to get the special trap instruction.
> > + */
> > +#define KVMPPC_INST_GUEST_GDB  0x7C00021C  /* ehpriv OC=0 
> > */
> 
> This definitely has to be passed to user space (which writes that instruction
> into guest phys memory). Other PPC subarchs will use different instructions.
> Just model it as a read-only ONE_REG.

Ok.

Thanks
-Bharat


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Gleb Natapov

On Wed, Oct 03, 2012 at 04:56:57PM +0200, Avi Kivity wrote:
> On 10/03/2012 04:17 PM, Raghavendra K T wrote:
> > * Avi Kivity  [2012-09-30 13:13:09]:
> > 
> >> On 09/30/2012 01:07 PM, Gleb Natapov wrote:
> >> > On Sun, Sep 30, 2012 at 10:18:17AM +0200, Avi Kivity wrote:
> >> >> On 09/28/2012 08:16 AM, Raghavendra K T wrote:
> >> >> > 
> >> >> >>
> >> >> >> +struct pv_sched_info {
> >> >> >> +   unsigned long   sched_bitmap;
> >> >> > 
> >> >> > Thinking, whether we need something similar to cpumask here?
> >> >> > Only thing is we are representing guest (v)cpumask.
> >> >> > 
> >> >> 
> >> >> DECLARE_BITMAP(sched_bitmap, KVM_MAX_VCPUS)
> >> >> 
> >> > vcpu_id can be greater than KVM_MAX_VCPUS.
> >> 
> >> Use the index into the vcpu table as the bitmap index then.  In fact
> >> it's better because then the lookup to get the vcpu pointer is trivial.
> > 
> > Did you mean, while setting the bitmap,
> > 
> > we should do 
> > for (i = 1..n)
> > if (kvm->vcpus[i] == vcpu) set ith position in bitmap?
> 
> You can store i in the vcpu itself:
> 
>   set_bit(vcpu->index, &kvm->preempted);
> 
This will make the fact that vcpus are stored in an array into API
instead of implementation detail :( There were patches for vcpu
destruction that changed the array to rculist. Well, it will be still
possible to make the array rcu protected and copy it every time vcpu is
deleted/added I guess.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] hw: Add test device for unittests execution

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 05:49, Lucas Meneghel Rodrigues ha scritto:
> Add a test device which supports the kvmctl ioports,
> so one can run the KVM unittest suite [1].
> 
> Usage:
> 
> qemu -device testdev
> 
> 1) Removed port 0xf1, since now kvm-unit-tests use
>serial
> 
> 2) Removed exit code port 0xf4, since that can be
>replaced by
> 
> -device isa-debugexit,iobase=0xf4,access-size=2

Can you include this in your series?

> 3) Removed ram size port 0xd1, since guest memory
>size can be retrieved from firmware, there's a
>patch for kvm-unit-tests including an API to
>retrieve that value.
> 
> [1] Preliminary versions of this patch were posted
> to the mailing list about a year ago, I re-read the
> comments of the thread, and had guidance from
> Paolo about which ports to remove from the test
> device.
> 
> CC: Paolo Bonzini 
> Signed-off-by: Gerd Hoffmann 
> Signed-off-by: Avi Kivity 
> Signed-off-by: Marcelo Tosatti 
> Signed-off-by: Lucas Meneghel Rodrigues 
> ---
>  hw/i386/Makefile.objs |   1 +
>  hw/testdev.c  | 131 
> ++
>  2 files changed, 132 insertions(+)
>  create mode 100644 hw/testdev.c
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 8c764bb..64d2787 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -11,5 +11,6 @@ obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
> xen_pt_msi.o
>  obj-y += kvm/
>  obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
> +obj-y += testdev.o
>  
>  obj-y := $(addprefix ../,$(obj-y))
> diff --git a/hw/testdev.c b/hw/testdev.c
> new file mode 100644
> index 000..44070f2
> --- /dev/null
> +++ b/hw/testdev.c
> @@ -0,0 +1,131 @@
> +#include 
> +#include "hw.h"
> +#include "qdev.h"
> +#include "isa.h"
> +
> +struct testdev {
> +ISADevice dev;
> +MemoryRegion iomem;
> +CharDriverState *chr;

This member is not necessary anymore.

Paolo

> +};
> +
> +#define TYPE_TESTDEV "testdev"
> +#define TESTDEV(obj) \
> + OBJECT_CHECK(struct testdev, (obj), TYPE_TESTDEV)
> +
> +static void test_device_irq_line(void *opaque, uint32_t addr, uint32_t data)
> +{
> +struct testdev *dev = opaque;
> +
> +qemu_set_irq(isa_get_irq(&dev->dev, addr - 0x2000), !!data);
> +}
> +
> +static uint32 test_device_ioport_data;
> +
> +static void test_device_ioport_write(void *opaque, uint32_t addr, uint32_t 
> data)
> +{
> +test_device_ioport_data = data;
> +}
> +
> +static uint32_t test_device_ioport_read(void *opaque, uint32_t addr)
> +{
> +return test_device_ioport_data;
> +}
> +
> +static void test_device_flush_page(void *opaque, uint32_t addr, uint32_t 
> data)
> +{
> +target_phys_addr_t len = 4096;
> +void *a = cpu_physical_memory_map(data & ~0xffful, &len, 0);
> +
> +mprotect(a, 4096, PROT_NONE);
> +mprotect(a, 4096, PROT_READ|PROT_WRITE);
> +cpu_physical_memory_unmap(a, len, 0, 0);
> +}
> +
> +static char *iomem_buf;
> +
> +static uint32_t test_iomem_readb(void *opaque, target_phys_addr_t addr)
> +{
> +return iomem_buf[addr];
> +}
> +
> +static uint32_t test_iomem_readw(void *opaque, target_phys_addr_t addr)
> +{
> +return *(uint16_t*)(iomem_buf + addr);
> +}
> +
> +static uint32_t test_iomem_readl(void *opaque, target_phys_addr_t addr)
> +{
> +return *(uint32_t*)(iomem_buf + addr);
> +}
> +
> +static void test_iomem_writeb(void *opaque, target_phys_addr_t addr, 
> uint32_t val)
> +{
> +iomem_buf[addr] = val;
> +}
> +
> +static void test_iomem_writew(void *opaque, target_phys_addr_t addr, 
> uint32_t val)
> +{
> +*(uint16_t*)(iomem_buf + addr) = val;
> +}
> +
> +static void test_iomem_writel(void *opaque, target_phys_addr_t addr, 
> uint32_t val)
> +{
> +*(uint32_t*)(iomem_buf + addr) = val;
> +}
> +
> +static const MemoryRegionOps test_iomem_ops = {
> +.old_mmio = {
> +.read = { test_iomem_readb, test_iomem_readw, test_iomem_readl, },
> +.write = { test_iomem_writeb, test_iomem_writew, test_iomem_writel, 
> },
> +},
> +.endianness = DEVICE_LITTLE_ENDIAN,
> +};
> +
> +static int init_test_device(ISADevice *isa)
> +{
> +struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
> +
> +register_ioport_read(0xe0, 1, 1, test_device_ioport_read, dev);
> +register_ioport_write(0xe0, 1, 1, test_device_ioport_write, dev);
> +register_ioport_read(0xe0, 1, 2, test_device_ioport_read, dev);
> +register_ioport_write(0xe0, 1, 2, test_device_ioport_write, dev);
> +register_ioport_read(0xe0, 1, 4, test_device_ioport_read, dev);
> +register_ioport_write(0xe0, 1, 4, test_device_ioport_write, dev);
> +register_ioport_write(0xe4, 1, 4, test_device_flush_page, dev);
> +register_ioport_write(0x2000, 24, 1, test_device_irq_line, NULL);
> +iomem_buf = g_malloc0(0x1);
> +memory_region_init_io(&dev->iomem, &test_iomem_ops, dev,
> +  "testdev", 0x1

RE: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined

2012-10-04 Thread Bhushan Bharat-R65777



> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, September 24, 2012 9:09 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined
> 
> 
> On 21.08.2012, at 15:51, Bharat Bhushan wrote:
> 
> > This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
> > ioctl support. Follow up patches will use this for setting up hardware
> > breakpoints, watchpoints and software breakpoints.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > arch/powerpc/include/asm/kvm.h |   33 +
> > arch/powerpc/kvm/book3s.c  |6 ++
> > arch/powerpc/kvm/booke.c   |6 ++
> > arch/powerpc/kvm/powerpc.c |6 --
> > 4 files changed, 45 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm.h
> > b/arch/powerpc/include/asm/kvm.h index 3c14202..61b197e 100644
> > --- a/arch/powerpc/include/asm/kvm.h
> > +++ b/arch/powerpc/include/asm/kvm.h
> > @@ -269,8 +269,41 @@ struct kvm_debug_exit_arch {
> >
> > /* for KVM_SET_GUEST_DEBUG */
> > struct kvm_guest_debug_arch {
> > +   struct {
> > +   /* H/W breakpoint/watchpoint address */
> > +   __u64 addr;
> > +   /*
> > +* Type denotes h/w breakpoint, read watchpoint, write
> > +* watchpoint or watchpoint (both read and write).
> > +*/
> > +#define KVMPPC_DEBUG_NOTYPE0x0
> > +#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
> > +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
> > +#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
> > +   __u32 type;
> > +   __u32 pad1;
> 
> Why the padding?

Not sure why, I will remove this.

> 
> > +   __u64 pad2;
> > +   } bp[16];
> 
> Why 16?

I think for now 6 (4 iac + 2 dac) is sufficient for BOOKE. We kept 16 to have 
some room for future and other platforms.

Thanks
-Bharat
> 
> > };
> >
> > +/* Debug related defines */
> > +/*
> > + * kvm_guest_debug->control is a 32 bit field. The lower 16 bits are
> > +generic
> > + * and upper 16 bits are architecture specific. Architecture specific
> > +defines
> > + * that ioctl is for setting hardware breakpoint or software breakpoint.
> > + */
> > +#define KVM_GUESTDBG_USE_SW_BP 0x0001
> > +#define KVM_GUESTDBG_USE_HW_BP 0x0002
> > +
> > +/* When setting software breakpoint, Change the software breakpoint
> > + * instruction to special trap instruction and set
> > +KVM_GUESTDBG_USE_SW_BP
> > + * flag in kvm_guest_debug->control. KVM does keep track of software
> > + * breakpoints. So when KVM_GUESTDBG_USE_SW_BP flag is set and
> > +special trap
> > + * instruction is executed by guest then exit to userspace.
> > + * NOTE: A Nice interface can be added to get the special trap instruction.
> > + */
> > +#define KVMPPC_INST_GUEST_GDB  0x7C00021C  /* ehpriv OC=0 
> > */
> 
> This definitely has to be passed to user space (which writes that instruction
> into guest phys memory). Other PPC subarchs will use different instructions.
> Just model it as a read-only ONE_REG.
> 
> 
> Alex
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 02:11, Rusty Russell ha scritto:
> > > There's a reason I haven't done this.  I really, really dislike "my
> > > implemention isn't broken" feature bits.  We could have an infinite
> > > number of them, for each bug in each device.
> >
> > However, this bug affects (almost) all implementations and (almost) all
> > devices.  It even makes sense to reserve a transport feature bit for it
> > instead of a device feature bit.
>
> Perhaps, but we have to fix the bugs first!

Yes. :)  Isn't that what mst's patch does?

> As I said, my torture patch broke qemu immediately.  Since noone has
> leapt onto fixing that, I'll take a look now...

I can look at virtio-scsi.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

82 matches

Mail list logo