Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/20/2011 05:31 PM, Anthony Liguori wrote:

Several alpha system chips MCE when accessed with incorrect sizes.
E.g. only 64-bit accesses are allowed.



But is this a characteristic of devices or is this a characteristic of 
the chipset/CPU?


The chipset is modelled by a MemoryRegion too.



At any rate, I'm fairly sure it doesn't belong in the MemoryRegion 
structure.




Since it isn't a global property, where does it belong?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/20/2011 05:46 PM, Anthony Liguori wrote:

On 05/20/2011 09:40 AM, Richard Henderson wrote:

On 05/20/2011 07:31 AM, Anthony Liguori wrote:
But is this a characteristic of devices or is this a characteristic 
of the chipset/CPU?


Chipset.


So if the chipset only allows accesses that are 64-bit, then you'll 
want to have hierarchical dispatch filter non 64-bit accesses and 
raise an MCE appropriately.


So you don't need anything in MemoryRegion, you need code in the 
dispatch path.


MemoryRegion *is* the dispatch path.  Only done declaratively so we can 
flatten it whenever it changes.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/20/2011 09:16 PM, Blue Swirl wrote:

On Fri, May 20, 2011 at 5:46 PM, Anthony Liguori anth...@codemonkey.ws wrote:
  On 05/20/2011 09:40 AM, Richard Henderson wrote:

  On 05/20/2011 07:31 AM, Anthony Liguori wrote:

  But is this a characteristic of devices or is this a characteristic of
  the chipset/CPU?

  Chipset.

  So if the chipset only allows accesses that are 64-bit, then you'll want to
  have hierarchical dispatch filter non 64-bit accesses and raise an MCE
  appropriately.

  So you don't need anything in MemoryRegion, you need code in the dispatch
  path.

Sparc (32/64) systems are also very picky about wrong sized accesses.
I think the buses have lines for access size and the device can (and
they will) signal an error in these cases. Then the bus controller
will raise an NMI.

I think the easiest way to handle this could be to use overlapping
registrations for specific sizes. Then we could make a default error
generator device, which would just signal NMI/MCE on any access. It
would register for all of the picky area with lowest possible
priority. Other devices would register the small working access areas
with no knowledge about this error generator device. Any correct
access should go to other devices, bad accesses to the error
generator.

Though this would not be very different from current unassigned access handling.


The MemoryRegion way of doing it would be to register the MemoryRegion 
of the bus with the picky attributes.  Any sub-regions will inherit the 
property.
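
As a rough illustration of that idea (a self-contained toy model in plain C with made-up names, not the proposed QEMU API), priority-based overlap resolution with a lowest-priority error generator could look like this:

#include <inttypes.h>
#include <stdio.h>

typedef struct Region Region;
struct Region {
    const char *name;
    uint64_t offset, size;
    int priority;                            /* higher hides lower */
    void (*access)(const Region *r, uint64_t addr);
};

static void device_access(const Region *r, uint64_t addr)
{
    printf("%s handles access at 0x%" PRIx64 "\n", r->name, addr);
}

static void error_access(const Region *r, uint64_t addr)
{
    (void)r;
    printf("bad access at 0x%" PRIx64 ": raise NMI/MCE\n", addr);
}

/* Dispatch: pick the highest-priority region covering addr. */
static void dispatch(Region *regions, int n, uint64_t addr)
{
    Region *best = NULL;
    int i;

    for (i = 0; i < n; i++) {
        Region *r = &regions[i];
        if (addr >= r->offset && addr < r->offset + r->size &&
            (!best || r->priority > best->priority)) {
            best = r;
        }
    }
    if (best) {
        best->access(best, addr);
    }
}

int main(void)
{
    Region bus[] = {
        /* catch-all error generator: lowest priority, covers the whole
         * picky area, knows nothing about the real devices */
        { "errgen", 0x0000, 0x10000, 0, error_access },
        /* real devices, registered with no knowledge of the error generator */
        { "uart",   0x1000, 0x100,   1, device_access },
        { "timer",  0x2000, 0x100,   1, device_access },
    };

    dispatch(bus, 3, 0x1004);    /* correct access: handled by uart */
    dispatch(bus, 3, 0x3000);    /* bad access: falls through to errgen */
    return 0;
}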



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: VM terminates when doing a live migration

2011-05-22 Thread Daniel Bareiro
Hi, Michael.

On Saturday, 21 May 2011 23:39:27 -0400,
Michael Stroucken wrote:

 I'm doing some testing with KVM Live Migration. SS01 (VMHost) has
 Debian GNU/Linux 6.0.1 and Defiant (VMHost) has Debian GNU/Linux
 5.0.8. Defiant has Linux 2.6.32-15~bpo50+1 and 0.12.5+dfsg-3~bpo50+2,
 and SS01 has Linux 2.6.32-31 and 0.12.5+dfsg-5+squeeze1. Both
 installations are 32-bit, but the kernel in SS01 is amd64 and the
 kernel in Defiant is 686.

 Migration from Defiant to SS01 is completed successfully, but when
 trying to migrate a VM from SS01 to Defiant, the VM terminates
 (segfault?). Is this due to the difference in the kernel of each
 VMHost?

 Can you try saving the vms state to a file on ss01 and restoring it on  
 defiant?

I tried following your suggestion but same thing happens:

ss01:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 \
 -net tap -daemonize -vnc :15 -k es -localtime -cdrom \
 /mnt/systemrescuecd-x86-2.0.1.iso -monitor telnet:localhost:4055,server,nowait

ss01:~# telnet localhost 4055
Trying ::1...
Connected to localhost.
Escape character is '^]'.
QEMU 0.12.5 monitor - type 'help' for more information
(qemu) stop
(qemu) migrate_set_speed 4095m
(qemu) migrate exec:gzip -c > STATEFILE.gz
Connection closed by foreign host.

ss01:~# ps ax|grep systemrescuecd
26564 pts/0S+ 0:00 grep systemrescuecd


It appears that the 'migrate' command causes termination of the virtual
machine with these parameters as well.
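
For reference, the usual counterpart of that save command would be to start qemu
on the destination with the same options plus an -incoming exec pipeline, roughly
(the "..." below stands for the rest of the original command line):

defiant:~# kvm -m 256 -boot d ... -incoming "exec: gzip -c -d STATEFILE.gz"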

 Do migrations between ss01 and another amd64 box work?

I have not tried a migration where both source and destination have the
same characteristics (i386 installation and amd64 kernel).


Thanks for your reply.

Regards,
Daniel
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598


signature.asc
Description: Digital signature


Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/20/2011 08:59 PM, Blue Swirl wrote:

On Thu, May 19, 2011 at 5:12 PM, Avi Kivity a...@redhat.com wrote:
  The memory API separates the attributes of a memory region (its size, how
  reads or writes are handled, dirty logging, and coalescing) from where it
  is mapped and whether it is enabled.  This allows a device to configure
  a memory region once, then hand it off to its parent bus to map it according
  to the bus configuration.

  Hierarchical registration also allows a device to compose a region out of
  a number of sub-regions with different properties; for example some may be
  RAM while others may be MMIO.



  +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
  +/* Enable memory coalescing for the region.  MMIO ->write callbacks may be
  + * delayed until a non-coalesced MMIO is issued.
  + */
  +void memory_region_set_coalescing(MemoryRegion *mr);
  +/* Enable memory coalescing for a sub-range of the region.  MMIO ->write
  + * callbacks may be delayed until a non-coalesced MMIO is issued.
  + */
  +void memory_region_add_coalescing(MemoryRegion *mr,
  +  target_phys_addr_t offset,
  +  target_phys_addr_t size);
  +/* Disable MMIO coalescing for the region. */
  +void memory_region_clear_coalescing(MemoryRegion *mr);

Perhaps the interface could be more generic, like
+void memory_region_set_property(MemoryRegion *mr, unsigned flags);
+void memory_region_clear_property(MemoryRegion *mr, unsigned flags);



Coalescing is a complex property, not just a boolean attribute.  We 
probably will have a number of boolean attributes later, though.



  + * conflicts are resolved by having a higher @priority hide a lower 
@priority.
  + * Subregions without priority are taken as @priority 0.
  + */
  +void memory_region_add_subregion_overlap(MemoryRegion *mr,
  + target_phys_addr_t offset,
  + MemoryRegion *subregion,
  + unsigned priority);
  +/* Remove a subregion. */
  +void memory_region_del_subregion(MemoryRegion *mr,
  + MemoryRegion *subregion);

What would the subregions be used for?


Subregions describe the flow of data through the memory bus.  We'd have 
a subregion for the PCI bus, with its own subregions for various BARs, 
with some having subregions for dispatching different MMIO types within 
the BAR.


This allows, for example, the PCI layer to move a BAR without the PCI 
device knowing anything about it.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v2] Enable CPU SMEP feature for KVM

2011-05-22 Thread Avi Kivity

On 05/22/2011 08:23 AM, Yang, Wei Y wrote:

This patch matches with [PATCH v2] Enable CPU SMEP feature support for 
QEMU-KVM, no changes since v1.

Enable newly documented SMEP (Supervisor Mode Execution Protection) CPU feature 
in KVM module.

Intel's new CPUs support SMEP (Supervisor Mode Execution Protection). SMEP 
prevents the kernel from executing code in application pages. The updated Intel 
SDM describes this CPU feature. The document will be published soon.

This patch is based on Fenghua's SMEP patch series, as referred by: 
https://lkml.org/lkml/2011/5/17/523
This patch enables guests' usage of SMEP.
Currently, we don't enable this feature for guests with shadow page tables.


Why not?  I see nothing that conflicts with shadow.

Missing:
  update kvm_set_cr4() to reject SMEP if it's disabled in cpuid
  drop SMEP from cr4_guest_owned_bits if SMEP is disabled in cpuid
  update walk_addr_generic() to fault if SMEP is enabled and fetching 
from a user page


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/20/2011 05:06 PM, Richard Henderson wrote:

Is this structure honestly any better than 4 function pointers?
I can't see that it is, myself.



That was requested by Anthony.  And in fact we have two bits of 
information per access size, one is whether the access is allowed or 
not, the other is whether we want to use a larger or smaller access size 
function (useful for auto-splitting a 64-bit access into two 32-bit 
accesses for example).
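
A rough self-contained sketch of that per-size information (toy names only, not
the RFC's actual declarations), including the auto-split case:

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

struct access_size_info {
    bool allowed;    /* region accepts this access size natively */
    bool split;      /* emulate it with two accesses of half the size */
};

struct toy_region {
    struct access_size_info size[4];          /* index 0..3 -> 1,2,4,8 bytes */
    uint32_t (*read32)(uint64_t addr);
    uint64_t (*read64)(uint64_t addr);
};

uint64_t region_read64(struct toy_region *r, uint64_t addr)
{
    if (r->size[3].allowed) {
        return r->read64(addr);               /* native 64-bit handler */
    }
    if (r->size[3].split && r->size[2].allowed) {
        /* auto-split: compose the value from two 32-bit reads */
        uint64_t lo = r->read32(addr);
        uint64_t hi = r->read32(addr + 4);
        return lo | (hi << 32);
    }
    /* neither allowed nor splittable: an invalid access; a chipset model
     * could raise an MCE here instead of just complaining */
    printf("invalid 64-bit access at 0x%llx\n", (unsigned long long)addr);
    return ~0ULL;
}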


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/20/2011 08:59 PM, Blue Swirl wrote:

On Thu, May 19, 2011 at 5:12 PM, Avi Kivity a...@redhat.com wrote:
  The memory API separates the attributes of a memory region (its size, how
  reads or writes are handled, dirty logging, and coalescing) from where it
  is mapped and whether it is enabled.  This allows a device to configure
  a memory region once, then hand it off to its parent bus to map it according
  to the bus configuration.

  Hierarchical registration also allows a device to compose a region out of
  a number of sub-regions with different properties; for example some may be
  RAM while others may be MMIO.

  +/* Destroy a memory region.  The memory becomes inaccessible. */
  +void memory_region_destroy(MemoryRegion *mr);

Doesn't the lower priority region become accessible instead in some cases?


If we require that _add_subregion() and _del_subregion() be paired, then no.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2

2011-05-22 Thread Nadav Har'El
Hi,

On Sun, May 22, 2011, Tian, Kevin wrote about RE: [PATCH 07/31] nVMX: 
Introduce vmcs02: VMCS used to run L2:
 Here the vmcs02 being overridden may have been run on another processor before
 but is not vmclear-ed yet. When you resume this vmcs02 with new content on a 
 separate processor, the risk of corruption exists.

I still believe that my current code is correct (in this area). I'll try to
explain it here and would be grateful if you could point to me the error (if
there is one) in my logic:

Nested_vmx_run() is our function which switches from running L1 to L2
(patch 18).

This function starts by calling nested_get_current_vmcs02(), which gets us
*some* vmcs to use for vmcs02. This may be a fresh new VMCS, or a recycled
VMCS, some VMCS we've previously used to run some, potentially different L2
guest on some, potentially different, CPU.
nested_get_current_vmcs02() returns a saved_vmcs structure, which
not only contains a VMCS, but also remembers on which (if any) cpu it is
currently loaded (and whether it was VMLAUNCHed once on that cpu).

The next thing that Nested_vmx_run() now does is to set up in the vcpu object
the vmcs, cpu and launched fields according to what was returned above.

Now it calls vmx_vcpu_load(). This standard KVM function checks if we're now
running on a different CPU from vcpu->cpu, and if it is a different one, it
uses vcpu_clear() to VMCLEAR the vmcs on the CPU where it was last loaded
(using an IPI). Only after it vmclears the VMCS on the old CPU can it finally
load the VMCS on the new CPU.
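
A hedged sketch of that invariant, with illustrative names rather than the
actual vmx.c code:

struct toy_vmcs {
    int cpu;         /* CPU this VMCS is currently loaded on, or -1 */
    int launched;    /* was it VMLAUNCHed on that CPU? */
};

void vmclear_on_old_cpu(struct toy_vmcs *v)
{
    /* would IPI v->cpu and execute VMCLEAR there */
    v->cpu = -1;
    v->launched = 0;
}

void toy_vmcs_load(struct toy_vmcs *v, int this_cpu)
{
    if (v->cpu != -1 && v->cpu != this_cpu)
        vmclear_on_old_cpu(v);   /* never load a VMCS still live elsewhere */
    /* VMPTRLD on this_cpu would go here */
    v->cpu = this_cpu;
}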

Only now Nested_vmx_run() can call prepare_vmcs02, which starts VMWRITEing
to this VMCS, and finally returns.

P.S. Seeing that you're from Intel, maybe you can help me with a pointer:
I found what appears to be a small error in the SDM - who can I report it to?

Thanks,
Nadav.

-- 
Nadav Har'El|   Sunday, May 22 2011, 18 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |I work for money. If you want loyalty,
http://nadav.harel.org.il   |buy yourself a dog.


RE: [PATCH v2] Enable CPU SMEP feature for KVM

2011-05-22 Thread Yang, Wei Y
 This patch matches with [PATCH v2] Enable CPU SMEP feature support for 
 QEMU-KVM, no changes since v1.

 Enable newly documented SMEP (Supervisor Mode Execution Protection) CPU 
 feature in KVM module.

 Intel's new CPUs support SMEP (Supervisor Mode Execution Protection). SMEP 
 prevents the kernel from executing code in application pages. The updated Intel 
 SDM describes this CPU feature. The document will be published soon.

 This patch is based on Fenghua's SMEP patch series, as referred by: 
 https://lkml.org/lkml/2011/5/17/523
 This patch enables guests' usage of SMEP.
 Currently, we don't enable this feature for guests with shadow page tables.

 Why not?  I see nothing that conflicts with shadow.

We don't need to enable it for shadow page tables, because shadow paging applies
masks to the guest/shadow PTEs, which may cause problems.  Let's keep shadow as
it is because it's already very complex. We assume SMEP machines will have EPT.

 Missing:
   update kvm_set_cr4() to reject SMEP if it's disabled in cpuid

Yes, I will check it.

   drop SMEP from cr4_guest_owned_bits if SMEP is disabled in cpuid

SMEP BIT is not included in KVM_CR4_GUEST_OWNED_BITS.

   update walk_addr_generic() to fault if SMEP is enabled and fetching 

Comments above.

 from a user page



Re: [PATCH v2] Enable CPU SMEP feature for KVM

2011-05-22 Thread Avi Kivity

On 05/22/2011 11:08 AM, Yang, Wei Y wrote:

  This patch matches with [PATCH v2] Enable CPU SMEP feature support for 
QEMU-KVM, no changes since v1.

  Enable newly documented SMEP (Supervisor Mode Execution Protection) CPU 
feature in KVM module.

  Intel's new CPUs support SMEP (Supervisor Mode Execution Protection). SMEP 
prevents the kernel from executing code in application pages. The updated Intel 
SDM describes this CPU feature. The document will be published soon.

  This patch is based on Fenghua's SMEP patch series, as referred by: 
https://lkml.org/lkml/2011/5/17/523
  This patch enables guests' usage of SMEP.
  Currently, we don't enable this feature for guests with shadow page tables.

  Why not?  I see nothing that conflicts with shadow.

We don't need to enable it for shadow page tables, because shadow paging applies
masks to the guest/shadow PTEs, which may cause problems.  Let's keep shadow as
it is because it's already very complex. We assume SMEP machines will have EPT.



I don't understand why.  Can you elaborate?

Shadow implements the U bit, which is all that is needed by SMEP as far 
as I can tell.




update walk_addr_generic() to fault if SMEP is enabled and fetching

Comments above.

  from a user page



Needs to be done even from EPT, in case walk_addr_generic() is invoked 
by the emulator.
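
The architectural rule that the walker check would enforce is small; a
self-contained sketch with illustrative names (not the actual
walk_addr_generic() code):

#include <stdbool.h>

struct walk_ctx {
    bool fetch;        /* the access is an instruction fetch */
    bool user_mode;    /* the guest was in user mode (CPL 3) */
    bool cr4_smep;     /* guest CR4.SMEP is set */
    bool pte_user;     /* every paging-structure entry marks the page as user */
};

/* SMEP: a supervisor-mode instruction fetch from a user-accessible page
 * must raise a page fault. */
bool smep_fault(const struct walk_ctx *w)
{
    return w->fetch && !w->user_mode && w->cr4_smep && w->pte_user;
}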


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2

2011-05-22 Thread Nadav Har'El
Hi,

On Fri, May 20, 2011, Tian, Kevin wrote about RE: [PATCH 07/31] nVMX: 
Introduce vmcs02: VMCS used to run L2:
 Possibly we can maintain the vmcs02 pool along with L1 VMCLEAR ops, which
 is similar to the hardware behavior regarding to cleared and launched state.

If you set VMCS02_POOL_SIZE to a large size, and L1, like typical hypervisors,
only keeps around a few VMCSs (and VMCLEARs the ones it will not use again),
then we'll only have a few vmcs02: handle_vmclear() removes from the pool the
vmcs02 that L1 explicitly told us it won't need again.

  +struct saved_vmcs {
  +   struct vmcs *vmcs;
  +   int cpu;
  +   int launched;
  +};
 
 saved looks a bit misleading here. It's simply a list of all active vmcs02 
 tracked
 by kvm, isn't it?

I have rewritten this part of the code, based on Avi's and Marcelo's requests,
and the new name for this structure is loaded_vmcs, i.e., a structure
describing where a VMCS was loaded.


  +/* Used to remember the last vmcs02 used for some recently used vmcs12s
  */
  +struct vmcs02_list {
  +   struct list_head list;
  +   gpa_t vmcs12_addr;
 
 uniform the name 'vmptr' as in the nested_vmx structure:

Ok. Changing all the mentions of vmcs12_addr to vmptr.

-- 
Nadav Har'El|   Sunday, May 22 2011, 18 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |A witty saying proves nothing. --
http://nadav.harel.org.il   |Voltaire


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-22 Thread Nadav Har'El
On Wed, May 18, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 Humpf, right. OK, you can handle the x86.c usage with
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
...

Hi Avi and Marcelo, here is the new first patch to the nvmx patch set,
which overhauls the handling of vmcss on cpus, as you asked.

As you guessed, the nested entry and exit code becomes much simpler and
cleaner, with the whole VMCS switching code on entry, for example, reduced
to:
cpu = get_cpu();
vmx->loaded_vmcs = vmcs02;
vmx_vcpu_put(vcpu);
vmx_vcpu_load(vcpu, cpu);
vcpu->cpu = cpu;
put_cpu();

You can apply this patch separately from the rest of the patch set, if you
wish. I'm sending just this one, like you asked - and can send the rest of
the patches when you ask me to.


Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.

Signed-off-by: Nadav Har'El n...@il.ibm.com
---
 arch/x86/kvm/vmx.c |  129 ++-
 arch/x86/kvm/x86.c |3 -
 2 files changed, 80 insertions(+), 52 deletions(-)

--- .before/arch/x86/kvm/x86.c  2011-05-22 11:41:57.0 +0300
+++ .after/arch/x86/kvm/x86.c   2011-05-22 11:41:57.0 +0300
@@ -2119,7 +2119,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
 	if (need_emulate_wbinvd(vcpu)) {
 		if (kvm_x86_ops->has_wbinvd_exit())
 			cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
-		else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
+		else if (vcpu->cpu != -1 && vcpu->cpu != cpu
+			 && cpu_online(vcpu->cpu))
 			smp_call_function_single(vcpu->cpu,
 					wbinvd_ipi, NULL, 1);
 	}
--- .before/arch/x86/kvm/vmx.c  2011-05-22 11:41:57.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-05-22 11:41:58.0 +0300
@@ -116,6 +116,18 @@ struct vmcs {
char data[0];
 };
 
+/*
+ * Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
+ * remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
+ * loaded on this CPU (so we can clear them if the CPU goes down).
+ */
+struct loaded_vmcs {
+   struct vmcs *vmcs;
+   int cpu;
+   int launched;
+   struct list_head loaded_vmcss_on_cpu_link;
+};
+
 struct shared_msr_entry {
unsigned index;
u64 data;
@@ -124,9 +136,7 @@ struct shared_msr_entry {
 
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
-   struct list_head  local_vcpus_link;
unsigned long host_rsp;
-   int   launched;
u8fail;
u8cpl;
bool  nmi_known_unmasked;
@@ -140,7 +150,14 @@ struct vcpu_vmx {
u64   msr_host_kernel_gs_base;
u64   msr_guest_kernel_gs_base;
 #endif
-   struct vmcs  *vmcs;
+   /*
+* loaded_vmcs points to the VMCS currently used in this vcpu. For a
+* non-nested (L1) guest, it always points to vmcs01. For a nested
+* guest (L2), it points to a different VMCS.
+*/
+   struct loaded_vmcsvmcs01;
+   struct loaded_vmcs   *loaded_vmcs;
+   bool  __launched; /* temporary, used in vmx_vcpu_run */
struct msr_autoload {
unsigned nr;
struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
@@ -200,7 +217,11 @@ static int vmx_set_tss_addr(struct kvm *
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
-static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
+/*
+ * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
+ * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
+ */
+static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
 static unsigned long *vmx_io_bitmap_a;
@@ -514,25 +535,25 @@ static void 

Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Blue Swirl
On Sun, May 22, 2011 at 9:45 AM, Avi Kivity a...@redhat.com wrote:
 On 05/20/2011 08:59 PM, Blue Swirl wrote:

On Thu, May 19, 2011 at 5:12 PM, Avi Kivity a...@redhat.com wrote:
   The memory API separates the attributes of a memory region (its size,
  how
   reads or writes are handled, dirty logging, and coalescing) from where
  it
   is mapped and whether it is enabled.  This allows a device to configure
   a memory region once, then hand it off to its parent bus to map it
  according
   to the bus configuration.
 
   Hierarchical registration also allows a device to compose a region out
  of
   a number of sub-regions with different properties; for example some may
  be
   RAM while others may be MMIO.

   +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned
  client);
   +/* Enable memory coalescing for the region.  MMIO ->write callbacks
  may be
   + * delayed until a non-coalesced MMIO is issued.
   + */
   +void memory_region_set_coalescing(MemoryRegion *mr);
   +/* Enable memory coalescing for a sub-range of the region.  MMIO
  ->write
   + * callbacks may be delayed until a non-coalesced MMIO is issued.
   + */
   +void memory_region_add_coalescing(MemoryRegion *mr,
   +                                  target_phys_addr_t offset,
   +                                  target_phys_addr_t size);
   +/* Disable MMIO coalescing for the region. */
   +void memory_region_clear_coalescing(MemoryRegion *mr);

 Perhaps the interface could be more generic, like
 +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
 +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);


 Coalescing is a complex property, not just a boolean attribute.  We probably
 will have a number of boolean attributes later, though.

But what is the difference between adding coalescing to an area and
setting the bit property 'coalescing' to an area? At least what you
propose now is not so complex that it couldn't be handled as a single
bit.

   + * conflicts are resolved by having a higher @priority hide a lower
  @priority.
   + * Subregions without priority are taken as @priority 0.
   + */
   +void memory_region_add_subregion_overlap(MemoryRegion *mr,
   +                                         target_phys_addr_t offset,
   +                                         MemoryRegion *subregion,
   +                                         unsigned priority);
   +/* Remove a subregion. */
   +void memory_region_del_subregion(MemoryRegion *mr,
   +                                 MemoryRegion *subregion);

 What would the subregions be used for?

 Subregions describe the flow of data through the memory bus.  We'd have a
 subregion for the PCI bus, with its own subregions for various BARs, with
 some having subregions for dispatching different MMIO types within the BAR.

 This allows, for example, the PCI layer to move a BAR without the PCI device
 knowing anything about it.

But why can't a first class region be used for that?


Re: [PATCH 09/10] qemu-kvm: Use upstream kvm_cpu_exec

2011-05-22 Thread Jan Kiszka
On 2011-05-20 19:17, Christoph Hellwig wrote:
 On Fri, May 20, 2011 at 07:12:39PM +0200, Jan Kiszka wrote:
 Upstream's and qemu-kvm's kvm_cpu_exec are not logically equivalent so
 
 s/not/now/?

Oops, of course.

If there is no other need to repost, this should be fixed on merge.

Thanks,
Jan



signature.asc
Description: OpenPGP digital signature


Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving

2011-05-22 Thread Ingo Molnar

* tip-bot for Cyrill Gorcunov gorcu...@gmail.com wrote:

 diff --git a/tools/perf/feature-tests.mak b/tools/kvm/config/feature-tests.mak
 similarity index 83%
 copy from tools/perf/feature-tests.mak
 copy to tools/kvm/config/feature-tests.mak

Btw, now that we have feature-tests.mak it would be nice to populate the checks 
for the various assumptions.

One i recently ran into on a new box where i tried to install tools/kvm was:

In file included from /usr/include/features.h:387:0,
 from /usr/include/unistd.h:26,
 from include/kvm/util.h:12,
 from bios/e820.c:5:
/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or 
directory
compilation terminated.
make: *** [bios/bios-rom.bin] Error 1

that's a dependency on glibc-dev[el].

Thanks,

Ingo


Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving

2011-05-22 Thread Ingo Molnar

* Ingo Molnar mi...@elte.hu wrote:

 
 * tip-bot for Cyrill Gorcunov gorcu...@gmail.com wrote:
 
  diff --git a/tools/perf/feature-tests.mak 
  b/tools/kvm/config/feature-tests.mak
  similarity index 83%
  copy from tools/perf/feature-tests.mak
  copy to tools/kvm/config/feature-tests.mak
 
 Btw, now that we have feature-tests.mak it would be nice to populate the 
 checks 
 for the various assumptions.
 
 One i recently ran into on a new box where i tried to install tools/kvm was:
 
 In file included from /usr/include/features.h:387:0,
  from /usr/include/unistd.h:26,
  from include/kvm/util.h:12,
  from bios/e820.c:5:
 /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or 
 directory
 compilation terminated.
 make: *** [bios/bios-rom.bin] Error 1
 
 that's a dependency on glibc-dev[el].

Another detail: on 64-bit hosts the dependency is on glibc-dev[el].i686, i.e. 
the 32-bit package.

Would it be simple to remove this dependency? It's not typically installed by 
default on distros and it would be nice to make most of the kvm code build by 
default almost everywhere.

Thanks,

Ingo


Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving

2011-05-22 Thread Cyrill Gorcunov
On 05/22/2011 02:58 PM, Ingo Molnar wrote:
 
 * tip-bot for Cyrill Gorcunov gorcu...@gmail.com wrote:
 
 diff --git a/tools/perf/feature-tests.mak 
 b/tools/kvm/config/feature-tests.mak
 similarity index 83%
 copy from tools/perf/feature-tests.mak
 copy to tools/kvm/config/feature-tests.mak
 
 Btw, now that we have feature-tests.mak it would be nice to populate the 
 checks 
 for the various assumptions.
 
 One i recently ran into on a new box where i tried to install tools/kvm was:
 
 In file included from /usr/include/features.h:387:0,
  from /usr/include/unistd.h:26,
  from include/kvm/util.h:12,
  from bios/e820.c:5:
 /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or 
 directory
 compilation terminated.
 make: *** [bios/bios-rom.bin] Error 1
 
 that's a dependency on glibc-dev[el].
 
 Thanks,
 
   Ingo

  Ouch. I've been hitting this lack of gnu/stubs-32.h on Fedora 15 too and 
eventually
I had to install i686 packages for devel. I'll try to resolve this issue 
tonight, I
suppose it's not absolutely urgent, right?

-- 
Cyrill


Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving

2011-05-22 Thread Cyrill Gorcunov
On 05/22/2011 03:00 PM, Ingo Molnar wrote:
 
 * Ingo Molnar mi...@elte.hu wrote:
 

 * tip-bot for Cyrill Gorcunov gorcu...@gmail.com wrote:

 diff --git a/tools/perf/feature-tests.mak 
 b/tools/kvm/config/feature-tests.mak
 similarity index 83%
 copy from tools/perf/feature-tests.mak
 copy to tools/kvm/config/feature-tests.mak

 Btw, now that we have feature-tests.mak it would be nice to populate the 
 checks 
 for the various assumptions.

 One i recently ran into on a new box where i tried to install tools/kvm was:

 In file included from /usr/include/features.h:387:0,
  from /usr/include/unistd.h:26,
  from include/kvm/util.h:12,
  from bios/e820.c:5:
 /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or 
 directory
 compilation terminated.
 make: *** [bios/bios-rom.bin] Error 1

 that's a dependency on glibc-dev[el].
 
 Another detail: on 64-bit hosts the dependency is on glibc-dev[el].i686, i.e. 
 the 32-bit package.
 
 Would it be simple to remove this dependency? It's not typically installed by 
 default on distros and it would be nice to make most of the kvm code build by 
 default almost everywhere.
 
 Thanks,
 
   Ingo

  I'll take a look if we really need it (note we need to compile 16bit code for
bios blob so it might eventually be a problem ;)

-- 
Cyrill


Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/22/2011 12:32 PM, Blue Swirl wrote:

  +void memory_region_add_coalescing(MemoryRegion *mr,
  +  target_phys_addr_t offset,
  +  target_phys_addr_t size);
  +/* Disable MMIO coalescing for the region. */
  +void memory_region_clear_coalescing(MemoryRegion *mr);

  Perhaps the interface could be more generic, like
  +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
  +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);


  Coalescing is a complex property, not just a boolean attribute.  We probably
  will have a number of boolean attributes later, though.

But what is the difference between adding coalescing to an area and
setting the bit property 'coalescing' to an area? At least what you
propose now is not so complex that it couldn't be handled as a single
bit.


Look at the API - add_coalescing() sets the coalescing property on a 
subrange of the memory region, not the entire region.


(motivation - hw/e1000.c).


  + * conflicts are resolved by having a higher @priority hide a lower
@priority.
  + * Subregions without priority are taken as @priority 0.
  + */
  +void memory_region_add_subregion_overlap(MemoryRegion *mr,
  + target_phys_addr_t offset,
  + MemoryRegion *subregion,
  + unsigned priority);
  +/* Remove a subregion. */
  +void memory_region_del_subregion(MemoryRegion *mr,
  + MemoryRegion *subregion);

  What would the subregions be used for?

  Subregions describe the flow of data through the memory bus.  We'd have a
  subregion for the PCI bus, with its own subregions for various BARs, with
  some having subregions for dispatching different MMIO types within the BAR.

  This allows, for example, the PCI layer to move a BAR without the PCI device
  knowing anything about it.

But why can't a first class region be used for that?


Subregions are first-class regions.  In fact all regions are subregions 
except the root.


It's a tree of regions, each level adding an offset, clipping, and 
perhaps other attributes, with the leaves providing actual memory (mmio 
or RAM).
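
A toy illustration of such a tree (plain C, made-up names, not the RFC API):
each level adds an offset and flattening visits the leaves; changing a child's
offset and re-flattening models, for example, the PCI layer moving a BAR:

#include <stdio.h>
#include <stdint.h>

#define MAX_CHILDREN 4

typedef struct ToyRegion {
    const char *name;
    uint64_t offset;                    /* relative to the parent */
    uint64_t size;
    int is_leaf;                        /* leaf: RAM or MMIO callbacks */
    struct ToyRegion *child[MAX_CHILDREN];
    int nchildren;
} ToyRegion;

/* Walk the tree, accumulating offsets; print one line per leaf. */
static void flatten(const ToyRegion *r, uint64_t base)
{
    uint64_t start = base + r->offset;
    int i;

    if (r->is_leaf) {
        printf("%#10llx..%#10llx  %s\n",
               (unsigned long long)start,
               (unsigned long long)(start + r->size - 1), r->name);
        return;
    }
    for (i = 0; i < r->nchildren; i++)
        flatten(r->child[i], start);
}

int main(void)
{
    ToyRegion bar0 = { "nic BAR0 (mmio)", 0x0,        0x20000,         1, {0}, 0 };
    ToyRegion pci  = { "pci bus",         0xe0000000, 0x10000000,      0, { &bar0 }, 1 };
    ToyRegion ram  = { "ram",             0x0,        0x8000000,       1, {0}, 0 };
    ToyRegion root = { "system",          0x0,        0x100000000ULL,  0, { &ram, &pci }, 2 };

    flatten(&root, 0);      /* changing pci.offset or bar0.offset and
                               re-flattening moves the BAR transparently */
    return 0;
}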


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Blue Swirl
On Sun, May 22, 2011 at 2:36 PM, Avi Kivity a...@redhat.com wrote:
 On 05/22/2011 12:32 PM, Blue Swirl wrote:

       +void memory_region_add_coalescing(MemoryRegion *mr,
       +                                  target_phys_addr_t offset,
       +                                  target_phys_addr_t size);
       +/* Disable MMIO coalescing for the region. */
       +void memory_region_clear_coalescing(MemoryRegion *mr);
 
   Perhaps the interface could be more generic, like
   +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
   +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
 
 
   Coalescing is a complex property, not just a boolean attribute.  We
  probably
   will have a number of boolean attributes later, though.

 But what is the difference between adding coalescing to an area and
 setting the bit property 'coalescing' to an area? At least what you
 propose now is not so complex that it couldn't be handled as a single
 bit.

 Look at the API - add_coalescing() sets the coalescing property on a
 subrange of the memory region, not the entire region.

Right, but doesn't the same apply to any other properties, they may
apply to a full range or just a subrange?

 (motivation - hw/e1000.c).

       + * conflicts are resolved by having a higher @priority hide a
  lower
     @priority.
       + * Subregions without priority are taken as @priority 0.
       + */
       +void memory_region_add_subregion_overlap(MemoryRegion *mr,
       +                                         target_phys_addr_t
  offset,
       +                                         MemoryRegion
  *subregion,
       +                                         unsigned priority);
       +/* Remove a subregion. */
       +void memory_region_del_subregion(MemoryRegion *mr,
       +                                 MemoryRegion *subregion);
 
   What would the subregions be used for?
 
   Subregions describe the flow of data through the memory bus.  We'd have
  a
   subregion for the PCI bus, with its own subregions for various BARs,
  with
   some having subregions for dispatching different MMIO types within the
  BAR.
 
   This allows, for example, the PCI layer to move a BAR without the PCI
  device
   knowing anything about it.

 But why can't a first class region be used for that?

 Subregions are first-class regions.  In fact all regions are subregions
 except the root.

Oh, I see now. Maybe the comments should describe this. Or perhaps the
terms should be something like 'bus/bridge/root' and 'region' instead
of 'region' and 'subregion'?

 It's a tree of regions, each level adding an offset, clipping, and perhaps
 other attributes, with the leaves providing actual memory (mmio or RAM).

I thought that there are two classes of regions, like PCI device vs. a
single BAR.


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-22 Thread Michael S. Tsirkin
On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
 On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com 
 wrote:
  Current code might introduce a lot of latency variation
  if there are many pending bufs at the time we
  attempt to transmit a new one. This is bad for
  real-time applications and can't be good for TCP either.
 
 Do we have more than speculation to back that up, BTW?

Need to dig this up: I thought we saw some reports of this on the list?

 This patch is pretty sloppy; the previous ones were better polished.
 
  -static void free_old_xmit_skbs(struct virtnet_info *vi)
  +static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
   {
 
 A comment here indicating it returns true if it frees something?

Agree.

	struct sk_buff *skb;
	unsigned int len;
-
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	bool c;
+	int n;
+
+	/* We try to free up at least 2 skbs per one sent, so that we'll get
+	 * all of the memory back if they are used fast enough. */
+	for (n = 0;
+	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
+	     ((skb = virtqueue_get_buf(vi->svq, &len)));
+	     ++n) {
		pr_debug("Sent skb %p\n", skb);
		vi->dev->stats.tx_bytes += skb->len;
		vi->dev->stats.tx_packets++;
		dev_kfree_skb_any(skb);
	}
+	return !c;
 
 This is for() abuse :)
 
 Why is the capacity check in there at all?  Surely it's simpler to try
 to free 2 skbs each time around?

This is in case we can't use indirect: we want to free up
enough buffers for the following add_buf to succeed.


for (n = 0; n < 2; n++) {
	skb = virtqueue_get_buf(vi->svq, &len);
	if (!skb)
		break;
	pr_debug("Sent skb %p\n", skb);
	vi->dev->stats.tx_bytes += skb->len;
	vi->dev->stats.tx_packets++;
	dev_kfree_skb_any(skb);
}
 
   static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
  @@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
  struct net_device *dev)
  struct virtnet_info *vi = netdev_priv(dev);
  int capacity;
   
  -   /* Free up any pending old buffers before queueing new ones. */
  -   free_old_xmit_skbs(vi);
  +   /* Free enough pending old buffers to enable queueing new ones. */
  +   free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);
   
  /* Try to transmit */
  capacity = xmit_skb(vi, skb);
  @@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
  struct net_device *dev)
  netif_stop_queue(dev);
  if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
  /* More just got used, free them then recheck. */
  -   free_old_xmit_skbs(vi);
  -   capacity = virtqueue_get_capacity(vi->svq);
  -   if (capacity >= 2+MAX_SKB_FRAGS) {
  +   if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
 
 This extra argument to free_old_xmit_skbs seems odd, unless you have
 future plans?
 
 Thanks,
 Rusty.

I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
sure we have enough space in the buffer. Another way to do
that is with a define :).

-- 
MST


Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/22/2011 03:06 PM, Blue Swirl wrote:

On Sun, May 22, 2011 at 2:36 PM, Avi Kivity a...@redhat.com wrote:
  On 05/22/2011 12:32 PM, Blue Swirl wrote:

+void memory_region_add_coalescing(MemoryRegion *mr,
+  target_phys_addr_t offset,
+  target_phys_addr_t size);
+/* Disable MMIO coalescing for the region. */
+void memory_region_clear_coalescing(MemoryRegion *mr);
  
  Perhaps the interface could be more generic, like
  +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
  +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
  
  
  Coalescing is a complex property, not just a boolean attribute.  We
probably
  will have a number of boolean attributes later, though.

  But what is the difference between adding coalescing to an area and
  setting the bit property 'coalescing' to an area? At least what you
  propose now is not so complex that it couldn't be handled as a single
  bit.

  Look at the API - add_coalescing() sets the coalescing property on a
  subrange of the memory region, not the entire region.

Right, but doesn't the same apply to any other properties, they may
apply to a full range or just a subrange?


We'll know when we have more properties.  I expect most will be region-wide.




  Subregions are first-class regions.  In fact all regions are subregions
  except the root.

Oh, I see now. Maybe the comments should describe this. Or perhaps the
terms should be something like 'bus/bridge/root' and 'region' instead
of 'region' and 'subregion'?


Problem is, memory_region_add_subregion() adds both sub-bridges and leaf 
regions.


It's quite possible that BAR 0 will be a leaf region, and BAR 1 will be 
a sub-bridge.


Can you suggest an alternative naming for the API?



  It's a tree of regions, each level adding an offset, clipping, and perhaps
  other attributes, with the leaves providing actual memory (mmio or RAM).

I thought that there are two classes of regions, like PCI device vs. a
single BAR.


It's true in a way, except the mapping is not 1:1.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] KVM: x86 emulator: Fix unconditional return from get_descriptor_table_ptr()

2011-05-22 Thread Avi Kivity

On 05/21/2011 07:06 AM, Takuya Yoshikawa wrote:

From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

A trivial typo was found in the following commit:
   commit 7753ed6043bfce55dc0c407490896632014b677e
   KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks

When the table indicator flag is set, i.e. when the selector selects the
current LDT, get_descriptor_table_ptr() returns without setting the
size and address of the table.

Guests will see #GP if this happens.



Thanks, applied.


Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
  Is this stable material? -- IIRC, someone reported a suspicious
  emulator bug recently.


This was not yet merged upstream, so I folded this into the bad commit, 
and upstream will never see the bug.  It's now 4bff1e86ad286d in kvm.git.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Blue Swirl
On Sun, May 22, 2011 at 3:18 PM, Avi Kivity a...@redhat.com wrote:
 On 05/22/2011 03:06 PM, Blue Swirl wrote:

 On Sun, May 22, 2011 at 2:36 PM, Avi Kivity a...@redhat.com wrote:
   On 05/22/2011 12:32 PM, Blue Swirl wrote:
 
             +void memory_region_add_coalescing(MemoryRegion *mr,
             +                                  target_phys_addr_t
  offset,
             +                                  target_phys_addr_t
  size);
             +/* Disable MMIO coalescing for the region. */
             +void memory_region_clear_coalescing(MemoryRegion *mr);
   
       Perhaps the interface could be more generic, like
       +void memory_region_set_property(MemoryRegion *mr, unsigned
  flags);
       +void memory_region_clear_property(MemoryRegion *mr, unsigned
  flags);
   
   
       Coalescing is a complex property, not just a boolean attribute.
   We
     probably
       will have a number of boolean attributes later, though.
 
   But what is the difference between adding coalescing to an area and
   setting the bit property 'coalescing' to an area? At least what you
   propose now is not so complex that it couldn't be handled as a single
   bit.
 
   Look at the API - add_coalescing() sets the coalescing property on a
   subrange of the memory region, not the entire region.

 Right, but doesn't the same apply to any other properties, they may
 apply to a full range or just a subrange?

 We'll know when we have more properties.  I expect most will be region-wide.

Since we don't know about those yet, coalescing API could be like you
propose. Later it can be changed to the property API, or left for
convenience.

 
   Subregions are first-class regions.  In fact all regions are subregions
   except the root.

 Oh, I see now. Maybe the comments should describe this. Or perhaps the
 terms should be something like 'bus/bridge/root' and 'region' instead
 of 'region' and 'subregion'?

 Problem is, memory_region_add_subregion() adds both sub-bridges and leaf
 regions.

 It's quite possible that BAR 0 will be a leaf region, and BAR 1 will be a
 sub-bridge.

 Can you suggest an alternative naming for the API?

How about
memory_region_container_init()
memory_region_add()


[ANNOUNCE] qemu-kvm-0.14.1

2011-05-22 Thread Avi Kivity
qemu-kvm-0.14.1 is now available. This release is based on the upstream 
qemu 0.14.1, plus kvm-specific enhancements. Please see the original 
qemu 0.14.1 release announcement for details.


This release can be used with the kvm kernel modules provided by your
distribution kernel, or by the modules in the kvm-kmod package, such
as kvm-kmod-2.6.39.  See [1] for details on choosing the right kvm modules.

http://www.linux-kvm.org

[1] http://www.linux-kvm.org/page/Getting_the_kvm_kernel_modules



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/22/2011 06:32 PM, Blue Swirl wrote:


  Can you suggest an alternative naming for the API?

How about
memory_region_container_init()
memory_region_add()


I'm neutral.  If someone seconds this, I'll make it so.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Anthony Liguori

On 05/22/2011 01:38 AM, Avi Kivity wrote:

On 05/20/2011 05:31 PM, Anthony Liguori wrote:

Several alpha system chips MCE when accessed with incorrect sizes.
E.g. only 64-bit accesses are allowed.



But is this a characteristic of devices or is this a characteristic of
the chipset/CPU?


The chipset is modelled by a MemoryRegion too.



At any rate, I'm fairly sure it doesn't belong in the MemoryRegion
structure.



Since it isn't a global property, where does it belong?


The chipset should have an intercept in the dispatch path that enforces 
this (this assumes hierarchical dispatch).


Regards,

Anthony Liguori







Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Anthony Liguori

On 05/22/2011 01:39 AM, Avi Kivity wrote:

On 05/20/2011 05:46 PM, Anthony Liguori wrote:

On 05/20/2011 09:40 AM, Richard Henderson wrote:

On 05/20/2011 07:31 AM, Anthony Liguori wrote:

But is this a characteristic of devices or is this a characteristic
of the chipset/CPU?


Chipset.


So if the chipset only allows accesses that are 64-bit, then you'll
want to have hierarchical dispatch filter non 64-bit accesses and
raise an MCE appropriately.

So you don't need anything in MemoryRegion, you need code in the
dispatch path.


MemoryRegion *is* the dispatch path. Only done declaratively so we can
flatten it whenever it changes.


We don't want dispatch to be 100% declarative.  That's what will cause 
the API to get horrendously ugly.


An example is PCI-bus level endianness conversion.  I also believe the 
Sparc IOMMU has an xor engine.


You could add a 'bool swap_endian' and an 'uint32_t xor_mask' in 
MemoryRegion but now you're adding a ton of platform specific knowledge 
to what should be an independent layer.


Regards,

Anthony Liguori







Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/22/2011 06:46 PM, Anthony Liguori wrote:

MemoryRegion *is* the dispatch path. Only done declaratively so we can
flatten it whenever it changes.



We don't want dispatch to be 100% declarative.  That's what will cause 
the API to get horrendously ugly.


An example is PCI-bus level endianness conversion.  I also believe the 
Sparc IOMMU has an xor engine.


You could add a 'bool swap_endian' and an 'uint32_t xor_mask' in 
MemoryRegion but now you're adding a ton of platform specific 
knowledge to what should be an independent layer.




Currently containers do not use the read/write function pointers.  We 
could make them (or perhaps others) act as transformations on the data 
as it passes.  So it's still declarative (the entire flow is known at 
registration time) but it doesn't embed platform magic.


Byteswap is sufficiently generic to add as a region property, IMO.
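
A toy sketch of a container-level transformation (self-contained C,
illustrative names only): a byte-swapping bridge in front of a leaf region,
still declarative because the transformation is fixed at registration time:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t (*read_fn)(uint64_t addr);

static uint32_t leaf_read(uint64_t addr)
{
    (void)addr;
    return 0x12345678;                  /* pretend device register */
}

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00) | ((v << 8) & 0xff0000) | (v << 24);
}

/* A container that declares "swap_endian": the transformation is applied once
 * as data crosses this level of the tree; the whole flow is still known at
 * registration time. */
static uint32_t bridge_read(read_fn leaf, int swap_endian, uint64_t addr)
{
    uint32_t v = leaf(addr);
    return swap_endian ? bswap32(v) : v;
}

int main(void)
{
    printf("direct:  0x%08x\n", bridge_read(leaf_read, 0, 0));
    printf("swapped: 0x%08x\n", bridge_read(leaf_read, 1, 0));
    return 0;
}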

btw, wrt iommu emulation, the API finally allows us to determine the 
path between any two devices, so we can apply the right iommu 
transformations.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-22 Thread Avi Kivity

On 05/22/2011 06:44 PM, Anthony Liguori wrote:




At any rate, I'm fairly sure it doesn't belong in the MemoryRegion
structure.



Since it isn't a global property, where does it belong?



The chipset should have an intercept in the dispatch path that 
enforces this (this assumes hierarchical dispatch).


So instead of region->ops->valid.*, region->ops->intercept()?

btw, that still doesn't require hierarchical dispatch.  If intercepts 
only check if the access is valid, it can still be flattened.


Hierarchical dispatch means that chipset callbacks get to choose which 
subregion callbacks are called, which isn't the case here.  If it were, 
it would be impossible to figure out the kvm slot layout.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH RFC] virtio_net: fix patch: virtio_net: limit xmit polling

2011-05-22 Thread Michael S. Tsirkin
On Thu, May 19, 2011 at 05:00:17PM +0930, Rusty Russell wrote:
 On Thu, 19 May 2011 01:01:25 +0300, Michael S. Tsirkin m...@redhat.com 
 wrote:
  The patch  virtio_net: limit xmit polling
  got the logic reversed: it polled while we had
  capacity not while ring was empty.
  
  Fix it up and clean up a bit by using a for loop.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  OK, turns out that patch was borken. Here's
  a fix that survived stress test on my box.
  Pushed on my branch, I'll send a rebased series
  with Rusty's comments addressed ASAP.
 
 Normally you would have missed the merge window by now, but I'd really
 like this stuff in, so I'm holding it open for this.  I want these patches
 in linux-next for at least a few days before I push them.
 
 If you think we're not close enough, please tell me and I'll push
 the rest of the virtio patches to Linus now.  
 
 Thanks,
 Rusty.

I think it makes sense to push just the patches you have
applied by now (event index + delayed callback) - the
rest are close but they are guest only patches so very easy to
experiment with out of tree. OTOH if event index misses the
window it makes testing painful as we have to keep patching
both host and guest.

-- 
MST


PCI passthrough and full virtualization on Debian Squeeze with AMD890 FX

2011-05-22 Thread Matthias Mann
Hi, this is my first try to get a working full virtualized KVM guest with one
real PCI device. I have the Asrock 890FX Deluxe3 with a Phenom II X4 945 which
both should have IOMMU support. I did the steps described on
http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM but it
seems that full virtualization doesn't work. If I do 'kvm -enable-kvm -cpu ?' I
get no host CPU and PCI passthrough also fails.

Assigning the PCI device went fine and with lspci i can see the PCI device in
the guest but the device doesn't work. The command to run the guest was:

sudo kvm -M pc-0.12 -cpu kvm64,+svm -smp 1,cores=1 -m 1G -name Debian32
-vga std -net none -enable-kvm -device pci-assign,host=04:00.0,id=ethernet
-daemonize -drive file=/dev/mapper/vol-deb,if=scsi,media=disk,index=0,
snapshot=off,cache=writethrough,format=raw,boot=on -boot c
-chroot /home/kvmuser -runas kvmuser

dmesg | grep AMD-Vi
AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
AMD-Vi: device isolation enabled
AMD-Vi: Lazy IO/TLB flushing enabled

Some morde output of dmesg:
pci-stub :04:00.0: PCI INT A -> Link[LNKH] -> GSI 11 (level, low) -> IRQ 11
[ 3235.032088] pci-stub :04:00.0: restoring config space at offset 0x8 (was
 0xc, writing 0xcfff800c)
[ 3235.032105] pci-stub :04:00.0: restoring config space at offset 0x3 (was
 0x0, writing 0x10)
[ 3235.032116] pci-stub :04:00.0: restoring config space at offset 0x1 (was
 0x10, writing 0x100503)
[ 3235.500420] assign device: host bdf = 4:0:0
[ 3235.757750] kvm:3099 freeing invalid memtype cfff8000-cfff9000
[ 3302.746197] kvm:3099 freeing invalid memtype cfff9000-cfffc000
[ 3302.972095] pci-stub :04:00.0: restoring config space at offset 0x8 (was
 0xc, writing 0xcfff800c)
[ 3302.972112] pci-stub :04:00.0: restoring config space at offset 0x3 (was
 0x0, writing 0x10)
[ 3302.972124] pci-stub :04:00.0: restoring config space at offset 0x1 (was
 0x10, writing 0x180507)
[ 3302.972165] pci-stub :04:00.0: PCI INT A disabled

Version of KVM is 0.12.5+dfsg-5+squeeze1
Host is Debian Squeeze amd64

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/30] nVMX: Nested VMX, v9

2011-05-22 Thread Nadav Har'El
On Thu, May 12, 2011, Gleb Natapov wrote about Re: [PATCH 0/30] nVMX: Nested 
VMX, v9:
  But if my interpretation of the code is correct, SVM isn't much closer
  than VMX to the goal of moving this logic to x86.c. When some logic is
  moved there, both SVM and VMX code will need to change - perhaps even
  considerably. So how will it be helpful to make VMX behave exactly like
  SVM does now, when the latter will also need to change considerably?
  
 SVM design is much closer to the goal of moving the logic into x86.c
 because IIRC it does not bypass parsing of IDT vectoring info into an
 arch-independent structure. VMX code uses vmx->idt_vectoring_info directly.

At the risk of sounding blasphemous, I'd like to make the case that perhaps
the current nested-VMX design - regarding the IDT-vectoring-info-field
handling - is actually closer than nested-SVM to the goal of moving clean
nested-supporting logic into x86.c, instead of having ad-hoc, unnatural,
workarounds.

Let me explain, and see if you agree with my logic:

We discover at exit time whether the virtualization hardware (VMX or SVM)
exited while *delivering* an interrupt or exception to the current guest.
This is known as idt-vectoring-information in VMX.

What do we need to do with this idt-vectoring-information? In regular (non-
nested) guests, the answer is simple: On the next entry, we need to inject
this event again into the guest, so it can resume the delivery of the
same event it was trying to deliver. This is why the nested-unaware code
has a vmx_complete_interrupts which basically adds this idt-vectoring-info
into KVM's event queue, which on the next entry will be injected similarly
to the way virtual interrupts from userspace are injected, and so on.

But with nested virtualization, this is *not* what is supposed to happen -
we do not *always* need to inject the event to the guest. We will only need
to inject the event if the next entry will be again to the same guest, i.e.,
L1 after L1, or L2 after L2. If the idt-vectoring-info came from L2, but
our next entry will be into L1 (i.e., a nested exit), we *shouldn't* inject
the event as usual, but should rather pass this idt-vectoring-info field
as the exit information that L1 gets (in nested vmx terminology, in vmcs12).

However, at the time of exit, we cannot know for sure whether L2 will actually
run next, because it is still possible that an injection from user space,
before the next entry, will cause us to decide to exit to L1.

Therefore, I believe the clean solution isn't to keep the original non-nested
logic that always queues the idt-vectoring-info on the assumption that it will
be injected, and then, when it shouldn't be (because we want to exit to L1
during entry), skip the entry once as a trick to avoid the wrong injection.

Rather, a clean solution is, I think, to recognize that in nested
virtualization, idt-vectoring-info is a different kind of beast than regular
injected events, and it needs to be saved at exit time in a different field
(which will of course be common to SVM and VMX). Only at entry time, after
the regular injection code (which may cause a nested exit), we can call a
x86_op to handle this special injection.

The benefit of this approach, which is closer to the current vmx code,
is, I think, that x86.c will contain clear, self-explanatory nested logic,
instead of relying on vmx.c or svm.c circumventing various x86.c functions
and mechanisms to do something different from what they were meant to do.
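
For concreteness, here is a minimal standalone sketch of the entry-time flow
I am describing; every name below (the saved field and the callbacks) is made
up for illustration and is not an existing KVM symbol:

#include <stdbool.h>
#include <stdint.h>

/* Sketch only: none of these names exist in KVM. */
struct sketch_vcpu {
    uint32_t nested_vectoring_info;  /* idt-vectoring-info saved at exit time,
                                      * kept apart from the regular event queue */
};

struct sketch_nested_ops {
    bool (*exit_to_l1_pending)(struct sketch_vcpu *v); /* next entry goes to L1?      */
    void (*forward_to_l1)(struct sketch_vcpu *v);      /* report via vmcs12 exit info */
    void (*reinject)(struct sketch_vcpu *v);           /* resume delivery in same guest */
};

/* Called from the arch-independent entry path, after the regular injection
 * code has run (which may itself have decided to exit to L1). */
static void sketch_complete_nested_injection(struct sketch_vcpu *v,
                                             const struct sketch_nested_ops *ops)
{
    if (!v->nested_vectoring_info)
        return;

    if (ops->exit_to_l1_pending(v))
        ops->forward_to_l1(v);   /* L1 sees the event as exit information */
    else
        ops->reinject(v);        /* L1-after-L1 or L2-after-L2: re-inject */

    v->nested_vectoring_info = 0;
}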

What do you think?


-- 
Nadav Har'El|   Sunday, May 22 2011, 19 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |If I were two-faced, would I be wearing
http://nadav.harel.org.il   |this one? Abraham Lincoln
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-22 Thread Rusty Russell
On Sun, 22 May 2011 15:10:08 +0300, Michael S. Tsirkin m...@redhat.com 
wrote:
 On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
  On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com 
  wrote:
   Current code might introduce a lot of latency variation
   if there are many pending bufs at the time we
   attempt to transmit a new one. This is bad for
   real-time applications and can't be good for TCP either.
  
  Do we have more than speculation to back that up, BTW?
 
 Need to dig this up: I thought we saw some reports of this on the list?

I think so too, but a reference needs to be here too.

It helps to have exact benchmarks on what's being tested, otherwise we
risk unexpected interaction with the other optimization patches.

 	struct sk_buff *skb;
 	unsigned int len;
-
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	bool c;
+	int n;
+
+	/* We try to free up at least 2 skbs per one sent, so that we'll get
+	 * all of the memory back if they are used fast enough. */
+	for (n = 0;
+	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
+	     ((skb = virtqueue_get_buf(vi->svq, &len)));
+	     ++n) {
 		pr_debug("Sent skb %p\n", skb);
 		vi->dev->stats.tx_bytes += skb->len;
 		vi->dev->stats.tx_packets++;
 		dev_kfree_skb_any(skb);
 	}
+	return !c;
  
  This is for() abuse :)
  
  Why is the capacity check in there at all?  Surely it's simpler to try
  to free 2 skbs each time around?
 
 This is in case we can't use indirect: we want to free up
 enough buffers for the following add_buf to succeed.

Sure, or we could just count the frags of the skb we're taking out,
which would be accurate for both cases and far more intuitive.

ie. always try to free up twice as much as we're about to put in.

Can we hit problems with OOM?  Sure, but no worse than now...

The problem is that this virtqueue_get_capacity() returns the worst
case, not the normal case.  So using it is deceptive.

 I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
 sure we have enough space in the buffer. Another way to do
 that is with a define :).

To do this properly, we should really be using the actual number of sg
elements needed, but we'd have to do most of xmit_skb beforehand so we
know how many.
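
To make that concrete, a rough standalone sketch of the "free up twice as much
as we're about to put in" idea, where slots_needed would be the actual number
of sg elements the next skb needs; the types and names below are stand-ins
invented for the example, not the real skb/virtqueue structures:

#include <stddef.h>

/* Sketch only: stand-ins for the real skb/virtqueue structures. */
struct sketch_skb {
    struct sketch_skb *next;
    unsigned nr_slots;              /* descriptor slots this skb occupied */
};

struct sketch_vq {
    struct sketch_skb *used;        /* completed tx buffers, oldest first */
};

static struct sketch_skb *sketch_vq_get_buf(struct sketch_vq *vq)
{
    struct sketch_skb *skb = vq->used;
    if (skb)
        vq->used = skb->next;
    return skb;                     /* NULL when nothing has completed */
}

/* Reclaim completed tx buffers until we have freed at least twice the number
 * of slots the next skb will need, or we run out of completions.  The stop
 * condition is based on the skbs actually freed, not on the ring's
 * worst-case capacity. */
static unsigned sketch_free_old_xmit_skbs(struct sketch_vq *vq,
                                          unsigned slots_needed)
{
    unsigned freed = 0;
    struct sketch_skb *skb;

    while (freed < 2 * slots_needed &&
           (skb = sketch_vq_get_buf(vq)) != NULL)
        freed += skb->nr_slots;     /* real code would also free the skb here */

    return freed;
}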

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH] KVM Test: Switch current working folder in unattended_install.py.

2011-05-22 Thread Feng Yang

On 05/21/2011 02:59 AM, Lucas Meneghel Rodrigues wrote:

On Thu, 2011-05-19 at 18:24 +0800, fy...@redhat.com wrote:

From: Feng Yang fy...@redhat.com

The current working folder for
 unattended_install_config = UnattendedInstallConfig(test, params)
 unattended_install_config.setup()
must be the kvm folder.

This is not needed at all. What might be going on in your setup is some
incorrectly set or absent path that is messing up relative paths during
your install.

Please provide some more info so we can fix your problem properly.
Meanwhile I'm marking this as 'rejected'.

Thanks!

Thanks for your comment.
After merging upstream code into our local tree, I found that our local
unattended_install no longer worked.
Before, the current working folder for unattended.py was the kvm folder,
set in process_command.

Now the current working folder for

unattended_install_config = UnattendedInstallConfig(test, params)
unattended_install_config.setup()

has changed to the case result folder, so our unattended_install always fails
with "could not find ks.iso".

Then I send this patch.

I will recheck our local code and configure.
Thanks very much!

Feng Yang

Signed-off-by: Feng Yang fy...@redhat.com
---
  client/tests/kvm/tests/unattended_install.py |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/unattended_install.py b/client/tests/kvm/tests/unattended_install.py
index 50a8c7a..eee1761 100644
--- a/client/tests/kvm/tests/unattended_install.py
+++ b/client/tests/kvm/tests/unattended_install.py
@@ -506,8 +506,11 @@ def run_unattended_install(test, params, env):
     @param params: Dictionary with the test parameters.
     @param env: Dictionary with test environment.
     """
+    cur_folder = os.getcwd()
+    os.chdir(test.bindir)
     unattended_install_config = UnattendedInstallConfig(test, params)
     unattended_install_config.setup()
+    os.chdir(cur_folder)
     vm = env.get_vm(params["main_vm"])
     vm.create()



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

