Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 05:31 PM, Anthony Liguori wrote:
>> Several alpha system chips MCE when accessed with incorrect sizes. E.g. only 64-bit accesses are allowed.
>
> But is this a characteristic of devices or is this a characteristic of the chipset/CPU?

The chipset is modelled by a MemoryRegion too.

> At any rate, I'm fairly sure it doesn't belong in the MemoryRegion structure.

Since it isn't a global property, where does it belong?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 05:46 PM, Anthony Liguori wrote:
> On 05/20/2011 09:40 AM, Richard Henderson wrote:
>> On 05/20/2011 07:31 AM, Anthony Liguori wrote:
>>> But is this a characteristic of devices or is this a characteristic of the chipset/CPU?
>>
>> Chipset.
>
> So if the chipset only allows accesses that are 64-bit, then you'll want to have hierarchical dispatch filter non-64-bit accesses and raise an MCE appropriately. So you don't need anything in MemoryRegion, you need code in the dispatch path.

MemoryRegion *is* the dispatch path. Only done declaratively, so we can flatten it whenever it changes.
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 09:16 PM, Blue Swirl wrote:
> On Fri, May 20, 2011 at 5:46 PM, Anthony Liguori <anth...@codemonkey.ws> wrote:
>> On 05/20/2011 09:40 AM, Richard Henderson wrote:
>>> On 05/20/2011 07:31 AM, Anthony Liguori wrote:
>>>> But is this a characteristic of devices or is this a characteristic of the chipset/CPU?
>>>
>>> Chipset.
>>
>> So if the chipset only allows accesses that are 64-bit, then you'll want to have hierarchical dispatch filter non-64-bit accesses and raise an MCE appropriately. So you don't need anything in MemoryRegion, you need code in the dispatch path.
>
> Sparc (32/64) systems are also very picky about wrong-sized accesses. I think the buses have lines for access size and the device can (and they will) signal an error in these cases. Then the bus controller will raise an NMI. I think the easiest way to handle this could be to use overlapping registrations for specific sizes. Then we could make a default error generator device, which would just signal NMI/MCE on any access. It would register for all of the picky area with the lowest possible priority. Other devices would register the small working access areas with no knowledge about this error generator device. Any correct access should go to other devices, bad accesses to the error generator. Though this would not be very different from current unassigned access handling.

The MemoryRegion way of doing it would be to register the MemoryRegion of the bus with the picky attributes. Any sub-regions will inherit the property.
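The size filtering discussed in this thread can be sketched in a few lines of C. This is an illustrative model only, not the actual QEMU API: `valid_sizes` is a hypothetical per-region mask of allowed access widths, and the MCE/NMI is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model of size filtering in the dispatch path (not the
 * actual QEMU API): a region carries a mask of allowed access widths,
 * encoded by the width itself (mask & 8 => 8-byte accesses allowed).
 */
static bool access_size_allowed(unsigned valid_sizes, unsigned size)
{
    /* size must be a power of two and present in the mask */
    return size && !(size & (size - 1)) && (valid_sizes & size);
}

static unsigned mce_count; /* stand-in for raising an MCE/NMI */

static uint64_t dispatch_read(unsigned valid_sizes, uint64_t addr,
                              unsigned size)
{
    (void)addr;
    if (!access_size_allowed(valid_sizes, size)) {
        mce_count++;            /* chipset signals a machine check */
        return (uint64_t)-1;    /* like unassigned-access handling */
    }
    return 0x1234;              /* stand-in for the device's read callback */
}
```

A region on an Alpha-style bus would pass `valid_sizes == 8`, so only 64-bit reads reach the device and everything else takes the error path; subregions inheriting the bus attribute would simply reuse the parent's mask.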
Re: VM terminates when doing a live migration
Hi, Michael.

On Saturday, 21 May 2011 23:39:27 -0400, Michael Stroucken wrote:
>> I'm doing some testing with KVM Live Migration. SS01 (VMHost) has Debian GNU/Linux 6.0.1 and Defiant (VMHost) has Debian GNU/Linux 5.0.8. Defiant has Linux 2.6.32-15~bpo50+1 and 0.12.5+dfsg-3~bpo50+2, and SS01 has Linux 2.6.32-31 and 0.12.5+dfsg-5+squeeze1. Both installations are 32-bit, but the kernel in SS01 is amd64 and the kernel in Defiant is 686. Migration from Defiant to SS01 completes successfully, but when trying to migrate a VM from SS01 to Defiant, the VM terminates (segfault?). Is this due to the difference in the kernels of the two VMHosts?
>
> Can you try saving the vm's state to a file on ss01 and restoring it on defiant?

I tried following your suggestion but the same thing happens:

ss01:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 \
    -net tap -daemonize -vnc :15 -k es -localtime -cdrom \
    /mnt/systemrescuecd-x86-2.0.1.iso -monitor telnet:localhost:4055,server,nowait
ss01:~# telnet localhost 4055
Trying ::1...
Connected to localhost.
Escape character is '^]'.
QEMU 0.12.5 monitor - type 'help' for more information
(qemu) stop
(qemu) migrate_set_speed 4095m
(qemu) migrate exec:gzip -c > STATEFILE.gz
Connection closed by foreign host.
ss01:~# ps ax | grep systemrescuecd
26564 pts/0    S+     0:00 grep systemrescuecd

It appears that the 'migrate' command causes termination of the virtual machine with these parameters as well.

> Do migrations between ss01 and another amd64 box work?

I have not tried a migration where both source and destination have the same characteristics (i386 installation and amd64 kernel).

Thanks for your reply.

Regards, Daniel
--
Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598
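For completeness, the restore side of the save/restore test on the destination would use QEMU's `-incoming` exec transport to decompress the state file saved above. This is a sketch assuming the same guest configuration as the source command; only the last option differs:

```
defiant:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 \
    -net tap -daemonize -vnc :15 -k es -localtime -cdrom \
    /mnt/systemrescuecd-x86-2.0.1.iso \
    -incoming "exec: gzip -c -d STATEFILE.gz"
```

If the restore on defiant fails the same way the live migration does, that points away from the network path and toward the saved state itself (or the version/kernel mismatch between the hosts).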
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 08:59 PM, Blue Swirl wrote:
> On Thu, May 19, 2011 at 5:12 PM, Avi Kivity <a...@redhat.com> wrote:
>> The memory API separates the attributes of a memory region (its size, how reads or writes are handled, dirty logging, and coalescing) from where it is mapped and whether it is enabled. This allows a device to configure a memory region once, then hand it off to its parent bus to map it according to the bus configuration. Hierarchical registration also allows a device to compose a region out of a number of sub-regions with different properties; for example some may be RAM while others may be MMIO.
>>
>> +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
>> +/* Enable memory coalescing for the region. MMIO ->write callbacks may be
>> + * delayed until a non-coalesced MMIO is issued.
>> + */
>> +void memory_region_set_coalescing(MemoryRegion *mr);
>> +/* Enable memory coalescing for a sub-range of the region. MMIO ->write
>> + * callbacks may be delayed until a non-coalesced MMIO is issued.
>> + */
>> +void memory_region_add_coalescing(MemoryRegion *mr,
>> +                                  target_phys_addr_t offset,
>> +                                  target_phys_addr_t size);
>> +/* Disable MMIO coalescing for the region. */
>> +void memory_region_clear_coalescing(MemoryRegion *mr);
>
> Perhaps the interface could be more generic, like
>
> +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
> +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);

Coalescing is a complex property, not just a boolean attribute. We probably will have a number of boolean attributes later, though.

>> + * conflicts are resolved by having a higher @priority hide a lower @priority.
>> + * Subregions without priority are taken as @priority 0.
>> + */
>> +void memory_region_add_subregion_overlap(MemoryRegion *mr,
>> +                                         target_phys_addr_t offset,
>> +                                         MemoryRegion *subregion,
>> +                                         unsigned priority);
>> +/* Remove a subregion. */
>> +void memory_region_del_subregion(MemoryRegion *mr,
>> +                                 MemoryRegion *subregion);
>
> What would the subregions be used for?
Subregions describe the flow of data through the memory bus. We'd have a subregion for the PCI bus, with its own subregions for various BARs, with some having subregions for dispatching different MMIO types within the BAR. This allows, for example, the PCI layer to move a BAR without the PCI device knowing anything about it.
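The BAR-moving example can be illustrated with a toy region tree. The structure and names here are made up for illustration, not the proposed API: each region records only its offset within its parent, so relocating a BAR changes one number and every subregion follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy region tree (illustrative, not the proposed QEMU structures). */
struct region {
    struct region *parent;
    uint64_t offset;            /* offset within the parent region */
};

/* Absolute address = sum of offsets up to the root. */
static uint64_t region_abs_addr(const struct region *r)
{
    uint64_t addr = 0;
    for (; r; r = r->parent)
        addr += r->offset;
    return addr;
}

/* The PCI layer maps a BAR at bar_offset inside a (hypothetical) PCI
 * window at 0xe0000000; a device subregion sits at a fixed 0x800
 * inside the BAR and never learns that the BAR moved. */
static uint64_t demo_subregion_addr(uint64_t bar_offset)
{
    struct region pci = { NULL, 0xe0000000 };
    struct region bar = { &pci, bar_offset };
    struct region sub = { &bar, 0x800 };
    return region_abs_addr(&sub);
}
```

Remapping the BAR is just a change to `bar.offset` in the PCI layer; the device's subregion layout is untouched.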
Re: [PATCH v2] Enable CPU SMEP feature for KVM
On 05/22/2011 08:23 AM, Yang, Wei Y wrote:
> This patch matches with [PATCH v2] Enable CPU SMEP feature support for QEMU-KVM, no changes since v1. Enable the newly documented SMEP (Supervisor Mode Execution Protection) CPU feature in the KVM module. Intel's new CPUs support SMEP, which prevents the kernel from executing code in application (user-mode) pages. The updated Intel SDM describes this CPU feature; the document will be published soon. This patch is based on Fenghua's SMEP patch series, as referred by: https://lkml.org/lkml/2011/5/17/523
>
> This patch enables guests' usage of SMEP. Currently, we don't enable this feature for guests with shadow page tables.

Why not? I see nothing that conflicts with shadow.

Missing:
- update kvm_set_cr4() to reject SMEP if it's disabled in cpuid
- drop SMEP from cr4_guest_owned_bits if SMEP is disabled in cpuid
- update walk_addr_generic() to fault if SMEP is enabled and fetching from a user page
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 05:06 PM, Richard Henderson wrote:
> Is this structure honestly any better than 4 function pointers? I can't see that it is, myself.

That was requested by Anthony. And in fact we have two bits of information per access size: one is whether the access is allowed or not, the other is whether we want to use a larger or smaller access size function (useful for auto-splitting a 64-bit access into two 32-bit accesses, for example).
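The auto-splitting mentioned here can be sketched as follows. The callback type and names are hypothetical; the point is only the two-halves mechanism the dispatch layer could apply when the per-size bits say "64-bit not supported directly, use a smaller size".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device callback that only implements 32-bit reads. */
typedef uint32_t (*read32_fn)(uint64_t addr);

/* Emulate a 64-bit read with two 32-bit reads (little-endian merge). */
static uint64_t read64_split(read32_fn read32, uint64_t addr)
{
    uint64_t lo = read32(addr);
    uint64_t hi = read32(addr + 4);
    return lo | (hi << 32);
}

/* Trivial demo device: each 32-bit word reads back as its address + 1. */
static uint32_t demo_read32(uint64_t addr)
{
    return (uint32_t)(addr + 1);
}
```

With the demo device, a 64-bit read at 0x10 becomes two 32-bit reads at 0x10 and 0x14, merged into one value; a big-endian bus would merge the halves the other way around.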
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 08:59 PM, Blue Swirl wrote:
> On Thu, May 19, 2011 at 5:12 PM, Avi Kivity <a...@redhat.com> wrote:
>> The memory API separates the attributes of a memory region (its size, how reads or writes are handled, dirty logging, and coalescing) from where it is mapped and whether it is enabled. This allows a device to configure a memory region once, then hand it off to its parent bus to map it according to the bus configuration. Hierarchical registration also allows a device to compose a region out of a number of sub-regions with different properties; for example some may be RAM while others may be MMIO.
>>
>> +/* Destroy a memory region. The memory becomes inaccessible. */
>> +void memory_region_destroy(MemoryRegion *mr);
>
> Doesn't the lower priority region become accessible instead in some cases?

If we require that _add_subregion() and _del_subregion() be paired, then no.
Re: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2
Hi,

On Sun, May 22, 2011, Tian, Kevin wrote about RE: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2:
> Here the vmcs02 being overridden may have been run on another processor before but is not vmclear-ed yet. When you resume this vmcs02 with new content on a separate processor, the risk of corruption exists.

I still believe that my current code is correct (in this area). I'll try to explain it here and would be grateful if you could point out to me the error (if there is one) in my logic:

nested_vmx_run() is our function which switches from running L1 to L2 (patch 18). This function starts by calling nested_get_current_vmcs02(), which gets us *some* vmcs to use for vmcs02. This may be a fresh new VMCS, or a recycled VMCS, some VMCS we've previously used to run some, potentially different, L2 guest on some, potentially different, CPU.

nested_get_current_vmcs02() returns a saved_vmcs structure, which not only contains a VMCS, but also remembers on which (if any) cpu it is currently loaded (and whether it was VMLAUNCHed once on that cpu).

The next thing that nested_vmx_run() does is to set up in the vcpu object the vmcs, cpu and launched fields according to what was returned above.

Now it calls vmx_vcpu_load(). This standard KVM function checks if we're now running on a different CPU from vcpu->cpu, and if it is a different one, it uses vcpu_clear() to VMCLEAR the vmcs on the CPU where it was last loaded (using an IPI). Only after it vmclears the VMCS on the old CPU can it finally load the VMCS on the new CPU.

Only now can nested_vmx_run() call prepare_vmcs02, which starts VMWRITEing to this VMCS, and finally returns.

P.S. Seeing that you're from Intel, maybe you can help me with a pointer: I found what appears to be a small error in the SDM - who can I report it to?

Thanks, Nadav.

--
Nadav Har'El | Sunday, May 22 2011, 18 Iyyar 5771
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191 | I work for money.
If you want loyalty, buy yourself a dog.
http://nadav.harel.org.il
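The migration rule described above (a VMCS must be VMCLEARed on its old CPU before being loaded on a new one) can be modelled in a few lines. This is a toy state machine for illustration, not kernel code:

```c
#include <assert.h>

/* Toy model of a loaded VMCS: which CPU it is loaded on (-1 = none)
 * and whether it was VMLAUNCHed there. */
struct toy_vmcs {
    int cpu;
    int launched;
};

static void toy_vmclear(struct toy_vmcs *v)
{
    v->cpu = -1;
    v->launched = 0;
}

/* Loading on a different CPU first clears the VMCS on the old one
 * (in KVM this is vcpu_clear() via an IPI). */
static void toy_vmcs_load(struct toy_vmcs *v, int cpu)
{
    if (v->cpu != -1 && v->cpu != cpu)
        toy_vmclear(v);
    v->cpu = cpu;
}

/* Returns cpu*10 + launched after loading on new_cpu. */
static int demo_load(int old_cpu, int new_cpu)
{
    struct toy_vmcs v = { old_cpu, 1 };
    toy_vmcs_load(&v, new_cpu);
    return v.cpu * 10 + v.launched;
}
```

Moving between CPUs drops the launched state (a subsequent entry must VMLAUNCH rather than VMRESUME), while re-loading on the same CPU preserves it, which is the invariant the thread is arguing about.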
RE: [PATCH v2] Enable CPU SMEP feature for KVM
>> This patch matches with [PATCH v2] Enable CPU SMEP feature support for QEMU-KVM, no changes since v1. Enable the newly documented SMEP (Supervisor Mode Execution Protection) CPU feature in the KVM module. Intel's new CPUs support SMEP, which prevents the kernel from executing code in application (user-mode) pages. The updated Intel SDM describes this CPU feature; the document will be published soon. This patch is based on Fenghua's SMEP patch series, as referred by: https://lkml.org/lkml/2011/5/17/523
>>
>> This patch enables guests' usage of SMEP. Currently, we don't enable this feature for guests with shadow page tables.
>
> Why not? I see nothing that conflicts with shadow.

We don't need to enable it for shadow page tables, because shadow has a mask against guest/shadow PTEs, which may cause problems. Let's keep shadow as it is because it's already very complex. Assume SMEP machines should have EPT.

> Missing: update kvm_set_cr4() to reject SMEP if it's disabled in cpuid

Yes, I will check it.

> drop SMEP from cr4_guest_owned_bits if SMEP is disabled in cpuid

The SMEP bit is not included in KVM_CR4_GUEST_OWNED_BITS.

> update walk_addr_generic() to fault if SMEP is enabled and fetching from a user page

Comments above.
Re: [PATCH v2] Enable CPU SMEP feature for KVM
On 05/22/2011 11:08 AM, Yang, Wei Y wrote:
>>> This patch matches with [PATCH v2] Enable CPU SMEP feature support for QEMU-KVM, no changes since v1. Enable the newly documented SMEP (Supervisor Mode Execution Protection) CPU feature in the KVM module. Intel's new CPUs support SMEP, which prevents the kernel from executing code in application (user-mode) pages. The updated Intel SDM describes this CPU feature; the document will be published soon. This patch is based on Fenghua's SMEP patch series, as referred by: https://lkml.org/lkml/2011/5/17/523
>>>
>>> This patch enables guests' usage of SMEP. Currently, we don't enable this feature for guests with shadow page tables.
>>
>> Why not? I see nothing that conflicts with shadow.
>
> We don't need to enable it for shadow page tables, because shadow has a mask against guest/shadow PTEs, which may cause problems. Let's keep shadow as it is because it's already very complex. Assume SMEP machines should have EPT.

I don't understand why. Can you elaborate? Shadow implements the U bit, which is all that is needed by SMEP as far as I can tell.

>> update walk_addr_generic() to fault if SMEP is enabled and fetching from a user page
>
> Comments above.

This needs to be done even with EPT, in case walk_addr_generic() is invoked by the emulator.
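The walk_addr_generic() condition under discussion boils down to a single predicate. The following is an illustration of the architectural rule, not the KVM patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * With CR4.SMEP set, an instruction fetch performed in supervisor mode
 * from a page that is user-accessible must fault.
 * Illustrative predicate, not KVM code.
 */
static bool smep_fetch_faults(bool cr4_smep, bool supervisor_mode,
                              bool is_fetch, bool page_user_accessible)
{
    return cr4_smep && supervisor_mode && is_fetch && page_user_accessible;
}
```

All four conditions must hold for the fault: clearing any one of them (SMEP disabled, user-mode execution, a data access, or a supervisor-only page) makes the access legal again, which is why the check belongs in the generic page walker rather than in EPT-specific code.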
Re: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2
Hi,

On Fri, May 20, 2011, Tian, Kevin wrote about RE: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2:
> Possibly we can maintain the vmcs02 pool along with L1 VMCLEAR ops, which is similar to the hardware behavior regarding cleared and launched states.

If you set VMCS02_POOL_SIZE to a large size, and L1, like typical hypervisors, only keeps around a few VMCSs (and VMCLEARs the ones it will not use again), then we'll only have a few vmcs02: handle_vmclear() removes from the pool the vmcs02 that L1 explicitly told us it won't need again.

> +struct saved_vmcs {
> +	struct vmcs *vmcs;
> +	int cpu;
> +	int launched;
> +};
>
> "saved" looks a bit misleading here. It's simply a list of all active vmcs02 tracked by kvm, isn't it?

I have rewritten this part of the code, based on Avi's and Marcelo's requests, and the new name for this structure is loaded_vmcs, i.e., a structure describing where a VMCS was loaded.

> +/* Used to remember the last vmcs02 used for some recently used vmcs12s */
> +struct vmcs02_list {
> +	struct list_head list;
> +	gpa_t vmcs12_addr;
>
> uniform the name 'vmptr' as in the nested_vmx structure:

Ok. Changing all the mentions of vmcs12_addr to vmptr.

--
Nadav Har'El | Sunday, May 22 2011, 18 Iyyar 5771
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191 | A witty saying proves nothing. -- Voltaire
http://nadav.harel.org.il
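The pool behaviour described above (find-or-allocate by vmptr, with handle_vmclear() evicting entries L1 says it is done with) can be sketched like this. It is array-based and heavily simplified, with made-up names, purely to show the eviction rule:

```c
#include <assert.h>
#include <stdint.h>

/* Toy vmcs02 pool (illustrative, not the nVMX code). */
#define POOL_SIZE 4

struct vmcs02_entry {
    uint64_t vmptr;   /* guest-physical address of the vmcs12 */
    int in_use;
};

static struct vmcs02_entry pool[POOL_SIZE];

static int pool_find(uint64_t vmptr)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (pool[i].in_use && pool[i].vmptr == vmptr)
            return i;
    return -1;
}

/* Find an existing entry for vmptr or allocate a free slot. */
static int pool_get(uint64_t vmptr)
{
    int i = pool_find(vmptr);
    if (i >= 0)
        return i;
    for (i = 0; i < POOL_SIZE; i++)
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].vmptr = vmptr;
            return i;
        }
    return -1;   /* pool full: the caller would recycle an entry */
}

/* L1's VMCLEAR tells us it is done with this vmcs12, so evict it. */
static void pool_vmclear(uint64_t vmptr)
{
    int i = pool_find(vmptr);
    if (i >= 0)
        pool[i].in_use = 0;
}
```

With L1 VMCLEARing VMCSs it will not run again, the pool stays small regardless of POOL_SIZE, which is the point Nadav makes above.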
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
On Wed, May 18, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling:
> Humpf, right. OK, you can handle the x86.c usage with
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> ...

Hi Avi and Marcelo, here is the new first patch to the nvmx patch set, which overhauls the handling of vmcss on cpus, as you asked. As you guessed, the nested entry and exit code becomes much simpler and cleaner, with the whole VMCS switching code on entry, for example, reduced to:

	cpu = get_cpu();
	vmx->loaded_vmcs = vmcs02;
	vmx_vcpu_put(vcpu);
	vmx_vcpu_load(vcpu, cpu);
	vcpu->cpu = cpu;
	put_cpu();

You can apply this patch separately from the rest of the patch set, if you wish. I'm sending just this one, like you asked - and can send the rest of the patches when you ask me to.

Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it because (at least in theory) the processor might not have written all of its content back to memory. Since a patch from June 26, 2008, this is done using a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for L2s), and each of those may have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure loaded_vmcs. This structure contains the VMCS, and the information pertaining to its loading on a specific cpu (namely, the cpu number, and whether it was already launched on this cpu once). In nested we will also use the same structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the currently active VMCS.
Signed-off-by: Nadav Har'El <n...@il.ibm.com>
---
 arch/x86/kvm/vmx.c | 129 ++-
 arch/x86/kvm/x86.c |   3 +-
 2 files changed, 80 insertions(+), 52 deletions(-)

--- .before/arch/x86/kvm/x86.c	2011-05-22 11:41:57.0 +0300
+++ .after/arch/x86/kvm/x86.c	2011-05-22 11:41:57.0 +0300
@@ -2119,7 +2119,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
 	if (need_emulate_wbinvd(vcpu)) {
 		if (kvm_x86_ops->has_wbinvd_exit())
 			cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
-		else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
+		else if (vcpu->cpu != -1 && vcpu->cpu != cpu
+			 && cpu_online(vcpu->cpu))
 			smp_call_function_single(vcpu->cpu,
 					wbinvd_ipi, NULL, 1);
 	}
--- .before/arch/x86/kvm/vmx.c	2011-05-22 11:41:57.0 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-22 11:41:58.0 +0300
@@ -116,6 +116,18 @@ struct vmcs {
 	char data[0];
 };
 
+/*
+ * Track a VMCS that may be loaded on a certain CPU. If it is (cpu != -1), also
+ * remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
+ * loaded on this CPU (so we can clear them if the CPU goes down).
+ */
+struct loaded_vmcs {
+	struct vmcs *vmcs;
+	int cpu;
+	int launched;
+	struct list_head loaded_vmcss_on_cpu_link;
+};
+
 struct shared_msr_entry {
 	unsigned index;
 	u64 data;
@@ -124,9 +136,7 @@ struct shared_msr_entry {
 
 struct vcpu_vmx {
 	struct kvm_vcpu vcpu;
-	struct list_head local_vcpus_link;
 	unsigned long host_rsp;
-	int launched;
 	u8 fail;
 	u8 cpl;
 	bool nmi_known_unmasked;
@@ -140,7 +150,14 @@ struct vcpu_vmx {
 	u64 msr_host_kernel_gs_base;
 	u64 msr_guest_kernel_gs_base;
 #endif
-	struct vmcs *vmcs;
+	/*
+	 * loaded_vmcs points to the VMCS currently used in this vcpu. For a
+	 * non-nested (L1) guest, it always points to vmcs01. For a nested
+	 * guest (L2), it points to a different VMCS.
+	 */
+	struct loaded_vmcs    vmcs01;
+	struct loaded_vmcs   *loaded_vmcs;
+	bool                  __launched; /* temporary, used in vmx_vcpu_run */
 	struct msr_autoload {
 		unsigned nr;
 		struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
@@ -200,7 +217,11 @@ static int vmx_set_tss_addr(struct kvm *
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
-static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
+/*
+ * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
+ * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
+ */
+static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
 static unsigned long *vmx_io_bitmap_a;
@@ -514,25 +535,25 @@ static void
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On Sun, May 22, 2011 at 9:45 AM, Avi Kivity <a...@redhat.com> wrote:
> On 05/20/2011 08:59 PM, Blue Swirl wrote:
>> On Thu, May 19, 2011 at 5:12 PM, Avi Kivity <a...@redhat.com> wrote:
>>> The memory API separates the attributes of a memory region (its size, how reads or writes are handled, dirty logging, and coalescing) from where it is mapped and whether it is enabled. This allows a device to configure a memory region once, then hand it off to its parent bus to map it according to the bus configuration. Hierarchical registration also allows a device to compose a region out of a number of sub-regions with different properties; for example some may be RAM while others may be MMIO.
>>>
>>> +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
>>> +/* Enable memory coalescing for the region. MMIO ->write callbacks may be
>>> + * delayed until a non-coalesced MMIO is issued.
>>> + */
>>> +void memory_region_set_coalescing(MemoryRegion *mr);
>>> +/* Enable memory coalescing for a sub-range of the region. MMIO ->write
>>> + * callbacks may be delayed until a non-coalesced MMIO is issued.
>>> + */
>>> +void memory_region_add_coalescing(MemoryRegion *mr,
>>> +                                  target_phys_addr_t offset,
>>> +                                  target_phys_addr_t size);
>>> +/* Disable MMIO coalescing for the region. */
>>> +void memory_region_clear_coalescing(MemoryRegion *mr);
>>
>> Perhaps the interface could be more generic, like
>> +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
>> +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
>
> Coalescing is a complex property, not just a boolean attribute. We probably will have a number of boolean attributes later, though.

But what is the difference between adding coalescing to an area and setting the bit property 'coalescing' to an area? At least what you propose now is not so complex that it couldn't be handled as a single bit.

> (motivation - hw/e1000.c)
>
>>> + * conflicts are resolved by having a higher @priority hide a lower @priority.
>>> + * Subregions without priority are taken as @priority 0.
>>> + */
>>> +void memory_region_add_subregion_overlap(MemoryRegion *mr,
>>> +                                         target_phys_addr_t offset,
>>> +                                         MemoryRegion *subregion,
>>> +                                         unsigned priority);
>>> +/* Remove a subregion. */
>>> +void memory_region_del_subregion(MemoryRegion *mr,
>>> +                                 MemoryRegion *subregion);
>>
>> What would the subregions be used for?
>
> Subregions describe the flow of data through the memory bus. We'd have a subregion for the PCI bus, with its own subregions for various BARs, with some having subregions for dispatching different MMIO types within the BAR. This allows, for example, the PCI layer to move a BAR without the PCI device knowing anything about it.

But why can't a first-class region be used for that?
Re: [PATCH 09/10] qemu-kvm: Use upstream kvm_cpu_exec
On 2011-05-20 19:17, Christoph Hellwig wrote:
> On Fri, May 20, 2011 at 07:12:39PM +0200, Jan Kiszka wrote:
>> Upstream's and qemu-kvm's kvm_cpu_exec are not logically equivalent so
>
> s/not/now/?

Oops, of course. If there is no other need to repost, this should be fixed on merge.

Thanks,
Jan
Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving
* tip-bot for Cyrill Gorcunov <gorcu...@gmail.com> wrote:
> diff --git a/tools/perf/feature-tests.mak b/tools/kvm/config/feature-tests.mak
> similarity index 83%
> copy from tools/perf/feature-tests.mak
> copy to tools/kvm/config/feature-tests.mak

Btw, now that we have feature-tests.mak it would be nice to populate the checks for the various assumptions. One i recently ran into on a new box where i tried to install tools/kvm was:

In file included from /usr/include/features.h:387:0,
                 from /usr/include/unistd.h:26,
                 from include/kvm/util.h:12,
                 from bios/e820.c:5:
/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory
compilation terminated.
make: *** [bios/bios-rom.bin] Error 1

that's a dependency on glibc-dev[el].

Thanks,
Ingo
Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving
* Ingo Molnar <mi...@elte.hu> wrote:
> * tip-bot for Cyrill Gorcunov <gorcu...@gmail.com> wrote:
>> diff --git a/tools/perf/feature-tests.mak b/tools/kvm/config/feature-tests.mak
>> similarity index 83%
>> copy from tools/perf/feature-tests.mak
>> copy to tools/kvm/config/feature-tests.mak
>
> Btw, now that we have feature-tests.mak it would be nice to populate the checks for the various assumptions. One i recently ran into on a new box where i tried to install tools/kvm was:
>
> In file included from /usr/include/features.h:387:0,
>                  from /usr/include/unistd.h:26,
>                  from include/kvm/util.h:12,
>                  from bios/e820.c:5:
> /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory
> compilation terminated.
> make: *** [bios/bios-rom.bin] Error 1
>
> that's a dependency on glibc-dev[el].

Another detail: on 64-bit hosts the dependency is on glibc-dev[el].i686, i.e. the 32-bit package. Would it be simple to remove this dependency? It's not typically installed by default on distros and it would be nice to make most of the kvm code build by default almost everywhere.

Thanks,
Ingo
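A feature test along the lines of tools/perf/feature-tests.mak could detect the missing 32-bit glibc up front instead of failing deep inside the BIOS build. This is a hypothetical sketch; the variable name and the message text are made up:

```make
# Probe whether the toolchain can build 32-bit binaries (on 64-bit
# hosts this requires the i686 glibc development package).
has_32bit := $(shell echo 'int main(void){return 0;}' | \
	$(CC) -m32 -x c - -o /dev/null 2>/dev/null && echo y)

ifneq ($(has_32bit),y)
$(error 32-bit toolchain support missing; install glibc-devel.i686 or equivalent)
endif
```

Running the probe at configure time turns the cryptic gnu/stubs-32.h compile error into an actionable message before any BIOS objects are attempted.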
Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving
On 05/22/2011 02:58 PM, Ingo Molnar wrote:
> * tip-bot for Cyrill Gorcunov <gorcu...@gmail.com> wrote:
>> diff --git a/tools/perf/feature-tests.mak b/tools/kvm/config/feature-tests.mak
>> similarity index 83%
>> copy from tools/perf/feature-tests.mak
>> copy to tools/kvm/config/feature-tests.mak
>
> Btw, now that we have feature-tests.mak it would be nice to populate the checks for the various assumptions. One i recently ran into on a new box where i tried to install tools/kvm was:
>
> In file included from /usr/include/features.h:387:0,
>                  from /usr/include/unistd.h:26,
>                  from include/kvm/util.h:12,
>                  from bios/e820.c:5:
> /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory
> compilation terminated.
> make: *** [bios/bios-rom.bin] Error 1
>
> that's a dependency on glibc-dev[el].
>
> Thanks,
> Ingo

Ouch. I've been hitting this lack of gnu/stubs-32.h on Fedora 15 too, and eventually I had to install the i686 devel packages. I'll try to resolve this issue tonight; I suppose it's not absolutely urgent, right?

--
Cyrill
Re: [tip:tools/kvm] kvm tools: Add conditional compilation of symbol resolving
On 05/22/2011 03:00 PM, Ingo Molnar wrote:
> * Ingo Molnar <mi...@elte.hu> wrote:
>> * tip-bot for Cyrill Gorcunov <gorcu...@gmail.com> wrote:
>>> diff --git a/tools/perf/feature-tests.mak b/tools/kvm/config/feature-tests.mak
>>> similarity index 83%
>>> copy from tools/perf/feature-tests.mak
>>> copy to tools/kvm/config/feature-tests.mak
>>
>> Btw, now that we have feature-tests.mak it would be nice to populate the checks for the various assumptions. One i recently ran into on a new box where i tried to install tools/kvm was:
>>
>> In file included from /usr/include/features.h:387:0,
>>                  from /usr/include/unistd.h:26,
>>                  from include/kvm/util.h:12,
>>                  from bios/e820.c:5:
>> /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory
>> compilation terminated.
>> make: *** [bios/bios-rom.bin] Error 1
>>
>> that's a dependency on glibc-dev[el].
>
> Another detail: on 64-bit hosts the dependency is on glibc-dev[el].i686, i.e. the 32-bit package. Would it be simple to remove this dependency? It's not typically installed by default on distros and it would be nice to make most of the kvm code build by default almost everywhere.

I'll take a look at whether we really need it (note we need to compile 16-bit code for the BIOS blob, so it might eventually be a problem ;)

--
Cyrill
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/22/2011 12:32 PM, Blue Swirl wrote:
>>>> +void memory_region_add_coalescing(MemoryRegion *mr,
>>>> +                                  target_phys_addr_t offset,
>>>> +                                  target_phys_addr_t size);
>>>> +/* Disable MMIO coalescing for the region. */
>>>> +void memory_region_clear_coalescing(MemoryRegion *mr);
>>>
>>> Perhaps the interface could be more generic, like
>>> +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
>>> +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
>>
>> Coalescing is a complex property, not just a boolean attribute. We probably will have a number of boolean attributes later, though.
>
> But what is the difference between adding coalescing to an area and setting the bit property 'coalescing' to an area? At least what you propose now is not so complex that it couldn't be handled as a single bit.

Look at the API - add_coalescing() sets the coalescing property on a subrange of the memory region, not the entire region (motivation - hw/e1000.c).

>>>> + * conflicts are resolved by having a higher @priority hide a lower @priority.
>>>> + * Subregions without priority are taken as @priority 0.
>>>> + */
>>>> +void memory_region_add_subregion_overlap(MemoryRegion *mr,
>>>> +                                         target_phys_addr_t offset,
>>>> +                                         MemoryRegion *subregion,
>>>> +                                         unsigned priority);
>>>> +/* Remove a subregion. */
>>>> +void memory_region_del_subregion(MemoryRegion *mr,
>>>> +                                 MemoryRegion *subregion);
>>>
>>> What would the subregions be used for?
>>
>> Subregions describe the flow of data through the memory bus. We'd have a subregion for the PCI bus, with its own subregions for various BARs, with some having subregions for dispatching different MMIO types within the BAR. This allows, for example, the PCI layer to move a BAR without the PCI device knowing anything about it.
>
> But why can't a first-class region be used for that?

Subregions are first-class regions. In fact all regions are subregions except the root. It's a tree of regions, each level adding an offset, clipping, and perhaps other attributes, with the leaves providing actual memory (mmio or RAM).
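Avi's point that coalescing is per-subrange rather than per-region can be seen in a toy data structure. This is an illustration of the API's shape, not QEMU's implementation: a region carries a list of coalesced ranges, so whether an access is coalesced depends on its offset, which a single boolean flag cannot express.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-region list of coalesced subranges (illustrative only). */
#define MAX_COALESCED 8

struct coalesced_range {
    uint64_t offset;
    uint64_t size;
};

static struct coalesced_range ranges[MAX_COALESCED];
static unsigned nr_ranges;

/* Counterpart of memory_region_add_coalescing(): mark one subrange. */
static int add_coalescing(uint64_t offset, uint64_t size)
{
    if (nr_ranges == MAX_COALESCED)
        return -1;
    ranges[nr_ranges].offset = offset;
    ranges[nr_ranges].size = size;
    nr_ranges++;
    return 0;
}

/* An access is coalesced only if it falls inside some marked subrange. */
static bool is_coalesced(uint64_t offset)
{
    for (unsigned i = 0; i < nr_ranges; i++)
        if (offset >= ranges[i].offset &&
            offset < ranges[i].offset + ranges[i].size)
            return true;
    return false;
}
```

A device like the e1000 can mark only its frequently written register windows as coalesced while leaving the rest of the BAR with immediate ->write semantics, which is exactly what a single region-wide bit could not describe.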
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On Sun, May 22, 2011 at 2:36 PM, Avi Kivity a...@redhat.com wrote: On 05/22/2011 12:32 PM, Blue Swirl wrote: +void memory_region_add_coalescing(MemoryRegion *mr, + target_phys_addr_t offset, + target_phys_addr_t size); +/* Disable MMIO coalescing for the region. */ +void memory_region_clear_coalescing(MemoryRegion *mr); Perhaps the interface could be more generic, like +void memory_region_set_property(MemoryRegion *mr, unsigned flags); +void memory_region_clear_property(MemoryRegion *mr, unsigned flags); Coalescing is a complex property, not just a boolean attribute. We probably will have a number of boolean attributes later, though. But what is the difference between adding coalescing to an area and setting the bit property 'coalescing' to an area? At least what you propose now is not so complex that it couldn't be handled as a single bit. Look at the API - add_coalescing() sets the coalescing property on a subrange of the memory region, not the entire region. Right, but doesn't the same apply to any other properties, they may apply to a full range or just a subrange? (motivation - hw/e1000.c). + * conflicts are resolved by having a higher @priority hide a lower @priority. + * Subregions without priority are taken as @priority 0. + */ +void memory_region_add_subregion_overlap(MemoryRegion *mr, + target_phys_addr_t offset, + MemoryRegion *subregion, + unsigned priority); +/* Remove a subregion. */ +void memory_region_del_subregion(MemoryRegion *mr, + MemoryRegion *subregion); What would the subregions be used for? Subregions describe the flow of data through the memory bus. We'd have a subregion for the PCI bus, with its own subregions for various BARs, with some having subregions for dispatching different MMIO types within the BAR. This allows, for example, the PCI layer to move a BAR without the PCI device knowing anything about it. But why can't a first class region be used for that? Subregions are first-class regions. 
In fact all regions are subregions except the root. Oh, I see now. Maybe the comments should describe this. Or perhaps the terms should be something like 'bus/bridge/root' and 'region' instead of 'region' and 'subregion'? It's a tree of regions, each level adding an offset, clipping, and perhaps other attributes, with the leaves providing actual memory (mmio or RAM). I thought that there are two classes of regions, like PCI device vs. a single BAR.
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote: On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com wrote: Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one. This is bad for real-time applications and can't be good for TCP either. Do we have more than speculation to back that up, BTW? Need to dig this up: I thought we saw some reports of this on the list? This patch is pretty sloppy; the previous ones were better polished.

-static void free_old_xmit_skbs(struct virtnet_info *vi)
+static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
 {

A comment here indicating it returns true if it frees something? Agree.

 	struct sk_buff *skb;
 	unsigned int len;
-
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	bool c;
+	int n;
+
+	/* We try to free up at least 2 skbs per one sent, so that we'll get
+	 * all of the memory back if they are used fast enough. */
+	for (n = 0;
+	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
+	     ((skb = virtqueue_get_buf(vi->svq, &len)));
+	     ++n) {
 		pr_debug("Sent skb %p\n", skb);
 		vi->dev->stats.tx_bytes += skb->len;
 		vi->dev->stats.tx_packets++;
 		dev_kfree_skb_any(skb);
 	}
+	return !c;

This is for() abuse :) Why is the capacity check in there at all? Surely it's simpler to try to free 2 skbs each time around? This is in case we can't use indirect: we want to free up enough buffers for the following add_buf to succeed.

	for (n = 0; n < 2; n++) {
		skb = virtqueue_get_buf(vi->svq, &len);
		if (!skb)
			break;
		pr_debug("Sent skb %p\n", skb);
		vi->dev->stats.tx_bytes += skb->len;
		vi->dev->stats.tx_packets++;
		dev_kfree_skb_any(skb);
	}

 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int capacity;

-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(vi);
+	/* Free enough pending old buffers to enable queueing new ones. */
+	free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);

 	/* Try to transmit */
 	capacity = xmit_skb(vi, skb);
@@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_stop_queue(dev);
 		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(vi);
-			capacity = virtqueue_get_capacity(vi->svq);
-			if (capacity >= 2+MAX_SKB_FRAGS) {
+			if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {

This extra argument to free_old_xmit_skbs seems odd, unless you have future plans? Thanks, Rusty. I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer. Another way to do that is with a define :). -- MST
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/22/2011 03:06 PM, Blue Swirl wrote: On Sun, May 22, 2011 at 2:36 PM, Avi Kivitya...@redhat.com wrote: On 05/22/2011 12:32 PM, Blue Swirl wrote: +void memory_region_add_coalescing(MemoryRegion *mr, + target_phys_addr_t offset, + target_phys_addr_t size); +/* Disable MMIO coalescing for the region. */ +void memory_region_clear_coalescing(MemoryRegion *mr); Perhaps the interface could be more generic, like +void memory_region_set_property(MemoryRegion *mr, unsigned flags); +void memory_region_clear_property(MemoryRegion *mr, unsigned flags); Coalescing is a complex property, not just a boolean attribute. We probably will have a number of boolean attributes later, though. But what is the difference between adding coalescing to an area and setting the bit property 'coalescing' to an area? At least what you propose now is not so complex that it couldn't be handled as a single bit. Look at the API - add_coalescing() sets the coalescing property on a subrange of the memory region, not the entire region. Right, but doesn't the same apply to any other properties, they may apply to a full range or just a subrange? We'll know when we have more properties. I expect most will be region-wide. Subregions are first-class regions. In fact all regions are subregions except the root. Oh, I see now. Maybe the comments should describe this. Or perhaps the terms should be something like 'bus/bridge/root' and 'region' instead of 'region' and 'subregion'? Problem is, memory_region_add_subregion() adds both sub-bridges and leaf regions. It's quite possible that BAR 0 will be a leaf region, and BAR 1 will be a sub-bridge. Can you suggest an alternative naming for the API? It's a tree of regions, each level adding an offset, clipping, and perhaps other attributes, with the leaves providing actual memory (mmio or RAM). I thought that there are two classes of regions, like PCI device vs. a single BAR. It's true in a way, except the mapping is not 1:1. 
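The description above - a tree of regions where every region is a subregion of its parent, each level adds an offset, and overlaps resolve by priority - can be sketched as a toy model. All names here are invented for illustration and are not QEMU's actual memory API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy region: every region is a subregion of its parent (NULL = root). */
typedef struct MemRegion {
    const char *name;
    uint64_t offset;            /* offset within the parent region */
    uint64_t size;
    unsigned priority;          /* higher priority hides lower on overlap */
    const struct MemRegion *parent;
} MemRegion;

/* Absolute address of a region: walk up the tree accumulating offsets. */
static uint64_t region_abs_addr(const MemRegion *mr)
{
    uint64_t addr = 0;
    for (; mr; mr = mr->parent)
        addr += mr->offset;
    return addr;
}

/* Flattened lookup: the highest-priority leaf containing addr wins. */
static const MemRegion *resolve(const MemRegion **leaves, size_t n,
                                uint64_t addr)
{
    const MemRegion *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint64_t base = region_abs_addr(leaves[i]);
        if (addr >= base && addr - base < leaves[i]->size &&
            (!best || leaves[i]->priority > best->priority))
            best = leaves[i];
    }
    return best;
}

/* Demo: root -> PCI bridge at 0x10000 -> BAR at +0x200. */
static uint64_t demo_bar_addr(void)
{
    static const MemRegion root = { "root", 0, 1ULL << 32, 0, NULL };
    static const MemRegion pci  = { "pci",  0x10000, 0x10000, 0, &root };
    static const MemRegion bar  = { "bar0", 0x200, 0x100, 0, &pci };
    return region_abs_addr(&bar);
}

/* Demo: an overlapping higher-priority region hides the one below it. */
static const char *demo_overlap_winner(void)
{
    static const MemRegion root = { "root", 0, 1ULL << 32, 0, NULL };
    static const MemRegion ram  = { "ram",  0x1000, 0x1000, 0, &root };
    static const MemRegion mmio = { "mmio", 0x1800, 0x100, 1, &root };
    const MemRegion *leaves[] = { &ram, &mmio };
    return resolve(leaves, 2, 0x1820)->name;
}
```

The point of the sketch is that the mapping is computed declaratively from the tree, so the flat view can be rebuilt whenever a subregion moves, without the leaves knowing.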
Re: [PATCH] KVM: x86 emulator: Fix unconditional return from get_descriptor_table_ptr()
On 05/21/2011 07:06 AM, Takuya Yoshikawa wrote: From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp A trivial typo was found in the following commit: commit 7753ed6043bfce55dc0c407490896632014b677e KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks When the table indicator flag is set, when the selector selects the current LDT, get_descriptor_table_ptr() returns without setting the size and address of the table. Guests will see #GP if this happens. Thanks, applied. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- Is this stable material? -- IIRC, someone reported a suspicious emulator bug recently. This was not yet merged upstream, so I folded this into the bad commit, and upstream will never see the bug. It's now 4bff1e86ad286d in kvm.git.
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On Sun, May 22, 2011 at 3:18 PM, Avi Kivity a...@redhat.com wrote: On 05/22/2011 03:06 PM, Blue Swirl wrote: On Sun, May 22, 2011 at 2:36 PM, Avi Kivitya...@redhat.com wrote: On 05/22/2011 12:32 PM, Blue Swirl wrote: +void memory_region_add_coalescing(MemoryRegion *mr, + target_phys_addr_t offset, + target_phys_addr_t size); +/* Disable MMIO coalescing for the region. */ +void memory_region_clear_coalescing(MemoryRegion *mr); Perhaps the interface could be more generic, like +void memory_region_set_property(MemoryRegion *mr, unsigned flags); +void memory_region_clear_property(MemoryRegion *mr, unsigned flags); Coalescing is a complex property, not just a boolean attribute. We probably will have a number of boolean attributes later, though. But what is the difference between adding coalescing to an area and setting the bit property 'coalescing' to an area? At least what you propose now is not so complex that it couldn't be handled as a single bit. Look at the API - add_coalescing() sets the coalescing property on a subrange of the memory region, not the entire region. Right, but doesn't the same apply to any other properties, they may apply to a full range or just a subrange? We'll know when we have more properties. I expect most will be region-wide. Since we don't know about those yet, coalescing API could be like you propose. Later it can be changed to the property API, or leave for convenience. Subregions are first-class regions. In fact all regions are subregions except the root. Oh, I see now. Maybe the comments should describe this. Or perhaps the terms should be something like 'bus/bridge/root' and 'region' instead of 'region' and 'subregion'? Problem is, memory_region_add_subregion() adds both sub-bridges and leaf regions. It's quite possible that BAR 0 will be a leaf region, and BAR 1 will be a sub-bridge. Can you suggest an alternative naming for the API? 
How about
memory_region_container_init()
memory_region_add()
[ANNOUNCE] qemu-kvm-0.14.1
qemu-kvm-0.14.1 is now available. This release is based on the upstream qemu 0.14.1, plus kvm-specific enhancements. Please see the original qemu 0.14.1 release announcement for details. This release can be used with the kvm kernel modules provided by your distribution kernel, or by the modules in the kvm-kmod package, such as kvm-kmod-2.6.39. See [1] for details on choosing the right kvm modules. http://www.linux-kvm.org [1] http://www.linux-kvm.org/page/Getting_the_kvm_kernel_modules
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/22/2011 06:32 PM, Blue Swirl wrote: Can you suggest an alternative naming for the API? How about memory_region_container_init() memory_region_add() I'm neutral. If someone seconds this, I'll make it so.
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/22/2011 01:38 AM, Avi Kivity wrote: On 05/20/2011 05:31 PM, Anthony Liguori wrote: Several alpha system chips MCE when accessed with incorrect sizes. E.g. only 64-bit accesses are allowed. But is this a characteristic of devices or is this a characteristic of the chipset/CPU? The chipset is modelled by a MemoryRegion too. At any rate, I'm fairly sure it doesn't belong in the MemoryRegion structure. Since it isn't a global property, where does it belong? The chipset should have an intercept in the dispatch path that enforces this (this assumes hierarchical dispatch). Regards, Anthony Liguori
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/22/2011 01:39 AM, Avi Kivity wrote: On 05/20/2011 05:46 PM, Anthony Liguori wrote: On 05/20/2011 09:40 AM, Richard Henderson wrote: On 05/20/2011 07:31 AM, Anthony Liguori wrote: But is this a characteristic of devices or is this a characteristic of the chipset/CPU? Chipset. So if the chipset only allows accesses that are 64-bit, then you'll want to have hierarchical dispatch filter non 64-bit accesses and raise an MCE appropriately. So you don't need anything in MemoryRegion, you need code in the dispatch path. MemoryRegion *is* the dispatch path. Only done declaratively so we can flatten it whenever it changes. We don't want dispatch to be 100% declarative. That's what will cause the API to get horrendously ugly. An example is PCI-bus level endianness conversion. I also believe the Sparc IOMMU has an xor engine. You could add a 'bool swap_endian' and an 'uint32_t xor_mask' in MemoryRegion but now you're adding a ton of platform specific knowledge to what should be an independent layer. Regards, Anthony Liguori
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/22/2011 06:46 PM, Anthony Liguori wrote: MemoryRegion *is* the dispatch path. Only done declaratively so we can flatten it whenever it changes. We don't want dispatch to be 100% declarative. That's what will cause the API to get horrendously ugly. An example is PCI-bus level endianness conversion. I also believe the Sparc IOMMU has an xor engine. You could add a 'bool swap_endian' and an 'uint32_t xor_mask' in MemoryRegion but now you're adding a ton of platform specific knowledge to what should be an independent layer. Currently containers do not use the read/write function pointers. We could make them (or perhaps others) act as transformations on the data as it passes. So it's still declarative (the entire flow is known at registration time) but it doesn't embed platform magic. Byteswap is sufficiently generic to add as a region property, IMO. btw, wrt iommu emulation, the API finally allows us to determine the path between any two devices, so we can apply the right iommu transformations.
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/22/2011 06:44 PM, Anthony Liguori wrote: At any rate, I'm fairly sure it doesn't belong in the MemoryRegion structure. Since it isn't a global property, where does it belong? The chipset should have an intercept in the dispatch path that enforces this (this assumes hierarchical dispatch). So instead of region->ops->valid.*, region->ops->intercept()? btw, that still doesn't require hierarchical dispatch. If intercepts only check if the access is valid, it can still be flattened. Hierarchical dispatch means that chipset callbacks get to choose which subregion callbacks are called, which isn't the case here. If it were, it would be impossible to figure out the kvm slot layout.
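A minimal sketch of the region->ops->valid.* idea (field names assumed for illustration; the eventual API may differ): the size constraint is declared with the region, so a flattened dispatcher can reject the access and leave delivering the MCE to platform code, with no hierarchical callbacks:

```c
#include <assert.h>
#include <stdbool.h>

/* Declared access-size constraints, e.g. {8, 8} for an alpha chipset
 * that machine-checks on anything but 64-bit accesses. */
typedef struct {
    unsigned min_access_size;
    unsigned max_access_size;
} AccessValid;

typedef struct {
    AccessValid valid;
    int mmio_value;   /* stand-in for a real read callback */
} ToyRegion;

/* Flattened dispatch: returns false on an invalid size, in which case
 * the caller (the chipset model) delivers the MCE/NMI. */
static bool toy_dispatch_read(const ToyRegion *r, unsigned size, int *out)
{
    if (size < r->valid.min_access_size || size > r->valid.max_access_size)
        return false;
    *out = r->mmio_value;
    return true;
}

/* Demo: a 64-bit access to a 64-bit-only region succeeds. */
static bool demo_valid_access(void)
{
    ToyRegion r = { { 8, 8 }, 42 };
    int v = 0;
    return toy_dispatch_read(&r, 8, &v) && v == 42;
}

/* Demo: a 32-bit access is rejected -> MCE path. */
static bool demo_invalid_access(void)
{
    ToyRegion r = { { 8, 8 }, 42 };
    int v = 0;
    return toy_dispatch_read(&r, 4, &v);
}
```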
Re: [PATCH RFC] virtio_net: fix patch: virtio_net: limit xmit polling
On Thu, May 19, 2011 at 05:00:17PM +0930, Rusty Russell wrote: On Thu, 19 May 2011 01:01:25 +0300, Michael S. Tsirkin m...@redhat.com wrote: The patch 'virtio_net: limit xmit polling' got the logic reversed: it polled while we had capacity, not while the ring was empty. Fix it up and clean up a bit by using a for loop. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- OK, turns out that patch was broken. Here's a fix that survived a stress test on my box. Pushed on my branch, I'll send a rebased series with Rusty's comments addressed ASAP. Normally you would have missed the merge window by now, but I'd really like this stuff in, so I'm holding it open for this. I want these patches in linux-next for at least a few days before I push them. If you think we're not close enough, please tell me and I'll push the rest of the virtio patches to Linus now. Thanks, Rusty. I think it makes sense to push just the patches you have applied by now (event index + delayed callback) - the rest are close, but they are guest-only patches so very easy to experiment with out of tree. OTOH if event index misses the window it makes testing painful, as we have to keep patching both host and guest. -- MST
PCI passthrough and full virtualization on Debian Squeeze with AMD890 FX
Hi, this is my first try to get a working fully virtualized KVM guest with one real PCI device. I have the Asrock 890FX Deluxe3 with a Phenom II X4 945, which both should have IOMMU support. I did the steps described on http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM but it seems that full virtualization doesn't work. If I do 'kvm -enable-kvm -cpu ?' I get no host CPU, and PCI passthrough also fails. Assigning the PCI device went fine, and with lspci I can see the PCI device in the guest, but the device doesn't work. The command to run the guest was:

sudo kvm -M pc-0.12 -cpu kvm64,+svm -smp 1,cores=1 -m 1G -name Debian32 -vga std -net none -enable-kvm -device pci-assign,host=04:00.0,id=ethernet -daemonize -drive file=/dev/mapper/vol-deb,if=scsi,media=disk,index=0,snapshot=off,cache=writethrough,format=raw,boot=on -boot c -chroot /home/kvmuser -runas kvmuser

dmesg | grep AMD-Vi
AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
AMD-Vi: device isolation enabled
AMD-Vi: Lazy IO/TLB flushing enabled

Some more output of dmesg:

pci-stub :04:00.0: PCI INT A -> Link[LNKH] -> GSI 11 (level, low) -> IRQ 11
[ 3235.032088] pci-stub :04:00.0: restoring config space at offset 0x8 (was 0xc, writing 0xcfff800c)
[ 3235.032105] pci-stub :04:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[ 3235.032116] pci-stub :04:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x100503)
[ 3235.500420] assign device: host bdf = 4:0:0
[ 3235.757750] kvm:3099 freeing invalid memtype cfff8000-cfff9000
[ 3302.746197] kvm:3099 freeing invalid memtype cfff9000-cfffc000
[ 3302.972095] pci-stub :04:00.0: restoring config space at offset 0x8 (was 0xc, writing 0xcfff800c)
[ 3302.972112] pci-stub :04:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[ 3302.972124] pci-stub :04:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x180507)
[ 3302.972165] pci-stub :04:00.0: PCI INT A disabled

Version of KVM is 0.12.5+dfsg-5+squeeze1. Host is Debian Squeeze amd64.
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Thu, May 12, 2011, Gleb Natapov wrote about Re: [PATCH 0/30] nVMX: Nested VMX, v9: But if my interpretation of the code is correct, SVM isn't much closer than VMX to the goal of moving this logic to x86.c. When some logic is moved there, both SVM and VMX code will need to change - perhaps even considerably. So how will it be helpful to make VMX behave exactly like SVM does now, when the latter will also need to change considerably? SVM design is much closer to the goal of moving the logic into x86.c because IIRC it does not bypass parsing of IDT vectoring info into an arch-independent structure. VMX code uses vmx->idt_vectoring_info directly. At the risk of sounding blasphemous, I'd like to make the case that perhaps the current nested-VMX design - regarding the IDT-vectoring-info-field handling - is actually closer than nested-SVM to the goal of moving clean nested-supporting logic into x86.c, instead of having ad-hoc, unnatural workarounds. Let me explain, and see if you agree with my logic: We discover at exit time whether the virtualization hardware (VMX or SVM) exited while *delivering* an interrupt or exception to the current guest. This is known as idt-vectoring-information in VMX. What do we need to do with this idt-vectoring-information? In regular (non-nested) guests, the answer is simple: On the next entry, we need to inject this event again into the guest, so it can resume the delivery of the same event it was trying to deliver. This is why the nested-unaware code has a vmx_complete_interrupts which basically adds this idt-vectoring-info into KVM's event queue, which on the next entry will be injected similarly to the way virtual interrupts from userspace are injected, and so on. But with nested virtualization, this is *not* what is supposed to happen - we do not *always* need to inject the event to the guest. We will only need to inject the event if the next entry will be again to the same guest, i.e., L1 after L1, or L2 after L2.
If the idt-vectoring-info came from L2, but our next entry will be into L1 (i.e., a nested exit), we *shouldn't* inject the event as usual, but should rather pass this idt-vectoring-info field as the exit information that L1 gets (in nested vmx terminology, in vmcs12). However, at the time of exit, we cannot know for sure whether L2 will actually run next, because it is still possible that an injection from user space, before the next entry, will cause us to decide to exit to L1. Therefore, I believe that the clean solution isn't to leave the original non-nested logic that always queues the idt-vectoring-info assuming it will be injected, and then if it shouldn't (because we want to exit during entry) we need to skip the entry once as a trick to avoid this wrong injection. Rather, a clean solution is, I think, to recognize that in nested virtualization, idt-vectoring-info is a different kind of beast than regular injected events, and it needs to be saved at exit time in a different field (which will of course be common to SVM and VMX). Only at entry time, after the regular injection code (which may cause a nested exit), we can call a x86_op to handle this special injection. The benefit of this approach, which is closer to the current vmx code, is, I think, that x86.c will contain clear, self-explanatory nested logic, instead of relying on vmx.c or svm.c circumventing various x86.c functions and mechanisms to do something different from what they were meant to do. What do you think?
--
Nadav Har'El                | Sunday, May 22 2011, 19 Iyyar 5771
n...@math.technion.ac.il    | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il   | If I were two-faced, would I be wearing this one? Abraham Lincoln
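The routing rule proposed above can be made concrete with a toy model. All names and types here are invented (this is not kvm.git code): the idt-vectoring-info is stashed in its own field at exit time, and only at entry time, once the destination guest is known, is it either re-injected or turned into exit information for L1:

```c
#include <assert.h>
#include <stdint.h>

enum { GUEST_L1 = 1, GUEST_L2 = 2 };

typedef struct {
    int next_guest;             /* guest the upcoming entry resumes */
    int idt_info_from;          /* guest level the last exit came from */
    uint32_t pending_idt_info;  /* saved at exit time; 0 = none */
    uint32_t injected_event;    /* event queued for re-injection */
    uint32_t vmcs12_exit_info;  /* what L1 sees on a nested exit */
} ToyVcpu;

/* Runs at entry time, after the regular injection code may have forced
 * a nested exit (possibly changing next_guest from L2 to L1). */
static void complete_nested_injection(ToyVcpu *v)
{
    if (!v->pending_idt_info)
        return;
    if (v->next_guest == v->idt_info_from)
        v->injected_event = v->pending_idt_info;   /* resume delivery */
    else
        v->vmcs12_exit_info = v->pending_idt_info; /* report to L1 */
    v->pending_idt_info = 0;
}

/* Demo: L2 exited mid-delivery and L2 runs next -> re-inject. */
static uint32_t demo_reinject(void)
{
    ToyVcpu v = { GUEST_L2, GUEST_L2, 0x80000030u, 0, 0 };
    complete_nested_injection(&v);
    return v.injected_event;
}

/* Demo: L2 exited mid-delivery but we enter L1 -> becomes exit info. */
static uint32_t demo_nested_exit(void)
{
    ToyVcpu v = { GUEST_L1, GUEST_L2, 0x80000030u, 0, 0 };
    complete_nested_injection(&v);
    return v.vmcs12_exit_info;
}
```

The decision is deferred to the one point where the destination guest is final, which is the whole argument: no entry needs to be skipped as a trick.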
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sun, 22 May 2011 15:10:08 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote: On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com wrote: Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one. This is bad for real-time applications and can't be good for TCP either. Do we have more than speculation to back that up, BTW? Need to dig this up: I thought we saw some reports of this on the list? I think so too, but a reference needs to be here too. It helps to have exact benchmarks on what's being tested, otherwise we risk unexpected interaction with the other optimization patches.

 	struct sk_buff *skb;
 	unsigned int len;
-
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	bool c;
+	int n;
+
+	/* We try to free up at least 2 skbs per one sent, so that we'll get
+	 * all of the memory back if they are used fast enough. */
+	for (n = 0;
+	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
+	     ((skb = virtqueue_get_buf(vi->svq, &len)));
+	     ++n) {
 		pr_debug("Sent skb %p\n", skb);
 		vi->dev->stats.tx_bytes += skb->len;
 		vi->dev->stats.tx_packets++;
 		dev_kfree_skb_any(skb);
 	}
+	return !c;

This is for() abuse :) Why is the capacity check in there at all? Surely it's simpler to try to free 2 skbs each time around? This is in case we can't use indirect: we want to free up enough buffers for the following add_buf to succeed. Sure, or we could just count the frags of the skb we're taking out, which would be accurate for both cases and far more intuitive. ie. always try to free up twice as much as we're about to put in. Can we hit problems with OOM? Sure, but no worse than now... The problem is that this virtqueue_get_capacity() returns the worst case, not the normal case. So using it is deceptive. I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer. Another way to do that is with a define :). To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many. Cheers, Rusty.
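Rusty's counting suggestion could look something like the following userspace toy model (invented types, not the real virtio_net code): free up to twice the descriptors the outgoing skb will consume, instead of consulting the worst-case virtqueue_get_capacity():

```c
#include <assert.h>

/* Minimal stand-in for the tx virtqueue: 'used' counts completed
 * buffers that the driver has not yet freed. */
typedef struct {
    int used;
} ToyTxRing;

/* Free up to twice what the next skb will occupy: nr_frags data
 * descriptors plus one for the header.  Returns how many were freed. */
static int free_old_tx(ToyTxRing *r, int nr_frags)
{
    int budget = 2 * (nr_frags + 1);
    int n = 0;
    while (n < budget && r->used > 0) {
        r->used--;              /* dev_kfree_skb_any() in the real code */
        n++;
    }
    return n;
}

/* Demo: 10 pending completions, a 2-frag skb -> budget of 6 freed. */
static int demo_first_pass(void)
{
    ToyTxRing r = { 10 };
    return free_old_tx(&r, 2);
}

/* Demo: two transmits of 2-frag skbs drain the ring completely. */
static int demo_remaining_after_two(void)
{
    ToyTxRing r = { 10 };
    free_old_tx(&r, 2);
    free_old_tx(&r, 2);
    return r.used;
}
```

Freeing twice what is consumed guarantees the ring eventually drains faster than it fills, without ever invoking the pessimistic capacity estimate.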
Re: [Autotest] [PATCH] KVM Test: Switch current working folder in unattended_install.py.
On 05/21/2011 02:59 AM, Lucas Meneghel Rodrigues wrote: On Thu, 2011-05-19 at 18:24 +0800, fy...@redhat.com wrote: From: Feng Yang fy...@redhat.com The current working folder for

unattended_install_config = UnattendedInstallConfig(test, params)
unattended_install_config.setup()

must be the kvm folder. This is not needed at all. What might be going on in your setup is some incorrectly set or absent path that might be messing up relative paths during your install. Please provide some more info so we can fix your problem properly. Meanwhile I'm marking this as 'rejected'. Thanks!

Thanks for your comment. After merging upstream code into our local tree, I found that our local unattended_install could not work. Before, the current working folder for unattended.py was the kvm folder, changed in process_command. Now the current working folder for the UnattendedInstallConfig setup has changed to the test result folder, so our unattended_install always fails with "could not find ks.iso". That is why I sent this patch. I will recheck our local code and configuration. Thanks very much! Feng Yang

Signed-off-by: Feng Yang fy...@redhat.com
---
 client/tests/kvm/tests/unattended_install.py |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/unattended_install.py b/client/tests/kvm/tests/unattended_install.py
index 50a8c7a..eee1761 100644
--- a/client/tests/kvm/tests/unattended_install.py
+++ b/client/tests/kvm/tests/unattended_install.py
@@ -506,8 +506,11 @@ def run_unattended_install(test, params, env):
     @param params: Dictionary with the test parameters.
     @param env: Dictionary with test environment.
     """
+    cur_folder = os.getcwd()
+    os.chdir(test.bindir)
     unattended_install_config = UnattendedInstallConfig(test, params)
     unattended_install_config.setup()
+    os.chdir(cur_folder)
     vm = env.get_vm(params["main_vm"])
     vm.create()