Re: trouble with dvd drives

2009-04-11 Thread Carlo Marcelo Arenas Belon
On Sat, Apr 11, 2009 at 11:31:41PM -0500, Gene Horodecki wrote:
> Hi there.. I have a new machine with an AMD Phenom processor and two  
> fairly average LG DVD burners.  I've gotten to the point that I seem to  
> be able to see both but they are not coming up as writable drives..

AFAIK the IDE device emulation from QEMU only supports up to (incomplete)
DVD-ROM profile, hence why support for writting isn't working.

Carlo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


trouble with dvd drives

2009-04-11 Thread Gene Horodecki
Hi there.. I have a new machine with an AMD Phenom processor and two 
fairly average LG DVD burners.  I've gotten to the point that I seem to 
be able to see both but they are not coming up as writable drives.. I 
have a data DVD in one drive and a blank in the other.


Can anyone help me out?  here is the kvm command I'm using:

/usr/bin/kvm -M pc -m 512 -smp 1 -name fury -monitor pty -localtime 
-boot c -drive file=/v/fury/fury.img,if=scsi,index=0,boot=on -drive 
file=/dev/scd0,if=ide,media=cdrom -drive 
file=/dev/scd1,if=ide,media=cdrom -net 
nic,macaddr=00:16:36:53:18:96,vlan=0 -net 
tap,vlan=0,script=/v/scripts/kvm-ifup -serial pty -parallel none -usb 
-usbdevice tablet -vnc 0.0.0.0:4 -daemonize


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] deal with interrupt shadow state for emulated instruction

2009-04-11 Thread H. Peter Anvin
Avi Kivity wrote:
>>  
>> -rip = kvm_rip_read(vcpu);
>> -rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>> -kvm_rip_write(vcpu, rip);
>> +if (!(interruptibility & interruptibility_mask))
>> +vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
>> + interruptibility | interruptibility_mask);
>> +vcpu->arch.interrupt_window_open = 0;
>>   
> 
> Setting both _MOV_SS and _STI is wierd; can't happen on real hardware.
> 

Not at architecturally visible boundaries, for sure.  It can be an
implementation artifact internally to an instruction, though.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] Consolidate userspace and kernel interrupt injection for VMX.

2009-04-11 Thread Gleb Natapov
On Sat, Apr 11, 2009 at 02:29:28PM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>> Use the same callback to inject irq/nmi events no matter what irqchip is
>> in use. Only from VMX for now.
>>
>> Signed-off-by: Gleb Natapov 
>> ---
>>
>>  arch/x86/include/asm/kvm_host.h |2 +
>>  arch/x86/kvm/svm.c  |2 +
>>  arch/x86/kvm/vmx.c  |   71 
>> +--
>>  arch/x86/kvm/x86.c  |2 +
>>  4 files changed, 19 insertions(+), 58 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h 
>> b/arch/x86/include/asm/kvm_host.h
>> index e672ca5..4e39c40 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -520,7 +520,7 @@ struct kvm_x86_ops {
>>  void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
>>  bool has_error_code, u32 error_code);
>>  bool (*exception_injected)(struct kvm_vcpu *vcpu);
>> -void (*inject_pending_irq)(struct kvm_vcpu *vcpu);
>> +void (*inject_pending_irq)(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>   
>
> kvm_run is available as vcpu->run, so this isn't needed.  But better to  
> keep it for now and drop it in a later patch.
>
>>  static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
>>  {
>>  int ret;
>> @@ -3351,8 +3309,11 @@ static void vmx_complete_interrupts(struct vcpu_vmx 
>> *vmx)
>>  }
>>  }
>>  -static void vmx_intr_assist(struct kvm_vcpu *vcpu)
>> +static void vmx_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  {
>> +bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
>> +kvm_run->request_interrupt_window;
>> +
>>  update_tpr_threshold(vcpu);
>>  vmx_update_window_states(vcpu);
>> @@ -3373,25 +3334,25 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
>>  return;
>>  }
>>  }
>>   
>
> Why not convert the 'enable_nmi_window(); return;' above to a 'goto out'  
> like you do elsewhere?
>
>
Mainly because of the next patch in the series that reworks this code
anyway.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Cleanup vmx_intr_assist()

2009-04-11 Thread Gleb Natapov
On Sat, Apr 11, 2009 at 02:30:30PM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>> Signed-off-by: Gleb Natapov 
>> ---
>>
>>  arch/x86/kvm/vmx.c |   55 
>> 
>>  1 files changed, 30 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 06252f7..9eb518f 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3309,6 +3309,34 @@ static void vmx_complete_interrupts(struct vcpu_vmx 
>> *vmx)
>>  }
>>  }
>>  +static void vmx_intr_inject(struct kvm_vcpu *vcpu)
>> +{
>> +/* try to reinject previous events if any */
>> +if (vcpu->arch.nmi_injected) {
>> +vmx_inject_nmi(vcpu);
>> +return;
>> +}
>> +
>> +if (vcpu->arch.interrupt.pending) {
>> +vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
>> +return;
>> +}
>> +
>> +/* try to inject new event if pending */
>> +if (vcpu->arch.nmi_pending) {
>> +if (vcpu->arch.nmi_window_open) {
>> +vcpu->arch.nmi_pending = false;
>> +vcpu->arch.nmi_injected = true;
>> +vmx_inject_nmi(vcpu);
>> +}
>> +} else if (kvm_cpu_has_interrupt(vcpu)) {
>> +if (vcpu->arch.interrupt_window_open) {
>> +kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
>> +vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
>> +}
>> +}
>> +}
>> +
>>  static void vmx_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  {
>>  bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
>> @@ -3323,32 +3351,9 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu, 
>> struct kvm_run *kvm_run)
>>  GUEST_INTR_STATE_STI |
>>  GUEST_INTR_STATE_MOV_SS);
>>  -   if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
>> -if (vcpu->arch.interrupt.pending) {
>> -enable_nmi_window(vcpu);
>> -} else if (vcpu->arch.nmi_window_open) {
>> -vcpu->arch.nmi_pending = false;
>> -vcpu->arch.nmi_injected = true;
>> -} else {
>> -enable_nmi_window(vcpu);
>> -return;
>> -}
>> -}
>> -
>> -if (vcpu->arch.nmi_injected) {
>> -vmx_inject_nmi(vcpu);
>> -goto out;
>> -}
>> -
>> -if (!vcpu->arch.interrupt.pending && kvm_cpu_has_interrupt(vcpu)) {
>> -if (vcpu->arch.interrupt_window_open)
>> -kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
>> -}
>> -
>> -if (vcpu->arch.interrupt.pending)
>> -vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
>> +vmx_intr_inject(vcpu);
>>  -out:
>> +/* enable NMI/IRQ window open exits if needed */
>>  if (vcpu->arch.nmi_pending)
>>  enable_nmi_window(vcpu);
>>  else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>   
>
> Not sure I understand the motivation.  Just replace a 'goto out' with a  
> return?
>
Yes. The code is much cleaner this way. It is easy to see that no matter what 
happened
during injection irq/nmi windows is enabled if there are pending events. With 
gotos
and returns scattered over the function you'll need to check every single path 
that may
or may not lead to the window enable code.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Server network went away. Host + Guests not reachable.

2009-04-11 Thread Daniel Kesselberg

Hello,

i have a big problem with my kvm setup. My network went away and i cant
find any informations about the reasons. Its not possible to reach the
Host or the Guest by SSH. I'm using the latest kvm (QEMU PC emulator
version 0.9.1 (kvm-84)) and a setup with a network bridge.

server1:~# dmesg
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.26-1-amd64 (Debian 2.6.26-13lenny2)
(da...@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian
4.1.2-25)) #1 SMP Fri Mar 13 17:46:45 UTC 2009
[0.00] Command line: root=/dev/md1 ro
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e4000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - ddfd (usable)
[0.00]  BIOS-e820: ddfd - ddfde000 (ACPI data)
[0.00]  BIOS-e820: ddfde000 - de00 (ACPI NVS)
[0.00]  BIOS-e820: fff0 - 0001 (reserved)
[0.00]  BIOS-e820: 0001 - 00012000 (usable)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 3200 used
[0.00] Entering add_active_range(0, 256, 909264) 1 entries of
3200 used
[0.00] Entering add_active_range(0, 1048576, 1179648) 2 entries
of 3200 used
[0.00] max_pfn_mapped = 1179648
[0.00] init_memory_mapping
[0.00] DMI present.
[0.00] ACPI: RSDP 000F98A0, 0014 (r0 ACPIAM)
[0.00] ACPI: RSDT DDFD, 003C (r1 M S I  OEMRSDT  1731
MSFT   97)
[0.00] ACPI: FACP DDFD0200, 0084 (r2 M S I  OEMFACP  1731
MSFT   97)
[0.00] ACPI: DSDT DDFD0430, 40D9 (r1  1ADNC 1ADNC0000
INTL 20051117)
[0.00] ACPI: FACS DDFDE000, 0040
[0.00] ACPI: APIC DDFD0390, 005C (r1 M S I  OEMAPIC  1731
MSFT   97)
[0.00] ACPI: MCFG DDFD03F0, 003C (r1 M S I  OEMMCFG  1731
MSFT   97)
[0.00] ACPI: OEMB DDFDE040, 0060 (r1 M S I  AMI_OEM  1731
MSFT   97)
[0.00] ACPI: EPTH DDFD4510, 0038 (r1 M S I  OEMHPET  1731
MSFT   97)
[0.00] ACPI: SSDT DDFD4550, 030E (r1 A M I  POWERNOW1
AMD 1)
[0.00] Scanning NUMA topology in Northbridge 24
[0.00] No NUMA configuration found
[0.00] Faking a node at -00012000
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 3200 used
[0.00] Entering add_active_range(0, 256, 909264) 1 entries of
3200 used
[0.00] Entering add_active_range(0, 1048576, 1179648) 2 entries
of 3200 used
[0.00] Bootmem setup node 0 -00012000
[0.00]   NODE_DATA [e000 - 00012fff]
[0.00]   bootmap [00013000 -  00036fff] pages 24
[0.00]   early res: 0 [0-fff] BIOS data page
[0.00]   early res: 1 [6000-7fff] TRAMPOLINE
[0.00]   early res: 2 [20-675397] TEXT DATA BSS
[0.00]   early res: 3 [37953000-37fefff3] RAMDISK
[0.00]   early res: 4 [9fc00-f] BIOS reserved
[0.00]   early res: 5 [8000-dfff] PGTABLE
[0.00]  [e200-e20002df] PMD ->
[81000120-810003ff] on node 0
[0.00]  [e20002e0-e20003ff] PMD ->
[81000c00-81000cbf] on node 0
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   DMA324096 ->  1048576
[0.00]   Normal1048576 ->  1179648
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[3] active PFN ranges
[0.00] 0:0 ->  159
[0.00] 0:  256 ->   909264
[0.00] 0:  1048576 ->  1179648
[0.00] On node 0 totalpages: 1040239
[0.00]   DMA zone: 56 pages used for memmap
[0.00]   DMA zone: 1248 pages reserved
[0.00]   DMA zone: 2695 pages, LIFO batch:0
[0.00]   DMA32 zone: 14280 pages used for memmap
[0.00]   DMA32 zone: 890888 pages, LIFO batch:31
[0.00]   Normal zone: 1792 pages used for memmap
[0.00]   Normal zone: 129280 pages, LIFO batch:31
[0.00]   Movable zone: 0 pages used for memmap
[0.00] ATI board detected. Disabling timer routing over 8254.
[0.00] Detected use of extended apic ids on hypertransport bus
[0.00] ACPI: PM-Timer IO Port: 0x808
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 2, version 0, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl d

Server network went away. Host + Guests not reachable.

2009-04-11 Thread Daniel Kesselberg

Hello,

i have a big problem with my kvm setup. My network went away and i cant 
find any informations about the reasons. Its not possible to reach the 
Host or the Guest by SSH. I'm using the latest kvm (QEMU PC emulator 
version 0.9.1 (kvm-84)) and a setup with a network bridge.


server1:~# dmesg
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.26-1-amd64 (Debian 2.6.26-13lenny2) 
(da...@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 
4.1.2-25)) #1 SMP Fri Mar 13 17:46:45 UTC 2009

[0.00] Command line: root=/dev/md1 ro
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e4000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - ddfd (usable)
[0.00]  BIOS-e820: ddfd - ddfde000 (ACPI data)
[0.00]  BIOS-e820: ddfde000 - de00 (ACPI NVS)
[0.00]  BIOS-e820: fff0 - 0001 (reserved)
[0.00]  BIOS-e820: 0001 - 00012000 (usable)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 3200 used
[0.00] Entering add_active_range(0, 256, 909264) 1 entries of 
3200 used
[0.00] Entering add_active_range(0, 1048576, 1179648) 2 entries 
of 3200 used

[0.00] max_pfn_mapped = 1179648
[0.00] init_memory_mapping
[0.00] DMI present.
[0.00] ACPI: RSDP 000F98A0, 0014 (r0 ACPIAM)
[0.00] ACPI: RSDT DDFD, 003C (r1 M S I  OEMRSDT  1731 
MSFT   97)
[0.00] ACPI: FACP DDFD0200, 0084 (r2 M S I  OEMFACP  1731 
MSFT   97)
[0.00] ACPI: DSDT DDFD0430, 40D9 (r1  1ADNC 1ADNC0000 
INTL 20051117)

[0.00] ACPI: FACS DDFDE000, 0040
[0.00] ACPI: APIC DDFD0390, 005C (r1 M S I  OEMAPIC  1731 
MSFT   97)
[0.00] ACPI: MCFG DDFD03F0, 003C (r1 M S I  OEMMCFG  1731 
MSFT   97)
[0.00] ACPI: OEMB DDFDE040, 0060 (r1 M S I  AMI_OEM  1731 
MSFT   97)
[0.00] ACPI: EPTH DDFD4510, 0038 (r1 M S I  OEMHPET  1731 
MSFT   97)
[0.00] ACPI: SSDT DDFD4550, 030E (r1 A M I  POWERNOW1 
AMD 1)

[0.00] Scanning NUMA topology in Northbridge 24
[0.00] No NUMA configuration found
[0.00] Faking a node at -00012000
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 3200 used
[0.00] Entering add_active_range(0, 256, 909264) 1 entries of 
3200 used
[0.00] Entering add_active_range(0, 1048576, 1179648) 2 entries 
of 3200 used

[0.00] Bootmem setup node 0 -00012000
[0.00]   NODE_DATA [e000 - 00012fff]
[0.00]   bootmap [00013000 -  00036fff] pages 24
[0.00]   early res: 0 [0-fff] BIOS data page
[0.00]   early res: 1 [6000-7fff] TRAMPOLINE
[0.00]   early res: 2 [20-675397] TEXT DATA BSS
[0.00]   early res: 3 [37953000-37fefff3] RAMDISK
[0.00]   early res: 4 [9fc00-f] BIOS reserved
[0.00]   early res: 5 [8000-dfff] PGTABLE
[0.00]  [e200-e20002df] PMD -> 
[81000120-810003ff] on node 0
[0.00]  [e20002e0-e20003ff] PMD -> 
[81000c00-81000cbf] on node 0

[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   DMA324096 ->  1048576
[0.00]   Normal1048576 ->  1179648
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[3] active PFN ranges
[0.00] 0:0 ->  159
[0.00] 0:  256 ->   909264
[0.00] 0:  1048576 ->  1179648
[0.00] On node 0 totalpages: 1040239
[0.00]   DMA zone: 56 pages used for memmap
[0.00]   DMA zone: 1248 pages reserved
[0.00]   DMA zone: 2695 pages, LIFO batch:0
[0.00]   DMA32 zone: 14280 pages used for memmap
[0.00]   DMA32 zone: 890888 pages, LIFO batch:31
[0.00]   Normal zone: 1792 pages used for memmap
[0.00]   Normal zone: 129280 pages, LIFO batch:31
[0.00]   Movable zone: 0 pages used for memmap
[0.00] ATI board detected. Disabling timer routing over 8254.
[0.00] Detected use of extended apic ids on hypertransport bus
[0.00] ACPI: PM-Timer IO Port: 0x808
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 2, version 0, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bu

Re: [RFC PATCH v2 15/19] kvm: add dynamic IRQ support

2009-04-11 Thread Avi Kivity

Gregory Haskins wrote:

This patch provides the ability to dynamically declare and map an
interrupt-request handle to an x86 8-bit vector.

Problem Statement: Emulated devices (such as PCI, ISA, etc) have
interrupt routing done via standard PC mechanisms (MP-table, ACPI,
etc).  However, we also want to support a new class of devices
which exist in a new virtualized namespace and therefore should
not try to piggyback on these emulated mechanisms.  Rather, we
create a way to dynamically register interrupt resources that
acts indepent of the emulated counterpart.

On x86, a simplistic view of the interrupt model is that each core
has a local-APIC which can recieve messages from APIC-compliant
routing devices (such as IO-APIC and MSI) regarding details about
an interrupt (such as which vector to raise).  These routing devices
are controlled by the OS so they may translate a physical event
(such as "e1000: raise an RX interrupt") to a logical destination
(such as "inject IDT vector 46 on core 3").  A dynirq is a virtual
implementation of such a router (think of it as a virtual-MSI, but
without the coupling to an existing standard, such as PCI).

The model is simple: A guest OS can allocate the mapping of "IRQ"
handle to "vector/core" in any way it sees fit, and provide this
information to the dynirq module running in the host.  The assigned
IRQ then becomes the sole handle needed to inject an IDT vector
to the guest from a host.  A host entity that wishes to raise an
interrupt simple needs to call kvm_inject_dynirq(irq) and the routing
is performed transparently.

+static int
+_kvm_inject_dynirq(struct kvm *kvm, struct dynirq *entry)
+{
+   struct kvm_vcpu *vcpu;
+   int ret;
+
+   mutex_lock(&kvm->lock);
+
+   vcpu = kvm->vcpus[entry->dest];
+   if (!vcpu) {
+   ret = -ENOENT;
+   goto out;
+   }
+
+   ret = kvm_apic_set_irq(vcpu, entry->vec, 1);
+
+out:
+   mutex_unlock(&kvm->lock);
+
+   return ret;
+}
+
  


Given that you're using the apic to inject the IRQ, you'll need an EOI.  
So what's the difference between dynirq and MSI, performance wise?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 14/19] kvm: add a reset capability

2009-04-11 Thread Avi Kivity

Gregory Haskins wrote:

We need a way to detect if a VM is reset later in the series, so lets
add a capability for userspace to signal a VM reset down to the kernel.
  


As I mentioned, this won't be reliable.  It needs to be driven from 
userspace and be per-device.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Defer remote tlb flushes on invlpg (v4)

2009-04-11 Thread Andrea Arcangeli
On Sun, Mar 29, 2009 at 01:36:01PM +0300, Avi Kivity wrote:
> Marcelo, Andrea?

Had to read the code a bit more to understand the reason of the
unsync_mmu flush in cr3 overwrite.

> Avi Kivity wrote:
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2a36f7f..f0ea56c 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1184,8 +1184,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>>  for_each_sp(pages, sp, parents, i)
>>  protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
>>  -   if (protected)
>> -kvm_flush_remote_tlbs(vcpu->kvm);
>> +kvm_flush_remote_tlbs_cond(vcpu->kvm, protected);
>>  for_each_sp(pages, sp, parents, i) {
>>  kvm_sync_page(vcpu, sp);

Ok so because we didn't flush the tlb on the other vcpus when invlpg
run, if cr3 overwrite needs to re-sync sptes wrprotecting them, we've
to flush the tlb in all vcpus to be sure the possibly writable tlb
entry reflecting the old writable spte instantiated before invlpg run,
is removed from the physical cpus. We wouldn't find it in for_each_sp
because it was rmap_removed, but we'll find something in
mmu_unsync_walk (right? we definitely have to find something in
mmu_unsync_walk for this to work, the parent sp have to leave
child->unsync set even after rmap_remove run in invlpg without
flushing the other vcpus tlbs).

>>  @@ -465,7 +464,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, 
>> gva_t gva)
>>  rmap_remove(vcpu->kvm, sptep);
>>  if (is_large_pte(*sptep))
>>  --vcpu->kvm->stat.lpages;
>> -need_flush = 1;
>> +vcpu->kvm->remote_tlbs_dirty = true;
>>  }
>>  set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
>>  break;
>> @@ -475,8 +474,6 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t 
>> gva)
>>  break;
>>  }
>>  -   if (need_flush)
>> -kvm_flush_remote_tlbs(vcpu->kvm);
>>  spin_unlock(&vcpu->kvm->mmu_lock);

AFIK to be compliant with lowlevel archs (without ASN it doesn't
matter I think as vmx always flush on exit), we have to flush the
local tlb here, with set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests). I
don't see why it's missing. Or am I wrong?

>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 68b217e..12afa50 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -758,10 +758,18 @@ static bool make_all_cpus_request(struct kvm *kvm, 
>> unsigned int req)
>>   void kvm_flush_remote_tlbs(struct kvm *kvm)
>>  {
>> +kvm->remote_tlbs_dirty = false;
>> +smp_wmb();

Still no lock prefix to the asm insn and here it runs outside the
mmu_lock, but ok, I tend to agree smp_wmb should be enough to be sure
the write is fully finished by the time smb_wmb returns. There's
another problem though.

CPU0CPU1
--- -
remote_tlbs_dirty = false
remote_tlbs_dirty = true
smp_tlb_flush
set_shadow_pte(sptep, 
shadow_trap_nonpresent_pte);


The flush for the sptep will be lost.

>> @@ -907,8 +913,7 @@ static int kvm_mmu_notifier_clear_flush_young(struct 
>> mmu_notifier *mn,
>>  young = kvm_age_hva(kvm, address);
>>  spin_unlock(&kvm->mmu_lock);
>>  -   if (young)
>> -kvm_flush_remote_tlbs(kvm);
>> +kvm_flush_remote_tlbs_cond(kvm, young);
>>  return young;
>>  }

No need to flush for clear_flush_young method, pages can't be freed
there.

I mangled over the patch a bit, plus fixed the above smp race, let me
know what you think.

The the best workload to exercise this is running a VM with lots of
VCPUs and 8G of ram with a 32bit guest kernel and then just malloc and
touch a byte for each 4096 page allocated by malloc. That will run a
flood of invlpg. Then push the system to swap. while :; do cp /dev/hda
/dev/null; done, also works without O_DIRECT so the host cache make it
fast at the second run (not so much faster with host swapping though).

I only tested it so far with 12 VM on swap with 64bit kernels with
heavy I/O so it's not good test as I doubt any invlpg runs, not even
munmap(addr, 4k) uses invlpg.

Signed-off-by: Andrea Arcangeli 
---

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d5bdf3a..900bc31 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1185,7 +1185,11 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
for_each_sp(pages, sp, parents, i)
protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
 
-   if (protected)
+   /*
+* Avoid leaving stale tlb entries in this vcpu representing
+* sptes rmap_removed by invlpg without vcpu smp tlb flush.
+  

Re: [RFC PATCH v2 00/19] virtual-bus

2009-04-11 Thread Avi Kivity

Gregory Haskins wrote:

Avi,

Gregory Haskins wrote:
  

Todo:
*) Develop some kind of hypercall registration mechanism for KVM so that
   we can use that as an integration point instead of directly hooking
   kvm hypercalls
  



What would you like to see here?  I now remember why I removed the
original patch I had for registration...it requires some kind of
discovery mechanism on its own.  Note that this is hard, but I figured
it would make the overall series simpler if I didn't go this route and
instead just integrated with a statically allocated vector.  That being
said, I have no problem adding this back in but figure we should discuss
the approach so I don't go down a rat-hole ;)

  



One idea is similar to signalfd() or eventfd().  Provide a kvm ioctl 
that takes a gsi and returns an fd.  Writes to the fd change the state 
of the line, possible triggering an interrupt.  Another ioctl takes a 
hypercall number or pio port as well as an existing fd.  Invocations of 
the hypercall or writes to the port write to the fd (using the same 
protocol as eventfd), so the other end can respond.


The nice thing is that this can be used by both kernel and userspace 
components, and for kernel components, hypercalls can be either buffered 
or unbuffered.



So, one thing we could do is use a string-identifier to discover
hypercall resources.  In this model, we would have one additional
hypercall registered with kvm (in addition to the mmu-ops, etc) called
KVM_HC_DYNHC or something like that.  The support for DYNHC could be
indicated in the cpuid (much like I do with the RESET, DYNIRQ, and VBUS
support today.   When hypercall provides register, the could provide a
string such as "vbus", and they would be allocated a hypercall id. 
Likewise, the HC_DYNHC interface would allow a guest to query the cpuid

for the DYNHC feature, and then query the HC_DYNHC vector for a string
to hc# translation.  If the provider is not present, we return -1 for
the hc#, otherwise we return the one that was allocated.

I know how you feel about string-ids in general, but I am not quite sure
how to design this otherwise without it looking eerily similar to what I
already have (which is registering a new HC vector in kvm_para.h)
  


No need for a string ID.  Reserve a range of hypercall numbers for 
dynamic IDs.  Userspace allocates one and gives it to the device using 
its configuration space (as applies to whatever bus it is using).



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] deal with interrupt shadow state for emulated instruction

2009-04-11 Thread Avi Kivity

Glauber Costa wrote:

we currently unblock shadow interrupt state when we skip an instruction,
but failing to do so when we actually emulate one. This blocks interrupts
in key instruction blocks, in particular sti; hlt; sequences

If the instruction emulated is an sti, we have to block shadow interrupts.
The same goes for mov ss. pop ss also needs it, but we don't currently
emulate it. For sequences of two or more instructions of the same type
among those instructions, only the first one has this effect.

Without this patch, I cannot boot gpxe option roms at vmx machines.
This is described at https://bugzilla.redhat.com/show_bug.cgi?id=494469

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -513,6 +513,8 @@ struct kvm_x86_ops {
void (*run)(struct kvm_vcpu *vcpu, struct kvm_run *run);
int (*handle_exit)(struct kvm_run *run, struct kvm_vcpu *vcpu);
void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
+   void (*block_interrupt_shadow)(struct kvm_vcpu *vcpu);
+   void (*unblock_interrupt_shadow)(struct kvm_vcpu *vcpu);
  


set_interrupt_shadow(), clear_interrupt_shadow().  The interrupt shadow 
blocks interrupts, but what happens when you block the interrupt shadow?


Maybe better to fold into one callback with a parameter.


 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt,
-struct x86_emulate_ops *ops);
+struct x86_emulate_ops *ops, int *interruptibility);
  


Add to x86_emulate_ctxt, there's already some

 
+static void svm_block_interrupt_shadow(struct kvm_vcpu *vcpu)

+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
+   vcpu->arch.interrupt_window_open = 0;
+}
+
+static void svm_unblock_interrupt_shadow(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   svm->vmcb->control.int_state &= ~SVM_INTERRUPT_SHADOW_MASK;
+   vcpu->arch.interrupt_window_open = (svm->vcpu.arch.hflags & 
HF_GIF_MASK);
  


If eflags.if = 0, the interrupt window is closed.


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6997c0..5158c2b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -736,26 +736,45 @@ static void vmx_set_rflags(struct kvm_vcpu *vcpu, 
unsigned long rflags)
vmcs_writel(GUEST_RFLAGS, rflags);
 }
 
-static void skip_emulated_instruction(struct kvm_vcpu *vcpu)

+static void vmx_block_interrupt_shadow(struct kvm_vcpu *vcpu)
 {
-   unsigned long rip;
-   u32 interruptibility;
+   u32 interruptibility = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
+   u32 interruptibility_mask = ((GUEST_INTR_STATE_STI | 
GUEST_INTR_STATE_MOV_SS));
 
-	rip = kvm_rip_read(vcpu);

-   rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
-   kvm_rip_write(vcpu, rip);
+   if (!(interruptibility & interruptibility_mask))
+   vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
+interruptibility | interruptibility_mask);
+   vcpu->arch.interrupt_window_open = 0;
  


Setting both _MOV_SS and _STI is wierd; can't happen on real hardware.


 {
unsigned long memop = 0;
u64 msr_data;
@@ -1359,6 +1360,10 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
unsigned int port;
int io_dir_in;
int rc = 0;
+   static int movss_int_flag, movss_int_flag_old;
  


static, for shame.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AW: AW: AW: KVM performance

2009-04-11 Thread Avi Kivity

BRAUN, Stefanie wrote:

Hello,

now I was able to start the guest vmu with disk virtio, and some of the
tests with disk involvement even improved a bit.
But the test in which a logo is added to the video stream does not
improve. I don't know why the performance is so bad? 


Subtest: Reading video locally, adding a logo to the video stream and
then saving the video locally
Host performance:   50%
kvm process in host (top) : 99%
vlc process in vmu (top) :  99%


The output of kvm_stat -1 during the subtest is the following:

efer_reload0 0
exits9913473  3994
  



This indicates that kvm is running in guest mode all of the time and is 
therefore quite efficient.  Perhaps the test uses sse instructions which 
kvm doesn't expose?  Try adding -cpu core2duo to the command line.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add MCE support to KVM

2009-04-11 Thread Avi Kivity

Andi Kleen wrote:
Right, but we can allocate the maximum number, no?  it's a fairly small 
amount of memory.



There are 255 banks worst case which need upto 4 (in theory 5) MSRs each

  


That's 8KB, which would normally not be allocated.  But I think we can 
live with it.


Do we actually need to support 255 banks?  Or can we support a smaller 
number transparently?


(IOW: are the banks interchangable?)

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add MCE support to KVM

2009-04-11 Thread Andi Kleen
> Right, but we can allocate the maximum number, no?  it's a fairly small 
> amount of memory.

There are 255 banks worst case which need upto 4 (in theory 5) MSRs each

-Andi
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add MCE support to KVM

2009-04-11 Thread Avi Kivity

Huang Ying wrote:

On Thu, 2009-04-09 at 23:50 +0800, Avi Kivity wrote:
  

Huang Ying wrote:


+int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
+   switch (msr) {
+   case MSR_EFER:
+   set_efer(vcpu, data);
break;
case MSR_IA32_DEBUGCTLMSR:
if (!data) {
@@ -807,6 +828,8 @@ int kvm_set_msr_common(struct kvm_vcpu *
break;
}
default:
+   if (!set_msr_mce(vcpu, msr, data))
+   break;
pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
return 1;
}
  
  

Is there any reason you split kvm_set_msr_common() into two functions?



I want to group MCE related MSR together. And most MCE MSR read/write
need to access vcpu->arch.mcg_xxx or vcpu->arch_mce_banks, So I think
use a MCE specific function would be cleaner.

But It seems that something as follow would be better.

kvm_set_msr_comm()
{
switch (msr) {
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_MISC + 4 * KVM_MCE_MAX_BANK:
set_msr_mce();
break;
...
}
...
}

  


Yes.  Just make sure KVM_MCE_MAX_BANK (better change to KVM_MCE_NR_BANK, 
with MAX you never know if it's the index of the last bank or the number 
of banks) doesn't conflict with any other MSRs.



+
+int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
+{
+   u64 data;
+
+   switch (msr) {
+   case 0xc0010010: /* SYSCFG */
+   case 0xc0010015: /* HWCR */
  
  

Please use MSR_ constants (add them if they don't exist yet).



In fact, this is not added by me. But I can change this by the way.
  


Oh okay.  So don't change them in this patch.


Why not always allocate it on vcpu setup?



Because the MCE bank number is not fixed, it is based on mcg_cap from
user space.
  


Right, but we can allocate the maximum number, no?  it's a fairly small 
amount of memory.


  

+static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
+ struct kvm_x86_mce *mce)
+{
+   u64 mcg_cap = vcpu->arch.mcg_cap;
+   unsigned bank_num = mcg_cap & 0xff;
+   u64 *banks = vcpu->arch.mce_banks;
+
+   if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL))
+   return -EINVAL;
+   /*
+* if IA32_MCG_CTL is not all 1s, the uncorrected error
+* reporting is disabled
+*/
+   if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
+   vcpu->arch.mcg_ctl != ~(u64)0)
+   return 0;
+   banks += 4 * mce->bank;
+   /*
+* if IA32_MCi_CTL is not all 1s, the uncorrected error
+* reporting is disabled for the bank
+*/
+   if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0)
+   return 0;
+   if (mce->status & MCI_STATUS_UC) {
+   u64 status = mce->status;
+   if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
+   !(vcpu->arch.cr4 & X86_CR4_MCE)) {
+   printk(KERN_DEBUG "kvm: set_mce: "
+  "injects mce exception while "
+  "previous one is in progress!\n");
+   set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
+   return 0;
+   }
+   if (banks[1] & MCI_STATUS_VAL)
+   status |= MCI_STATUS_OVER;
+   banks[1] = mce->status;
+   banks[2] = mce->addr;
+   banks[3] = mce->misc;
+   vcpu->arch.mcg_status = mce->mcg_status;
+   kvm_queue_exception(vcpu, MC_VECTOR);
+   } else if (!(banks[1] & MCI_STATUS_VAL) ||
+  (!(banks[1] & MCI_STATUS_UC) &&
+   !((mcg_cap & MCG_TES_P) && ((banks[1]>>53) & 0x3) < 2))) {
+   u64 status = mce->status;
+   if (banks[1] & MCI_STATUS_VAL)
+   status |= MCI_STATUS_OVER;
+   banks[1] = mce->status;
+   banks[2] = mce->addr;
+   banks[3] = mce->misc;
+   } else
+   banks[1] |= MCI_STATUS_OVER;
+   return 0;
+}
  
  

Can userspace just use KVM_SET_MSR for this?



In addition to assigning MSR, we have some other logic for MCE, such as
BANK reporting disabling, overwriting rules, triple fault for UC MCE
during MCIP. So I think we need some dedicate interface.
  


Yes, you're right.

  

+   case KVM_X86_SETUP_MCE: {
+   u64 mcg_cap;
+
+   r = -EFAULT;
+   if (copy_from_user(&mcg_cap, argp, sizeof mcg_cap))
+   goto out;
+   /*
+* extended machine-check state registers and CMCI are
+

Re: deactivate fpu when cr0.PE ON?

2009-04-11 Thread Avi Kivity

Dong, Eddie wrote:

Following code deactivate fpu when CR0.PE is on, any  explaination? 
Rest of code active/deactive fpu based on cr0.TS bit.


  


The idea is, if the guest is doing a context switch, then it will likely 
set its own lazy fpu code, so we anticipate it and deactivate the cpu.


When cr0.pe is cleared, lazy fpu usually doesn't work so we leave it 
enabled.


However the mechanism of deactivating the fpu on cr3 write doesn't work 
with npt or ept, I'd like to replace it with something better.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libkvm: fix build with --with-patched-kernel on 2.6.29

2009-04-11 Thread Avi Kivity

Michael S. Tsirkin wrote:

Make libkvm build with --with-patched-kernel on 2.6.29.

  


Applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Cleanup vmx_intr_assist()

2009-04-11 Thread Avi Kivity

Gleb Natapov wrote:

Signed-off-by: Gleb Natapov 
---

 arch/x86/kvm/vmx.c |   55 
 1 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 06252f7..9eb518f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3309,6 +3309,34 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
}
 }
 
+static void vmx_intr_inject(struct kvm_vcpu *vcpu)

+{
+   /* try to reinject previous events if any */
+   if (vcpu->arch.nmi_injected) {
+   vmx_inject_nmi(vcpu);
+   return;
+   }
+
+   if (vcpu->arch.interrupt.pending) {
+   vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+   return;
+   }
+
+   /* try to inject new event if pending */
+   if (vcpu->arch.nmi_pending) {
+   if (vcpu->arch.nmi_window_open) {
+   vcpu->arch.nmi_pending = false;
+   vcpu->arch.nmi_injected = true;
+   vmx_inject_nmi(vcpu);
+   }
+   } else if (kvm_cpu_has_interrupt(vcpu)) {
+   if (vcpu->arch.interrupt_window_open) {
+   kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
+   vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+   }
+   }
+}
+
 static void vmx_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
@@ -3323,32 +3351,9 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
GUEST_INTR_STATE_STI |
GUEST_INTR_STATE_MOV_SS);
 
-	if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {

-   if (vcpu->arch.interrupt.pending) {
-   enable_nmi_window(vcpu);
-   } else if (vcpu->arch.nmi_window_open) {
-   vcpu->arch.nmi_pending = false;
-   vcpu->arch.nmi_injected = true;
-   } else {
-   enable_nmi_window(vcpu);
-   return;
-   }
-   }
-
-   if (vcpu->arch.nmi_injected) {
-   vmx_inject_nmi(vcpu);
-   goto out;
-   }
-
-   if (!vcpu->arch.interrupt.pending && kvm_cpu_has_interrupt(vcpu)) {
-   if (vcpu->arch.interrupt_window_open)
-   kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
-   }
-
-   if (vcpu->arch.interrupt.pending)
-   vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+   vmx_intr_inject(vcpu);
 
-out:

+   /* enable NMI/IRQ window open exits if needed */
if (vcpu->arch.nmi_pending)
enable_nmi_window(vcpu);
else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
  


Not sure I understand the motivation.  Just replace a 'goto out' with a 
return?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] Consolidate userspace and kernel interrupt injection for VMX.

2009-04-11 Thread Avi Kivity

Gleb Natapov wrote:

Use the same callback to inject irq/nmi events no matter what irqchip is
in use. Only from VMX for now.

Signed-off-by: Gleb Natapov 
---

 arch/x86/include/asm/kvm_host.h |2 +
 arch/x86/kvm/svm.c  |2 +
 arch/x86/kvm/vmx.c  |   71 +--
 arch/x86/kvm/x86.c  |2 +
 4 files changed, 19 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e672ca5..4e39c40 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -520,7 +520,7 @@ struct kvm_x86_ops {
void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
bool has_error_code, u32 error_code);
bool (*exception_injected)(struct kvm_vcpu *vcpu);
-   void (*inject_pending_irq)(struct kvm_vcpu *vcpu);
+   void (*inject_pending_irq)(struct kvm_vcpu *vcpu, struct kvm_run *run);
  


kvm_run is available as vcpu->run, so this isn't needed.  But better to 
keep it for now and drop it in a later patch.



 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
 {
int ret;
@@ -3351,8 +3309,11 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
}
 }
 
-static void vmx_intr_assist(struct kvm_vcpu *vcpu)

+static void vmx_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
+   bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
+   kvm_run->request_interrupt_window;
+
update_tpr_threshold(vcpu);
 
 	vmx_update_window_states(vcpu);

@@ -3373,25 +3334,25 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
return;
}
}
  


Why not convert the 'enable_nmi_window(); return;' above to a 'goto out' 
like you do elsewhere?



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Make kvm_cpu_(has|get)_interrupt() work for userspace irqchip too.

2009-04-11 Thread Avi Kivity

Gleb Natapov wrote:

  * check if there are pending timer events
@@ -48,14 +49,17 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 {
struct kvm_pic *s;

-   if (kvm_apic_has_interrupt(v) == -1) {  /* LAPIC */
-   if (kvm_apic_accept_pic_intr(v)) {
-   s = pic_irqchip(v->kvm); /* PIC */
-   return s->output;
-   } else
-   return 0;
+   if (irqchip_in_kernel(v->kvm)) {
+   if (kvm_apic_has_interrupt(v) == -1) {  /* LAPIC */
+   if (kvm_apic_accept_pic_intr(v)) {
+   s = pic_irqchip(v->kvm); /* PIC */
+   return s->output;
+   } else
+   return 0;
+   }
+   return 1;
}
-   return 1;
+   return v->arch.irq_summary;
 }
  
Use if (!irqchip_in_kernel(v->kvm)) for userspace seems more simple(rather 
than a series of indention...).




As long as lines are smaller then 80 chars I don't care much :) If new
version of the patch will be needed I'll change this.
  


Please change it, the less indentation levels the better.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] align vga rom to 4k boundary.

2009-04-11 Thread Avi Kivity

Glauber Costa wrote:

Instead of aligning to 2k boundary, as required by the bios,
align to 4k boundary, as required by kvm memory functions. Without
this patch, starting kvm with -vga std option fails with:

create_userspace_phys_mem: Invalid argument
kvm_cpu_register_physical_memory: failed

as described by: https://bugzilla.redhat.com/show_bug.cgi?id=494376

It does not fail with cirrus vga, because it is naturally aligned.
This problem does not seem to affect upstream qemu.

 exit(1);
 }
/* Round up vga bios size to the next 2k boundary */
-   vga_bios_size = (vga_bios_size + 2047) & ~2047;
+   vga_bios_size = (vga_bios_size + 4095) & ~4095;
option_rom_start = 0xc + vga_bios_size;
 
 /* setup basic memory access */
  


Comment needs updating.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] propagate errors on failed migration.

2009-04-11 Thread Avi Kivity

Glauber Costa wrote:

We're currently ignoring any errors if dirty logging fails.
Set error on migration file if we're unable to put dirty logging
on.
  


Applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] Add lcall decoding.

2009-04-11 Thread Avi Kivity

Gleb Natapov wrote:

No emulation yet.

Signed-off-by: Gleb Natapov 
---

 arch/x86/kvm/x86_emulate.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index c015063..fe0dec2 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -154,7 +154,8 @@ static u32 opcode_table[256] = {
/* 0x90 - 0x97 */
DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg,
/* 0x98 - 0x9F */
-   0, 0, 0, 0, ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
+   0, 0, SrcImm | Src2Imm16 | ImplicitOps, 0, ImplicitOps | Stack,
+   ImplicitOps | Stack, 0, 0,
/* 0xA0 - 0xA7 */
ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs,
ByteOp | DstMem | SrcReg | Mov | MemAbs, DstMem | SrcReg | Mov | MemAbs,

  


Please keep 4 or 8 opcodes per line.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/9] Completely decode in/out at decoding stage.

2009-04-11 Thread Avi Kivity

Gleb Natapov wrote:

Signed-off-by: Gleb Natapov 
---

 arch/x86/kvm/x86_emulate.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 3c23af0..cf27e62 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -193,8 +193,10 @@ static u32 opcode_table[256] = {
0, 0, 0, 0, 0, 0, 0, 0,
/* 0xE0 - 0xE7 */
0, 0, 0, 0,
-   SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
-   SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
+   SrcNone | ByteOp | SrcImmByte | ImplicitOps,
+   SrcNone | SrcImmByte | ImplicitOps,
+   SrcNone | ByteOp | SrcImmByte | ImplicitOps,
+   SrcNone | SrcImmByte | ImplicitOps,
  


SrcImmByte sign extends, but you want zero extension here.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: Fix overlapping check for memory slots

2009-04-11 Thread Jan Kiszka
This nice little buglet complicates a smarter slot management in qemu
user space just "slightly". Sigh...

>

When checking for overlapping slots on registration of a new one, kvm
currently also considers zero-length (ie. deleted) slots and rejects
requests incorrectly. This finally denies user space from joining slots.
Fix the check by skipping deleted slots.

Signed-off-by: Jan Kiszka 
---

 virt/kvm/kvm_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 363af32..18f06d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1117,7 +1117,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
struct kvm_memory_slot *s = &kvm->memslots[i];
 
-   if (s == memslot)
+   if (s == memslot || !s->npages)
continue;
if (!((base_gfn + npages <= s->base_gfn) ||
  (base_gfn >= s->base_gfn + s->npages)))



signature.asc
Description: OpenPGP digital signature


Re: kvm-84: Nested virtualization, crashes, and kvm binary name

2009-04-11 Thread Alexander Graf


On 10.04.2009, at 19:08, Mihail Panev wrote:



 Hi Alex,

Alexander Graf wrote:

So question #1: Is this the right thing to start, and if yes, what's
the story behind that name? I ran across some qemu-system-i386 on
google, but my compile did not produce such a binary.


Yes, the binary is called "qemu-system-x86_64". The story behind all
this is somewhere in the archive of this ML.


 Yeah, I had already checked that out before I posted, but what I  
found

only seems to clarify the difference between qemu- and
qemu-system-. What baffles me most is why the heck does it
generate a x86_64 target on a i686 system?!

 Or has KVM dropped support for 32-bit x86 as a guest platform
altogether, relying on x86_64's legacy/compatibility mode? That would
make sense, but on the other hand it could mislead into thinking that
you can run a 64-bit guest on it, which wouldn't work when the host is
actually 32-bit (like in my case). Or do I miss something here?


Please don't question the usefulness of the decision. It is called  
"qemu-system-x86_64" for both i386 and x86_64 alike.



This will probably go away
when qemu and kvm-userspace merge some day.


 I actually thought that had already happened, after I read the
changelog for kvm-84. Now I see that I must have interpreted "merge
qemu-svn" the other way round as it was meant :-)


kvm-userspace merges in upstream qemu changes from time to time. Also  
some developers try to push their changes in kvm-userspace to upstream  
qemu. The process is not finished yet though, as you still have 2  
distinct trees.


I agree that this is not exactly obvious, but I wanted to guard the  
code

as heavily as possible :-).


 Guard it? Is that still experimental? I thought this was enabled by
default...


Yes, it's experimental.


 Actually it would have been more obvious if there was a manpage
documenting that. However, my build did not produce any manpages,
although the configure script said "Manual directory /usr/local/ 
share/man".


 Even if so, I think that needing to set the module parameter
explicitly is enough of a safeguard. Apropos parameter, I'd rather  
name
it something more self-explanatory like "nested_virt" or  
"nested_svm" or

something. Most people, including me, usually associate "nested" with
NPT, when it comes to virtualization.


I was imagining a world where you would always have the feature  
enabled and choose its activation via -cpu. Per default -cpu is set to  
"compatible" - a CPU that can be migrated even across intel and AMD  
platforms. If you choose -cpu barcelona or so you get SVM features in  
the guest.


For now it's the way you see it though and will be until I find time  
to make the code work perfectly ;-).



 And while we are at parameters, kvm-intel.ko's enable/disable
parameters are bool, which actually makes more sense. I think it
wouldn't hurt to make that consistent across modules, i.e. either turn
amd's to bool, or intel's to int.


Also, keep in mind that for now only KVM in KVM works for me. Getting
for example Xen running should be definitely doable - it just didn't
work for me last time I tried. Speed is not exactly great yet either.


 After I "discovered" the -enable-nesting thing, I tried it and it
worked fine. More precisely, I was able to start another guest within
the guest, and it ran OK in text mode. As soon as it booted gdm, it
hanged. I used the cirrus vga for the nested guest. The (outer) guest
was a standard Debian Lenny, so the nested guest used the kvm version
shipped with that, which is kvm-72. That actually shouldn't matter,  
but

who knows...

 At that point, I somehow managed to shoot off the sshd, since I was
doing that over SSH to my university machine. Now the remote machine
seems to run, but the SSH daemon is down. Thus, I cannot reconnect to
make further tests. Due to Easter holidays, I have no physical  
access to
the machine right now, so I will be able to follow up on the matter  
from

Tuesday onwards.

 Btw, SDL performance over SSH X-forwarding just plain sucks, even  
on a

100MBit network.


Yep - there was a discussion on qemu-devel about this.


VNC is a bit better, but there I have an issue with a
"double mouse pointer", having the local one as a black dot and the
remote one the usual way, and they get terribly out of sync all the
time. It's an enormous PITA! Do you experience that too? It happens  
here
no matter how I set the "render mouse pointer locally" setting in  
xvnc.


This is because you have several conversion layers here. Normal VNC  
just sets the mouse pointer on the remote display. With a VM you  
emulate a normal mouse though, that only knows things like "move left  
by 20". That means you're stuck with a relative input device, which  
will always bring you the "double mouse cursor" effect.


If you want to have an absolute input device, try to use "vmmouse"  
instead of "mouse" as input driver in your xorg.conf.


PS: I'm offline for 2 weeks now, so don't expect any reply