Linux KVM on CentOS 5.6

2011-09-21 Thread Kaushal Shriyan
Hi,

I have IBM x3650M3 Server with 2 Physical CPU each with quad core
configs and 32GB of physical RAM and 600 GB usable space. I am using
linux kvm for this purpose. Please correct or suggest me if i am doing
wrong

I have installed CentOS 5.6 on physical server as host OS. I need to
create 2 Virtual Machines with similar configs.

virt-install --connect qemu:///system -n vm1 -r 15000 --cpuset=auto -f
/var/tmp/vm1.qcow2 -s 200 -c
/var/lib/libvirt/images/CentOS-5.6-x86_64-bin-DVD-1of2.iso --vnc
--noautoconsole --os-type linux --os-variant generic26 --accelerate
--network=bridge:br0 --hvm

virt-install --connect qemu:///system -n vm2 -r 15000 --cpuset=auto -f
/var/tmp/vm2.qcow2 -s 200 -c
/var/lib/libvirt/images/CentOS-5.6-x86_64-bin-DVD-1of2.iso --vnc
--noautoconsole --os-type linux --os-variant generic26 --accelerate
--network=bridge:br0 --hvm

Please let me know the recommended method if there are any changes
need to be done in the above command line. I would be eventually
running asterisk PBX application on these 2 VM's and also let me know
if 2 GB of physical RAM is sufficient for Host OS.

Regards,

Kaushal
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-21 Thread Zhi Yong Wu
On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
 On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
  On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti mtosa...@redhat.com 
  wrote:
   On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
   Note:
        1.) When bps/iops limits are specified to a small value such as 
   511 bytes/s, this VM will hang up. We are considering how to handle 
   this senario.
  
   You can increase the length of the slice, if the request is larger than
   slice_time * bps_limit.
  Yeah, but it is a challenge for how to increase it. Do you have some nice 
  idea?
 
  If the queue is empty, and the request being processed does not fit the
  queue, increase the slice so that the request fits.
 Sorry for late reply. actually, do you think that this scenario is
 meaningful for the user?
 Since we implement this, if the user limits the bps below 512
 bytes/second, the VM can also not run every task.
 Can you let us know why we need to make such effort?

 It would be good to handle request larger than the slice.
Below is the code changes for your way. I used simple trace and did dd
test on guest, then found only the first rw req is handled, and
subsequent reqs are enqueued. After several minutes, guest prints the
info below on its terminal:
INFO: task kdmflush:326 blocked for more than 120 seconds.
echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this message.

I don't make sure if it is correct. Do you have better way to verify it?


 It is not strictly necessary, but in case its not handled, a minimum
 should be in place, to reflect maximum request size known. Being able to
 specify something which crashes is not acceptable.



diff --git a/block.c b/block.c
index af19784..f88c22a 100644
--- a/block.c
+++ b/block.c
@@ -132,9 +132,10 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 bs-block_timer = NULL;
 }

-bs-slice_start = 0;
-
-bs-slice_end   = 0;
+bs-slice_time= 0;
+bs-slice_start   = 0;
+bs-slice_end = 0;
+bs-first_time_rw = false;
 }

 static void bdrv_block_timer(void *opaque)
@@ -151,9 +152,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
 bs-block_queue = qemu_new_block_queue();
 bs-block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);

+bs-slice_time  = BLOCK_IO_SLICE_TIME;
 bs-slice_start = qemu_get_clock_ns(vm_clock);
-
-bs-slice_end   = bs-slice_start + BLOCK_IO_SLICE_TIME;
+bs-slice_end   = bs-slice_start + bs-slice_time;
+bs-first_time_rw = true;
 }

 bool bdrv_io_limits_enabled(BlockDriverState *bs)
@@ -2846,11 +2848,23 @@ static bool
bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
 /* Calc approx time to dispatch */
 wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;

-if (wait) {
-*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
-}
+if (!bs-first_time_rw
+|| !qemu_block_queue_is_empty(bs-block_queue)) {
+if (wait) {
+*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+}

-return true;
+return true;
+} else {
+bs-slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+bs-slice_end += bs-slice_time - BLOCK_IO_SLICE_TIME;
+if (wait) {
+*wait = 0;
+}
+
+bs-first_time_rw = false;
+return false;
+}
 }

 static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
@@ -2895,11 +2909,23 @@ static bool
bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
 wait_time = 0;
 }

-if (wait) {
-*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
-}
+if (!bs-first_time_rw
+|| !qemu_block_queue_is_empty(bs-block_queue)) {
+if (wait) {
+*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+}

-return true;
+return true;
+} else {
+bs-slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+bs-slice_end += bs-slice_time - BLOCK_IO_SLICE_TIME;
+if (wait) {
+*wait = 0;
+}
+
+bs-first_time_rw = false;
+return false;
+}
 }

 static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
@@ -2912,10 +2938,10 @@ static bool
bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
 now = qemu_get_clock_ns(vm_clock);
 if ((bs-slice_start  now)
  (bs-slice_end  now)) {
-bs-slice_end = now + BLOCK_IO_SLICE_TIME;
+bs-slice_end = now + bs-slice_time;
 } else {
 bs-slice_start = now;
-bs-slice_end   = now + BLOCK_IO_SLICE_TIME;
+bs-slice_end   = now + bs-slice_time;

 bs-io_disps.bytes[is_write]  = 0;
 bs-io_disps.bytes[!is_write] = 0;
diff --git a/block/blk-queue.c b/block/blk-queue.c
index adef497..04e52ad 100644
--- 

Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-21 Thread Stefan Hajnoczi
On Tue, Sep 20, 2011 at 06:49:49PM +0200, Jan Kiszka wrote:
 Upstream's version is about to be signal-free and will stop handling
 SIGUSR2 specially. So it's time to adopt its implementation, ie. switch
 from signalfd to a pipe.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 This should help pulling upstream into qemu-kvm when block: avoid
 SIGUSR2 is merged. And will help merging further cleanups of this code
 I'm working on.
 
  posix-aio-compat.c |   73 ++-
  1 files changed, 32 insertions(+), 41 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Perhaps qemu_eventfd() can be used in the future instead of an explicit
pipe.  Then Linux will do eventfd while other OSes will fall back to
pipes.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-21 Thread Kevin Wolf
Am 20.09.2011 18:49, schrieb Jan Kiszka:
 Upstream's version is about to be signal-free and will stop handling
 SIGUSR2 specially. So it's time to adopt its implementation, ie. switch
 from signalfd to a pipe.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 This should help pulling upstream into qemu-kvm when block: avoid
 SIGUSR2 is merged. And will help merging further cleanups of this code
 I'm working on.

Acked-by: Kevin Wolf kw...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-21 Thread Avi Kivity

On 09/20/2011 08:28 PM, Avi Kivity wrote:

On 09/20/2011 07:30 PM, Marcelo Tosatti wrote:

 
We do have a small issue.  If we exit during 
NMI-blocked-by-STI and
nmi_pending == 2, then we lose the second interrupt.  Should 
rarely
happen, since external interrupts never exit in that 
condition, but

it's a wart.
 
 And the above system reset case, you should be able to handle it by
 saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).

  We could just add a KVM_CAP (and flag) that extends nmi_pending from
  a bool to a counter.

Or just add a new field to the pad.



Okay; I'll address this in a follow-on patch (my preference is making 
nmi_pending a counter).




Yet another way to do this is to redefine .injected (just in the API) to 
mean: inject immediately, unless blocked by interrupt shadow; in this 
case inject in the next instruction.  No KVM_CAP or anything.


The drawback is that if we hit the corner case of two NMIs queued and 
held back by interrupt shadow, an older kvm live migration target will 
not run the guest (exit with invalid state).  The advantage is that no 
user space or API changes are necessary.


Given that to get into this corner you need an NMI intensive load AND a 
sti; blah pair that spans two pages AND have the second page unavailable 
when those NMIs hit, I think it's better to avoid the API change.  Opinions?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Using vmdk image with qemu-kvm

2011-09-21 Thread fuzzy_4711
Hi,

I am new to the list, so please guide me to the correct place to ask if
my issue is misplaced here.


My system is an openSuSE 11.4 box with
kvm-0.14.0.0-1.10.1.x86_64
libvirt-client-0.8.8-0.12.1.x86_64


libvirt-devel-0.8.8-0.12.1.x86_64


virt-manager-0.8.5-8.1.x86_64



   libvirt-0.8.8-0.12.1.x86_64


libvirt-doc-0.8.8-0.12.1.x86_64




libvirt-python-0.8.8-0.12.1.x86_64


virt-utils-1.1.5-1.2.1.x86_64

installed on it. Additional configuration info will be provided, if needed.

I have to run a vmware virtual machine, which runs smoothly under win xp
using vmware player (created and first run using vmware server), on my
linux box, using qemu-kvm.

This is the content of the vmx file:



.encoding = windows-1252
config.version = 8
virtualHW.version = 4
scsi0.present = TRUE
memsize = 512
scsi0:0.present = TRUE
scsi0:0.fileName = Windows2000AdvancedServer.vmdk
ide1:0.present = TRUE
ide1:0.fileName = D:
ide1:0.deviceType = atapi-cdrom
floppy0.fileName = A:
Ethernet0.present = TRUE
displayName = Win2K Adv. Server
guestOS = win2000advserv
priority.grabbed = normal
priority.ungrabbed = normal

scsi0:0.redo = 
ethernet0.addressType = generated
uuid.location = 56 4d 5d 73 ac 49 1e a6-db 7f 09 c7 02 13 35 63
uuid.bios = 56 4d 5d 73 ac 49 1e a6-db 7f 09 c7 02 13 35 63
ide1:0.autodetect = FALSE
ethernet0.generatedAddress = 00:0c:29:13:35:63
ethernet0.generatedAddressOffset = 0

annotation = for this virtual machine

ide1:0.startConnected = TRUE
tools.syncTime = TRUE

extendedConfigFile = Windows 2000 Advanced Server.vmxf
virtualHW.productCompatibility = hosted
tools.upgrade.policy = manual

vmotion.checkpointFBSize = 16777216

ide1:0.allowGuestConnectionControl = FALSE

usb.present = TRUE

usb.autoConnect.device0 = 

usb.autoConnect.device1 = 
usb.autoConnect.device2 = 
checkpoint.vmState = 
cleanShutdown = TRUE
replay.supported = FALSE
replay.filename = 



When I use the vm-install script provided by SUSE, choosing the disk
like provided in the vmx file for use with kvm, in the pre flight check
it complains about no valid boot sector found. Installation may have
failed.

I am not an expert on virtualisation as you can see. It is important to
me to be able to rerun this image in vmware player again if needed. So I
guess converting it is not an option afaics. Could you help?

Thanks.
fuz

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] nVMX: Fix warning-causing idt-vectoring-info behavior

2011-09-21 Thread Nadav Har'El
This patch solves two outstanding nested-VMX issues:

1. unexpected, valid vectoring info warnings appeared in L1.
These are fixed by correcting the emulation of concurrent L0-L1 and
L1-L2 injections.

2. When we must run L2 next (e.g., on L1's VMLAUNCH/VMRESUME), injection into
L1 was delayed for an unknown amount of time - until L2 exits.
We now force (using a self IPI) an exit immediately after entry to L2,
so that the injection into L1 happens promptly.

Some more details about these issues:

When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
nVMX patch 23, titled Correct handling of interrupt injection.

Unfortunately, it is possible (though rare) that at this point there is valid
idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2,
and when L2 tried to run this interrupt's handler, it got a page fault - so
it returns the original interrupt vector in idt_vectoring_info. The problem
is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT
like we wished to, because the VMX spec guarantees that idt_vectoring_info
and exit_reason_external_interrupt can never happen together. This is not
just specified in the spec - a KVM L1 actually prints a kernel warning
unexpected, valid vectoring info if we violate this guarantee, and some
users noticed these warnings in L1's logs.

In order to better emulate a processor, which would never return the external
interrupt and the idt-vectoring-info together, we need to separate the two
injection steps: First, complete L1's injection into L2 (i.e., enter L2,
injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds
and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT.
Most of this is already in the code - the only change we need is to remain
in L2 (and not exit to L1) in this case.

However, to ensure prompt injection to L1, instead of letting L2 run for
a while after entering, we can send a self-IPI, which will ensure that
L2 exits immediately after the (necessary) entry, so we can inject into L1
as soon as possible.

Note that we added this self-IPI not only in the idt-vectoring-info case
above, but in every case where we are forced to enter L2 despite wishing to
inject to L1. This includes the case when L1 just VMLAUNCH/VMRESUMEed L2.

Note how we test vmcs12-idt_vectoring_info_field; This isn't really the
vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value
of this field. This was explained in patch 25 (Correct handling of idt
vectoring info) of the original nVMX patch series.

Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
to Abel Gordon for helping me figure out the solution, and to Avi Kivity
for helping to improve it.

Signed-off-by: Nadav Har'El n...@il.ibm.com
---
 arch/x86/kvm/vmx.c |   20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

--- .before/arch/x86/kvm/vmx.c  2011-09-21 13:45:59.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-09-21 13:45:59.0 +0300
@@ -3858,12 +3858,17 @@ static bool nested_exit_on_intr(struct k
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
u32 cpu_based_vm_exec_control;
-   if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu))
-   /* We can get here when nested_run_pending caused
-* vmx_interrupt_allowed() to return false. In this case, do
-* nothing - the interrupt will be injected later.
+   if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu)) {
+   /*
+* We get here if vmx_interrupt_allowed() returned 0 because
+* we must enter L2 now, so we can't inject to L1 now. If we
+* just do nothing, L2 will later exit and we can inject the
+* IRQ to L1 then. But to make L2 exit more promptly, we send
+* a self-IPI, causing L2 to exit right after entry.
 */
+   smp_send_reschedule(vcpu-cpu);
return;
+   }
 
cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
@@ -3990,11 +3995,12 @@ static void vmx_set_nmi_mask(struct kvm_
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu)) {
-   struct vmcs12 *vmcs12;
-   if (to_vmx(vcpu)-nested.nested_run_pending)
+   struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+   if (to_vmx(vcpu)-nested.nested_run_pending ||
+   (vmcs12-idt_vectoring_info_field 
+VECTORING_INFO_VALID_MASK))
return 0;
nested_vmx_vmexit(vcpu);
-   vmcs12 = get_vmcs12(vcpu);

Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 01:39:22AM -0400, Amos Kong wrote:
 - Original Message -
  On Tue, Sep 20, 2011 at 06:45:57AM -0400, Amos Kong wrote:
   From 48ea1c9188334b89a60b4f9e853e86fc04fda4a5 Mon Sep 17 00:00:00
   2001
   From: Amos Kong ak...@redhat.com
   Date: Tue, 20 Sep 2011 15:38:43 +0800
   Subject: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI
   DSDT tables
   
   Only func 0 is registered to guest driver (we can
   only found func 0 in slot-funcs list of driver),
   the other functions could not be cleaned when
   hot-removing the whole slot. This patch adds
   device per function in ACPI DSDT tables.
   
   Have tested with linux/winxp/win7, hot-adding/hot-remving,
   single/multiple function device, they are all fine.
   ---
   Changes from v1:
   - cleanup the macros, bios.bin gets back to 128K
   - notify only when func0 is added and removed
  
  How about moving code into functions so that it isn't duplicated for
  each PCI device.  See the patch below as an example (100% untested).

Hmm, I sent patches that did a similar thing but
in a slightly more compact way.
Message ids:
20110919092932.gb4...@redhat.com
20110919093644.gc4...@redhat.com
20110919100434.ga6...@redhat.com

Did they not reach you or something's wrong with them?

 
 Tested, it works as my original patch-v2.
 
  The CPU hotplug stuff works this way, except it run-time generates
  the
  equivalent of Device(S???) and PCNT.  (It may make sense to
  run-time generate the PCI hotplug as well - it consumes about 10K of
  space to statically generate the 31 devices.)
 
 I'm ok with the new version.
 
 Acked-by: Amos Kong ak...@redhat.com
 
  -Kevin
  
  
  diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
  index 08412e2..31ac5eb 100644
  --- a/src/acpi-dsdt.dsl
  +++ b/src/acpi-dsdt.dsl
  @@ -128,48 +128,6 @@ DefinitionBlock (
   PCRM, 32,
   }
   
  -#define hotplug_slot(name, nr) \
  -Device (S##name) {\
  -   Name (_ADR, nr##)  \
  -   Method (_EJ0,1) {  \
  -Store(ShiftLeft(1, nr), B0EJ) \
  -Return (0x0)  \
  -   }  \
  -   Name (_SUN, name)  \
  -}
  -
  -   hotplug_slot(1, 0x0001)
  -   hotplug_slot(2, 0x0002)
  -   hotplug_slot(3, 0x0003)
  -   hotplug_slot(4, 0x0004)
  -   hotplug_slot(5, 0x0005)
  -   hotplug_slot(6, 0x0006)
  -   hotplug_slot(7, 0x0007)
  -   hotplug_slot(8, 0x0008)
  -   hotplug_slot(9, 0x0009)
  -   hotplug_slot(10, 0x000a)
  -   hotplug_slot(11, 0x000b)
  -   hotplug_slot(12, 0x000c)
  -   hotplug_slot(13, 0x000d)
  -   hotplug_slot(14, 0x000e)
  -   hotplug_slot(15, 0x000f)
  -   hotplug_slot(16, 0x0010)
  -   hotplug_slot(17, 0x0011)
  -   hotplug_slot(18, 0x0012)
  -   hotplug_slot(19, 0x0013)
  -   hotplug_slot(20, 0x0014)
  -   hotplug_slot(21, 0x0015)
  -   hotplug_slot(22, 0x0016)
  -   hotplug_slot(23, 0x0017)
  -   hotplug_slot(24, 0x0018)
  -   hotplug_slot(25, 0x0019)
  -   hotplug_slot(26, 0x001a)
  -   hotplug_slot(27, 0x001b)
  -   hotplug_slot(28, 0x001c)
  -   hotplug_slot(29, 0x001d)
  -   hotplug_slot(30, 0x001e)
  -   hotplug_slot(31, 0x001f)
  -
   Name (_CRS, ResourceTemplate ()
   {
   WordBusNumber (ResourceProducer, MinFixed, MaxFixed,
   PosDecode,
  @@ -762,6 +720,119 @@ DefinitionBlock (
   Zero   /* reserved */
   })
   
  +/* PCI hotplug */
  +Scope(\_SB.PCI0) {
  +/* Methods called by bulk generated PCI devices below */
  +Method (PCEJ, 1, NotSerialized) {
  +// _EJ0 method - eject callback
  +Store(ShiftLeft(1, Arg0), B0EJ)
  +Return (0x0)
  +}
  +
  +/* Bulk generated PCI hotplug devices */
  +#define hotplug_func(nr, fn)\
  +Device (S##nr##fn) {\
  +Name (_ADR, 0x##nr##000##fn)\
  +Method (_EJ0, 1) { Return(PCEJ(0x##nr)) }   \
  +Name (_SUN, 0x##nr) \
  +}

The fundamental question is still why would
we have _EJ0 methods in functions 0 when they are
not individually hotpluggable.
I think only function 0 should have _EJ0.

  +
  +#define hotplug_slot(nr) \
  +hotplug_func(nr, 0)  \
  +hotplug_func(nr, 1)  \
  +hotplug_func(nr, 2)  \
  +hotplug_func(nr, 3)  \
  +hotplug_func(nr, 4)  \
  +hotplug_func(nr, 5)  \
  +hotplug_func(nr, 6)  \
  +hotplug_func(nr, 7)
  +
  +   hotplug_slot(01)
  +   hotplug_slot(02)
  +   hotplug_slot(03)
  +   hotplug_slot(04)
  +   hotplug_slot(05)
  +   hotplug_slot(06)
 

Re: [PATCH] nVMX: Fix warning-causing idt-vectoring-info behavior

2011-09-21 Thread Avi Kivity

On 09/21/2011 01:48 PM, Nadav Har'El wrote:

This patch solves two outstanding nested-VMX issues:


snip

Sorry, I missed an important point on the first review.


--- .before/arch/x86/kvm/vmx.c  2011-09-21 13:45:59.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-09-21 13:45:59.0 +0300
@@ -3858,12 +3858,17 @@ static bool nested_exit_on_intr(struct k
  static void enable_irq_window(struct kvm_vcpu *vcpu)
  {
u32 cpu_based_vm_exec_control;
-   if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu))
-   /* We can get here when nested_run_pending caused
-* vmx_interrupt_allowed() to return false. In this case, do
-* nothing - the interrupt will be injected later.
+   if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu)) {
+   /*
+* We get here if vmx_interrupt_allowed() returned 0 because
+* we must enter L2 now, so we can't inject to L1 now. If we
+* just do nothing, L2 will later exit and we can inject the
+* IRQ to L1 then. But to make L2 exit more promptly, we send
+* a self-IPI, causing L2 to exit right after entry.
 */
+   smp_send_reschedule(vcpu-cpu);
return;
+   }


-enable_irq_window() is called with interrupts enabled, so the self-IPI 
will be dispatched immediately and be lost.


The way to handle it is to kvm_make_request(KVM_REQ_IMMEDIATE_EXIT) 
here, and to add code to vcpu_enter_guest() to sample this into a local 
variable, and, after disabling interrupts, do the self-IPI.


A unit test would have caught this...

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-21 Thread Michael S. Tsirkin
Here's a bug: guest thinks it can eject VGA device and ISA bridge.

[root@dhcp74-172 ~]#lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 PCI bridge: Red Hat, Inc. Device 0001
00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device
00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/
adapter  address  attention  latch  module  power
[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/
adapter  address  attention  latch  module  power

[root@dhcp74-172 ~]# echo 0  /sys/bus/pci/slots/2/power 
[root@dhcp74-172 ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:03.0 PCI bridge: Red Hat, Inc. Device 0001
00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device
00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

This is wrong because slots 1 and 2 are marked as not hotpluggable
in qemu.

The reason is that our acpi tables declare both _RMV with value 0,
and _EJ0 method for these slots. What happens in this case
is undocumented by ACPI spec, so linux ignores _RMV,
and windows seems to ignore _EJ0.

The correct way to suppress hotplug is not to have _EJ0,
so this is what this patch does: it probes PIIX and
modifies DSDT to match.

With these patches applied, we get:

[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/
address
[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/
address

I had to add a bit of compile-time infrastructure to handle the
runtime patching in a simple way. I expect that
it will be possible to reuse it if we need to
patch other methods in the future.

Michael S. Tsirkin (4):
  acpi: generate mixed asl/aml listing
  acpi: add aml/asl parsing script
  acpi: EJ0 method name patching
  acpi: remove _RMV

 Makefile  |   10 ++--
 src/acpi-dsdt.dsl |   58 +++
 src/acpi.c|   11 
 src/find_ej0.pl   |  136 +
 src/splitdsl.pl   |   16 ++
 5 files changed, 176 insertions(+), 55 deletions(-)
 create mode 100755 src/find_ej0.pl
 create mode 100755 src/splitdsl.pl

-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] acpi: generate mixed asl/aml listing

2011-09-21 Thread Michael S. Tsirkin
Use iasl -l flag to produce a mixed listing, where a
source line is followed by matching AML.
Since we use a preprocessed input, this generates long lines with
multiple commands per line. To make it possible to match
AML to source exactly, split the input that we supply iasl,
with each command on a separate line.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Makefile|7 ---
 src/splitdsl.pl |   16 
 2 files changed, 20 insertions(+), 3 deletions(-)
 create mode 100755 src/splitdsl.pl

diff --git a/Makefile b/Makefile
index 109091b..5c011bb 100644
--- a/Makefile
+++ b/Makefile
@@ -192,10 +192,11 @@ $(OUT)vgabios.bin: $(OUT)vgabios.bin.raw tools/buildrom.py
$(Q)./tools/buildrom.py $ $@
 
 ### dsdt build rules
-src/%.hex: src/%.dsl
+src/%.hex: src/%.dsl src/splitdsl.pl src/find_ej0.pl
@echo Compiling DSDT
-   $(Q)cpp -P $  $(OUT)$*.dsl.i
-   $(Q)iasl -tc -p $(OUT)$* $(OUT)$*.dsl.i
+   $(Q)cpp -P $  $(OUT)$*.dsl.i.orig
+   $(Q)./src/splitdsl.pl $(OUT)$*.dsl.i.orig  $(OUT)$*.dsl.i
+   $(Q)iasl -l -tc -p $(OUT)$* $(OUT)$*.dsl.i
$(Q)cp $(OUT)$*.hex $@
 
 $(OUT)ccode32flat.o: src/acpi-dsdt.hex
diff --git a/src/splitdsl.pl b/src/splitdsl.pl
new file mode 100755
index 000..1caccba
--- /dev/null
+++ b/src/splitdsl.pl
@@ -0,0 +1,16 @@
+#!/usr/bin/perl
+
+while ()
+{
+   #Remove // comments if any as we can not split lines after them
+   s#//.*##;
+   #Code after { starts a new line
+   s/(\{)(\s*\S)/$1\n$2/g;
+   #Operator after ) starts a new line
+   s/(\))(\s*[_A-Za-z])/$1\n$2/g;
+   #Code after } starts a new line
+   s/(})(\s*\S)/$1\n$2/g;
+   #} starts a new line
+   s/(\S\s*)(\})/$1\n$2/g;
+   print;
+}
-- 
1.7.5.53.gc233e

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Michael S. Tsirkin
script ./src/find_ej0.pl finds all instances of
method named EJ0_ and the matching _ADR information,
and outputs the AML offset and slot mask of each.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 src/find_ej0.pl |  136 +++
 1 files changed, 136 insertions(+), 0 deletions(-)
 create mode 100755 src/find_ej0.pl

diff --git a/src/find_ej0.pl b/src/find_ej0.pl
new file mode 100755
index 000..37e8a8c
--- /dev/null
+++ b/src/find_ej0.pl
@@ -0,0 +1,136 @@
+#!/usr/bin/perl
+
+# Process mixed ASL/AML listing (.lst file) produced by iasl -l
+# Locate all occurences of Name _ADR followed by Method EJ0_
+# Output slot info from _ADR and offset of method name in AML
+
+use strict;
+
+my @aml = ();
+my @asl = ();
+my @asl_lineno = ();
+my @asl_to_aml_offset = ();
+my @output = ();
+
+#Store an ASL command, matching AML offset, and input line (for debugging)
+sub add_asl {
+my $srcline = shift(@_);
+my $line = shift(@_);
+push @asl, $line;
+push @asl_lineno, $.;
+push @asl_to_aml_offset, $#aml + 1;
+}
+
+#Store an AML byte sequence
+#Verify that offset output by iasl matches # of bytes so far
+sub add_aml {
+my $offset = shift(@_);
+my $line = shift(@_);
+my $o = hex($offset);
+# Sanity check: offset must match
+die Offset $o !=  .($#aml + 1) if ($o != $#aml + 1);
+# Strip any traling dots and ASCII dump after 
+$line =~ s/\s*\.*\s*.*//;
+
+my @code = split(' ', $line);
+foreach my $c (@code) {
+if (not($c =~ m/^[0-9A-Fa-f][0-9A-Fa-f]$/)) {
+die Unexpected octet $c;
+}
+push @aml, hex($c);
+}
+}
+
+# Process aml bytecode array, decoding AML
+# Given method offset, find its name offset
+sub aml_method_name {
+my $lineno = shift(@_);
+my $offset = shift(@_);
+#0x14 MethodOp PkgLength NameString MethodFlags TermList
+if ($aml[$offset] != 0x14) {
+die Method after input line $lineno offset $offset:  .
+ expected 0x14 actual . sprintf(0x%x, $aml[$offset]);
+}
+$offset += 1;
+# PkgLength can be multibyte. Bits 8-7 give the # of extra bytes.
+my $pkglenbytes = $aml[$offset]  6;
+$offset += 1 + $pkglenbytes;
+return $offset;
+}
+
+while () {
+#Strip trailing newline
+chomp;
+   #ASL listing: space, then line#, then , then code
+   if (s#^\s+([0-9]+)\.\.\.\.\s*##) {
+add_asl($1, $_);
+   }
+# AML listing: offset in hex, then , then code
+   if (s#^([0-9A-F]+)\.\.\.\.\s*##) {
+add_aml($1, $_);
+}
+   # Ignore everything else
+}
+
+# Now go over code, look for EJ0_ methods
+# For each such method, output slot mask from the
+# preceding _ADR line, as well as the method name offset.
+for (my $i = 0; $i = $#asl; $i++) {
+my $l = $asl[$i];
+# match: Method (EJ0_,1)
+if (not $l =~ m#^Method\s*\(\s*EJ0_\s*[,)]#) {
+# Make sure we do not miss any EJ0_:
+# die if EJ0_ is found anywhere else in source code
+if ($l =~ m#EJ0_#) {
+die Stray EJ0_ detected at input line $asl_lineno[$i]: $asl[$i];
+}
+next;
+}
+# EJ0_ found. Previous line must be _ADR
+my $p = $i  0 ? $asl[$i - 1] : ;
+# match: Name (_ADR, 0xaddress)
+if (not ($p =~ m#Name\s*\(\s*_ADR\s*,\s*0x([0-9A-Fa-f]+)\s*\)#)) {
+die _ADR not found before EJ0_ .
+at input line $asl_lineno[$i]: $asl[$i];
+}
+my $adr = hex($1);
+my $slot = $adr  16;
+if ($slot  31) {
+die _ADR device out of range: actual $slot  .
+expected 0 to 31 at input line $asl_lineno[$i]: $asl[$i];
+}
+
+# We have offset of EJ0_ method in code
+# Now find EJ0_ itself
+my $offset = $asl_to_aml_offset[$i];
+my $ej0 = aml_method_name($asl_lineno[$i], $offset);
+# Verify AML: name must be EJ0_:
+if (($aml[$ej0 + 0] != ord('E')) or
+($aml[$ej0 + 1] != ord('J')) or
+($aml[$ej0 + 2] != ord('0')) or
+($aml[$ej0 + 3] != ord('_'))) {
+die AML after input line $asl_lineno[$i] offset $ej0  .
+does not match EJ0_;
+}
+
+# OK we are done. Output slot mask and offset
+push @output, sprintf({.slot_mask = 0x%x, .offset = 0x%x},
+  0x1  $slot, $ej0);
+}
+
+# Pretty print output
+if ($#output  0) {
+die No EJ0_ Method found!
+}
+print EOF;
+static struct aml_ej0_data {
+unsigned slot_mask;
+unsigned offset;
+} aml_ej0_data[] = {
+EOF
+print join(,\n, @output);
+print EOF;
+
+};
+EOF
+
-- 
1.7.5.53.gc233e

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] acpi: EJ0 method name patching

2011-09-21 Thread Michael S. Tsirkin
Modify ACPI to only supply _EJ0 methods for PCI
slots that support hotplug.

This is done by runtime patching:
- Rename _EJ0 methods for PCI slots in DSDT to EJ0_:
  note that this has the same checksum, but
  is ignored by OSPM.
- At compile time, look for these methods in ASL source,
  find the matching AML,  and store the offsets of these methods
  in a table named aml_ej0_data.
  Note that we are looking for EJ0_ in source code,
  so we'll be able to write EJ0 if we want to and the script
  will not match it.
- At run time, go over aml_ej0_data, check which slots
  support hotplug and patch the ACPI table, replacing EJ0_ with _EJ0.

Note: the method used is robust in that we don't need
to change any offsets manually in case of ASL code changes.
As all parsing is done at compile time, any unexpected input causes
build failure, not a runtime failure.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Makefile  |3 ++-
 src/acpi-dsdt.dsl |9 +++--
 src/acpi.c|   11 +++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 5c011bb..dee93d6 100644
--- a/Makefile
+++ b/Makefile
@@ -197,7 +197,8 @@ src/%.hex: src/%.dsl src/splitdsl.pl src/find_ej0.pl
$(Q)cpp -P $  $(OUT)$*.dsl.i.orig
$(Q)./src/splitdsl.pl $(OUT)$*.dsl.i.orig  $(OUT)$*.dsl.i
$(Q)iasl -l -tc -p $(OUT)$* $(OUT)$*.dsl.i
-   $(Q)cp $(OUT)$*.hex $@
+   $(Q)./src/find_ej0.pl $(OUT)$*.lst  $(OUT)$*.off
+   $(Q)cat $(OUT)$*.hex $(OUT)$*.off  $@
 
 $(OUT)ccode32flat.o: src/acpi-dsdt.hex
 
diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 08412e2..3d43e4b 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -127,11 +127,16 @@ DefinitionBlock (
 {
 PCRM, 32,
 }
-
+// Method EJ0_ will be patched by BIOS to _EJ0
+// at runtime, if the slot is detected to support hotplug.
+// Must be immediately preceded by _ADR for this to work.
+// EJ0_ is not allowed anywhere else in this file,
+// if you really want to use it, write EJ0 which
+// creates the same AML but isn't patched.
 #define hotplug_slot(name, nr) \
 Device (S##name) {\
Name (_ADR, nr##)  \
-   Method (_EJ0,1) {  \
+   Method (EJ0_,1) {  \
 Store(ShiftLeft(1, nr), B0EJ) \
 Return (0x0)  \
}  \
diff --git a/src/acpi.c b/src/acpi.c
index 6bb6ff6..cbb5143 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -198,6 +198,8 @@ struct srat_memory_affinity
 u32reserved3[2];
 } PACKED;
 
+#define PCI_RMV_BASE 0xae0c
+
 #include acpi-dsdt.hex
 
 static void
@@ -243,6 +245,8 @@ build_fadt(struct pci_device *pci)
 struct fadt_descriptor_rev1 *fadt = malloc_high(sizeof(*fadt));
 struct facs_descriptor_rev1 *facs = memalign_high(64, sizeof(*facs));
 void *dsdt = malloc_high(sizeof(AmlCode));
+u32 rmvc_pcrm;
+int i;
 
 if (!fadt || !facs || !dsdt) {
 warn_noalloc();
@@ -255,7 +259,13 @@ build_fadt(struct pci_device *pci)
 facs-length = cpu_to_le32(sizeof(*facs));
 
 /* DSDT */
 memcpy(dsdt, AmlCode, sizeof(AmlCode));
+rmvc_pcrm = inl(PCI_RMV_BASE);
+for (i = 0; i  sizeof(aml_ej0_data) / sizeof(*aml_ej0_data); ++i) {
+if (rmvc_pcrm  aml_ej0_data[i].slot_mask) {
+memcpy(dsdt + aml_ej0_data[i].offset, _EJ0, 4);
+}
+}
 
 /* FADT */
 memset(fadt, 0, sizeof(*fadt));
-- 
1.7.5.53.gc233e

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] acpi: remove _RMV

2011-09-21 Thread Michael S. Tsirkin
The macro gen_pci_device is used to add _RMV
method to a slot device so it is no longer needed:
presence of _EJ0 now indicates that the slot is ejectable.
It is also placing two devices with the same _ADR
on the same bus, which isn't defined by the ACPI spec.
So let's remove it.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 src/acpi-dsdt.dsl |   49 -
 1 files changed, 0 insertions(+), 49 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 3d43e4b..44a27e4 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -121,12 +121,6 @@ DefinitionBlock (
 {
 B0EJ, 32,
 }
-
-OperationRegion(RMVC, SystemIO, 0xae0c, 0x04)
-Field(RMVC, DWordAcc, NoLock, WriteAsZeros)
-{
-PCRM, 32,
-}
 // Method EJ0_ will be patched by BIOS to _EJ0
 // at runtime, if the slot is detected to support hotplug.
 // Must be immediately preceded by _ADR for this to work.
@@ -464,49 +458,6 @@ DefinitionBlock (
DRSJ, 32
}
}
-
-#define gen_pci_device(name, nr)\
-Device(SL##name) {  \
-Name (_ADR, nr##)   \
-Method (_RMV) { \
-If (And(\_SB.PCI0.PCRM, ShiftLeft(1, nr))) {\
-Return (0x1)\
-}   \
-Return (0x0)\
-}   \
-Name (_SUN, name)   \
-}
-
-/* VGA (slot 1) and ISA bus (slot 2) defined above */
-   gen_pci_device(3, 0x0003)
-   gen_pci_device(4, 0x0004)
-   gen_pci_device(5, 0x0005)
-   gen_pci_device(6, 0x0006)
-   gen_pci_device(7, 0x0007)
-   gen_pci_device(8, 0x0008)
-   gen_pci_device(9, 0x0009)
-   gen_pci_device(10, 0x000a)
-   gen_pci_device(11, 0x000b)
-   gen_pci_device(12, 0x000c)
-   gen_pci_device(13, 0x000d)
-   gen_pci_device(14, 0x000e)
-   gen_pci_device(15, 0x000f)
-   gen_pci_device(16, 0x0010)
-   gen_pci_device(17, 0x0011)
-   gen_pci_device(18, 0x0012)
-   gen_pci_device(19, 0x0013)
-   gen_pci_device(20, 0x0014)
-   gen_pci_device(21, 0x0015)
-   gen_pci_device(22, 0x0016)
-   gen_pci_device(23, 0x0017)
-   gen_pci_device(24, 0x0018)
-   gen_pci_device(25, 0x0019)
-   gen_pci_device(26, 0x001a)
-   gen_pci_device(27, 0x001b)
-   gen_pci_device(28, 0x001c)
-   gen_pci_device(29, 0x001d)
-   gen_pci_device(30, 0x001e)
-   gen_pci_device(31, 0x001f)
 }
 
 /* PCI IRQs */
-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] acpi: generate mixed asl/aml listing

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 03:44:21PM +0300, Michael S. Tsirkin wrote:
 Use iasl -l flag to produce a mixed listing, where a
 source line is followed by matching AML.
 Since we use a preprocessed input, this generates long lines with
 multiple commands per line. To make it possible to match
 AML to source exactly, split the input that we supply iasl,
 with each command on a separate line.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

I thought I'd add an example listing here:
 467 Device (S1) {

080B5B 82 25 53 31 5F 5F ...[.%S1__

 468 Name (_ADR, 0x0001)

081208 5F 41 44 52 0C 00 00 ._ADR...
081A01 00 ....

 469 Method (EJ0_,1) {

081C14 0F 45 4A 30 5F 01 .....EJ0_.

 470 Store(ShiftLeft(1, 0x0001), B0EJ)

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-21 Thread Kevin O'Connor
On Wed, Sep 21, 2011 at 02:09:08PM +0300, Michael S. Tsirkin wrote:
 On Wed, Sep 21, 2011 at 01:39:22AM -0400, Amos Kong wrote:
  - Original Message -
   How about moving code into functions so that it isn't duplicated for
   each PCI device.  See the patch below as an example (100% untested).
 
 Hmm, I sent patches that did a similar thing but
 in a slightly more compact way.
 Message ids:
 20110919092932.gb4...@redhat.com
 20110919093644.gc4...@redhat.com
 20110919100434.ga6...@redhat.com
 
 Did they not reach you or something's wrong with them?

I received them, but when I saw Amos' v2 patch I thought he included
them.

   +/* Bulk generated PCI hotplug devices */
   +#define hotplug_func(nr, fn)\
   +Device (S##nr##fn) {\
   +Name (_ADR, 0x##nr##000##fn)\
   +Method (_EJ0, 1) { Return(PCEJ(0x##nr)) }   \
   +Name (_SUN, 0x##nr) \
   +}
 
 The fundamental question is still why would
 we have _EJ0 methods in functions 0 when they are
 not individually hotpluggable.
 I think only function 0 should have _EJ0.

I don't know the answer to this question.

Maybe we should just collapse the current definitions and then put the
fixes and enhancements on top of the collapsed version.

-Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 08:47:39AM -0400, Kevin O'Connor wrote:
 On Wed, Sep 21, 2011 at 02:09:08PM +0300, Michael S. Tsirkin wrote:
  On Wed, Sep 21, 2011 at 01:39:22AM -0400, Amos Kong wrote:
   - Original Message -
How about moving code into functions so that it isn't duplicated for
each PCI device.  See the patch below as an example (100% untested).
  
  Hmm, I sent patches that did a similar thing but
  in a slightly more compact way.
  Message ids:
  20110919092932.gb4...@redhat.com
  20110919093644.gc4...@redhat.com
  20110919100434.ga6...@redhat.com
  
  Did they not reach you or something's wrong with them?
 
 I received them, but when I saw Amos' v2 patch I thought he included
 them.
 
+/* Bulk generated PCI hotplug devices */
+#define hotplug_func(nr, fn)\
+Device (S##nr##fn) {\
+Name (_ADR, 0x##nr##000##fn)\
+Method (_EJ0, 1) { Return(PCEJ(0x##nr)) }   \
+Name (_SUN, 0x##nr) \
+}
  
  The fundamental question is still why would
  we have _EJ0 methods in functions 0 when they are
  not individually hotpluggable.
  I think only function 0 should have _EJ0.
 
 I don't know the answer to this question.
 
 Maybe we should just collapse the current definitions and then put the
 fixes and enhancements on top of the collapsed version.
 
 -Kevin

OK, I'll refactor my patches.
I just sent a patchset which in particular removes
the gen_pci_device macro, probably best to work on top of that?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Dave Hansen
On Tue, 2011-09-20 at 16:55 -0300, Marcelo Tosatti wrote:
  and the wall clock stays behind my host wall clock by the amount of
  time it took to resume.
 
 This is expected, similar to savevm/loadvm. 

That seems like pretty undesirable behavior to me.  It's too bad that it
does that with savevm/loadvm, but is it really behavior that we want to
spread?

-- Dave

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Gleb Natapov
On Wed, Sep 21, 2011 at 03:44:29PM +0300, Michael S. Tsirkin wrote:
 script ./src/find_ej0.pl finds all instances of
 method named EJ0_ and the matching _ADR information,
 and outputs the AML offset and slot mask of each.
 
There is tools/ directory for such kind of scripts. Most (if not all) of
scripts there are in python though. Perl should die painful death.

This approach delivers nice result, but since the script does not really
decodes AML, but tries to match ASL source code with regular expressions,
it introduces some assumptions to the code that make DSDT code less
hackable. I'll hate to be the one who will have to change PCI device
definitions in DSDT next time.

Generally speaking finding an offset of some scope in AML is useful not
only for PCI hotplug. For instance we want to make S3/S4 capability
configurable by a command line switch, but this also requires DSDT
patching and having automatic way to find _S3_/_S4_ offset is required
for that too (we do not what to find it by hand each time DSDT is
recompiled).

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  src/find_ej0.pl |  136 
 +++
  1 files changed, 136 insertions(+), 0 deletions(-)
  create mode 100755 src/find_ej0.pl
 
 diff --git a/src/find_ej0.pl b/src/find_ej0.pl
 new file mode 100755
 index 000..37e8a8c
 --- /dev/null
 +++ b/src/find_ej0.pl
 @@ -0,0 +1,136 @@
 +#!/usr/bin/perl
 +
 +# Process mixed ASL/AML listing (.lst file) produced by iasl -l
 +# Locate all occurences of Name _ADR followed by Method EJ0_
 +# Output slot info from _ADR and offset of method name in AML
 +
 +use strict;
 +
 +my @aml = ();
 +my @asl = ();
 +my @asl_lineno = ();
 +my @asl_to_aml_offset = ();
 +my @output = ();
 +
 +#Store an ASL command, matching AML offset, and input line (for debugging)
 +sub add_asl {
 +my $srcline = shift(@_);
 +my $line = shift(@_);
 +push @asl, $line;
 +push @asl_lineno, $.;
 +push @asl_to_aml_offset, $#aml + 1;
 +}
 +
 +#Store an AML byte sequence
 +#Verify that offset output by iasl matches # of bytes so far
 +sub add_aml {
 +my $offset = shift(@_);
 +my $line = shift(@_);
 +my $o = hex($offset);
 +# Sanity check: offset must match
 +die Offset $o !=  .($#aml + 1) if ($o != $#aml + 1);
 +# Strip any traling dots and ASCII dump after 
 +$line =~ s/\s*\.*\s*.*//;
 +
 +my @code = split(' ', $line);
 +foreach my $c (@code) {
 +if (not($c =~ m/^[0-9A-Fa-f][0-9A-Fa-f]$/)) {
 +die Unexpected octet $c;
 +}
 +push @aml, hex($c);
 +}
 +}
 +
 +# Process aml bytecode array, decoding AML
 +# Given method offset, find its name offset
 +sub aml_method_name {
 +my $lineno = shift(@_);
 +my $offset = shift(@_);
 +#0x14 MethodOp PkgLength NameString MethodFlags TermList
 +if ($aml[$offset] != 0x14) {
 +die Method after input line $lineno offset $offset:  .
 + expected 0x14 actual . sprintf(0x%x, $aml[$offset]);
 +}
 +$offset += 1;
 +# PkgLength can be multibyte. Bits 8-7 give the # of extra bytes.
 +my $pkglenbytes = $aml[$offset]  6;
 +$offset += 1 + $pkglenbytes;
 +return $offset;
 +}
 +
 +while () {
 +#Strip trailing newline
 +chomp;
 + #ASL listing: space, then line#, then , then code
 + if (s#^\s+([0-9]+)\.\.\.\.\s*##) {
 +add_asl($1, $_);
 + }
 +# AML listing: offset in hex, then , then code
 + if (s#^([0-9A-F]+)\.\.\.\.\s*##) {
 +add_aml($1, $_);
 +}
 + # Ignore everything else
 +}
 +
 +# Now go over code, look for EJ0_ methods
 +# For each such method, output slot mask from the
 +# preceding _ADR line, as well as the method name offset.
 +for (my $i = 0; $i = $#asl; $i++) {
 +my $l = $asl[$i];
 +# match: Method (EJ0_,1)
 +if (not $l =~ m#^Method\s*\(\s*EJ0_\s*[,)]#) {
 +# Make sure we do not miss any EJ0_:
 +# die if EJ0_ is found anywhere else in source code
 +if ($l =~ m#EJ0_#) {
 +die Stray EJ0_ detected at input line $asl_lineno[$i]: 
 $asl[$i];
 +}
 +next;
 +}
 +# EJ0_ found. Previous line must be _ADR
 +my $p = $i  0 ? $asl[$i - 1] : ;
 +# match: Name (_ADR, 0xaddress)
 +if (not ($p =~ m#Name\s*\(\s*_ADR\s*,\s*0x([0-9A-Fa-f]+)\s*\)#)) {
 +die _ADR not found before EJ0_ .
 +at input line $asl_lineno[$i]: $asl[$i];
 +}
 +my $adr = hex($1);
 +my $slot = $adr  16;
 +if ($slot  31) {
 +die _ADR device out of range: actual $slot  .
 +expected 0 to 31 at input line $asl_lineno[$i]: $asl[$i];
 +}
 +
 +# We have offset of EJ0_ method in code
 +# Now find EJ0_ itself
 +my $offset = $asl_to_aml_offset[$i];
 +my $ej0 = aml_method_name($asl_lineno[$i], $offset);
 +# Verify AML: name must be EJ0_:
 +if (($aml[$ej0 + 0] != ord('E')) or
 +($aml[$ej0 + 1] != ord('J')) or
 +  

Re: percpu crash on NetBurst

2011-09-21 Thread Avi Kivity

On 08/08/2011 12:55 PM, Tejun Heo wrote:

Hello, Avi.

On Sun, Aug 07, 2011 at 06:32:35PM +0300, Avi Kivity wrote:
  qemu, under some conditions (-cpu host or -cpu kvm64), erroneously
  passes family=15 as the virtual cpuid.  This causes a BUG() in
  percpu code during late boot:

  [ cut here ]
  kernel BUG at mm/percpu.c:577!



snip


  All this applies to v3.0; current upstream (c2f340a69ca) fails even
  worse, haven't yet determined exactly why.

  I'm surprised this hasn't been reported before; Ingo, don't you have
  family=15 hosts in your test farm?

Hmmm... I can't trigger the problem w/ kvm64 (I tried mounting and
unmounting filesystems but it worked okay) and am quite skeptical this
is a wide spread problem given that the percpu core code is used very
widely and hasn't seen a lot of changes lately.  Is there anything
specific you need to do to trigger the condition?  Can you try to
print out the s_files addresses being allocated and freed?



Coming back to this, the trigger if cpuid family=6 and model=13 (model 
12 works).  Looks like the code disables rep_good is some MSR doesn't 
have the expected value.  While we should configure the MSR correctly, 
it looks like the fallback code for !rep_good is broken.  Will look further.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 05:27:32PM +0300, Gleb Natapov wrote:
 On Wed, Sep 21, 2011 at 03:44:29PM +0300, Michael S. Tsirkin wrote:
  script ./src/find_ej0.pl finds all instances of
  method named EJ0_ and the matching _ADR information,
  and outputs the AML offset and slot mask of each.
  
 There is tools/ directory for such kind of scripts. Most (if not all) of
 scripts there are in python though.

OK, rewriting that in python should be easy.
I'll wait a bit for more comments on the design though.

 Perl should die painful death.
 
 This approach delivers nice result, but since the script does not really
 decodes AML, but tries to match ASL source code with regular expressions,
 it introduces some assumptions to the code that make DSDT code less
 hackable. I'll hate to be the one who will have to change PCI device
 definitions in DSDT next time.

There are three requirements now:
1. don't use the name EJ0_ anywhere if you don't want it patches
2. _ADR must be an integer constant
3. put _ADR name immediately before EJ0_ method
I tried to make it easy to obey these rules
by adding comments in source code.

I don't believe a generic mechanism that does not place
any restrictions on language use is possible without adding an
AML interpreter in bios.

 Generally speaking finding an offset of some scope in AML is useful not
 only for PCI hotplug. For instance we want to make S3/S4 capability
 configurable by a command line switch, but this also requires DSDT
 patching and having automatic way to find _S3_/_S4_ offset is required
 for that too (we do not what to find it by hand each time DSDT is
 recompiled).

Right. So that would be an easy extension.
We could also add some directives for the tool
(e.g. in C comments) so that you can e.g. find more names
just by adding such a directive in dsl.

I'll be happy to work on that preferably
after we merge a simple version of the tool first.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: percpu crash on NetBurst

2011-09-21 Thread Avi Kivity

On 09/21/2011 05:28 PM, Avi Kivity wrote:


Coming back to this, the trigger if cpuid family=6 and model=13 
(model 12 works).  Looks like the code disables rep_good is some MSR 
doesn't have the expected value.  While we should configure the MSR 
correctly, it looks like the fallback code for !rep_good is broken.  
Will look further.




Ok, without rep_good, memcpy() sometimes copies backwards.  A stale copy 
of memmove_64.c I had around got picked up instead of memmove_64.S; that 
memmove() used memcpy() in the copy-forward case.


So, nothing to see, sorry about the noise.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH] virtio-console: wait for console ports

2011-09-21 Thread Christian Borntraeger
Amit,

can you have a look at the patch below and give feedback or apply
if appropriate?

---
On s390 I have seen some random Warning: unable to open an initial
console boot failure. Turns out that tty_open fails, because the
hvc_alloc was not yet done. In former times this could not happen,
since the probe function automatically called hvc_alloc. With newer
versions (multiport) some host-guest interaction is required
before hvc_alloc is called. This might be too late, especially if
an initramfs is involved. Lets use a completion if we have
multiport and an early console.



Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

---
 drivers/char/virtio_console.c |   18 ++
 1 file changed, 18 insertions(+)

Index: b/drivers/char/virtio_console.c
===
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -19,6 +19,7 @@
  */
 #include linux/cdev.h
 #include linux/debugfs.h
+#include linux/completion.h
 #include linux/device.h
 #include linux/err.h
 #include linux/fs.h
@@ -73,6 +74,7 @@ struct ports_driver_data {
 static struct ports_driver_data pdrvdata;
 
 DEFINE_SPINLOCK(pdrvdata_lock);
+DECLARE_COMPLETION(port_added);
 
 /* This struct holds information that's relevant only for console ports */
 struct console {
@@ -1352,6 +1354,7 @@ static void handle_control_message(struc
break;
 
init_port_console(port);
+   complete(port_added);
/*
 * Could remove the port here in case init fails - but
 * have to notify the host first.
@@ -1648,6 +1651,10 @@ static int __devinit virtcons_probe(stru
struct ports_device *portdev;
int err;
bool multiport;
+   bool early = early_put_chars != 0;
+
+   /* Ensure to read early_put_chars now */
+   barrier();
 
portdev = kmalloc(sizeof(*portdev), GFP_KERNEL);
if (!portdev) {
@@ -1719,6 +1726,17 @@ static int __devinit virtcons_probe(stru
 
__send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
   VIRTIO_CONSOLE_DEVICE_READY, 1);
+
+   /* If there was an early virtio console, assume that there are no
+* other consoles. We need to wait until the hvc_alloc matches the
+* hvc_instantiate, otherwise tty_open will complain, resulting in
+* a Warning: unable to open an initial console boot failure.
+* Without multiport this is done in add_port above. With multiport
+* this might take some host-guest communication - thus we have to
+* wait. */
+   if (multiport  early)
+   wait_for_completion(port_added);
+
return 0;
 
 free_vqs:
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pci passthrough fail on fedora 15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)

2011-09-21 Thread Erik Flister
i posted this to virt-tools but got no replies, hoping you guys can help me.  :)


- Forwarded Message -
From: Erik Flister e_flis...@yahoo.com
To: erik flister erik.flis...@gmail.com; virt-tools-l...@redhat.com 
virt-tools-l...@redhat.com
Sent: Tuesday, September 20, 2011 9:36 AM
Subject: Re: [virt-tools-list] pci passthrough fail on fedora 
15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)


i found this discussion of the same error:

http://www.linux-kvm.com/ content/pci-passthrough-error
 I've solved my problem by deleting the '.save' file in
 /var/lib/libvirt/qemu/save

but i don't have that file:

 sudo ls -al
/var/lib/libvirt/qemu/save
total 8
drwxr-xr-x 2 qemu qemu 4096 Sep 16 03:22 .
drwxr-x--- 5 qemu qemu 4096 Sep 19 10:13 ..


-erik




From: erik flister erik.flis...@gmail.com
To: virt-tools-l...@redhat.com
Sent: Monday, September 19, 2011 9:19 PM
Subject: [virt-tools-list] pci passthrough fail on fedora 
15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)

AMD
phenom II X6 1075T proc
ASUS M4A87TD mobo
BIOS ver 2001 (built 3/8/11)

advanced/cpu config/secure virtual machine mode enabled in BIOS

not sure if this turns on iommu, how do i verify?

/proc/cpuinfo has hits for svm but not iommu or vmx.

 dmesg | grep -i iommu
[    0.00] Please enable the IOMMU option in the BIOS setup
[    1.515596] PCI-DMA: using GART IOMMU.
[    1.515599] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture

note my pci cards have windows drivers, but not linux drivers as far as i 
know.

i have latest yummed kvm/qemu/virt-manager.

i am using this virtualization method:
http://www.techotopia.com/index.php/Running_Windows_on_Fedora_Using_KVM_Virtualization

i didn't know to disconnect the pci
cards from the
host before adding
them to the guest until finding:
http://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/sect-Virtualization-PCding_a_PCI_device_to_a_host.html

 lspci

02:06.0 Unassigned class [ff00]: National Instruments PCI-6110
02:07.0 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
02:07.1 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
02:07.2 Parallel controller: NetMos Technology PCI 9865 Multi-I/O Controller

 lspci -n

02:06.0 ff00: 1093:14e0
02:07.0 0700: 9710:9865
02:07.1 0700: 9710:9865
02:07.2 0701: 9710:9865

 virsh nodedev-list | grep pci

pci__02_06_0
pci__02_07_0
pci__02_07_1
pci__02_07_2

 sudo virsh
nodedev-dettach
pci__02_06_0
Device pci__02_06_0 dettached


but then, after adding it to the guest and trying to boot it, i get:

Error starting domain: this function is not supported by the
connection driver: Unable to reset PCI device :02:06.0: this
function is not supported by the connection driver: Active
:02:07.0 devices on bus with :02:06.0, not doing bus reset

Traceback (most recent call last):
  File /usr/share/virt-manager/virtManager/asyncjob.py, line 45, in 
cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File /usr/share/virt-manager/virtManager/engine.py, line 959, in 
asyncfunc
    vm.startup()
  File /usr/share/virt-manager/virtManager/domain.py, line 1128, in startup
    self._backend.create()
  File /usr/lib64/python2.7/site-packages/libvirt.py, line 330, in create
    if ret == -1: raise libvirtError
('virDomainCreate() failed', dom=self)



if i disable the other devices on bus 02 and add them to the guest:
 sudo virsh nodedev-dettach pci__02_07_0
Device pci__02_07_0 dettached

 sudo virsh nodedev-dettach pci__02_07_1
Device pci__02_07_1 dettached

 sudo virsh nodedev-dettach pci__02_07_2
Device pci__02_07_2 dettached

i then get this error on booting the guest (but with the same stack
trace as above):

Error starting domain: cannot send monitor command
'{execute:qmp_capabilities}': Connection reset by peer

and then networking fails on the host until i reboot, and fedora won't
even shutdown all the way (i have to do a hard reboot by holding down
the power button).

any idea how i can fix this?  why isn't detaching the device
automatically done when it is added to the guest?  even if i get this
working, it seems i will
have to detach stuff every time i reboot the
host?  finally, in this case, i actually do want to give everything on
bus 02 to the guest, but what if i didn't?

thanks for your help!
-erik


see also iommu/bios stuff here:
http://docs.fedoraproject.org/en-US/Fedora/13/html-single/Virtualization_Guide/index.html

___
virt-tools-list mailing list
virt-tools-l...@redhat.com
https://www.redhat.com/mailman/listinfo/virt-tools-list

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-21 Thread Marcelo Tosatti
On Wed, Sep 21, 2011 at 11:46:03AM +0300, Avi Kivity wrote:
 On 09/20/2011 08:28 PM, Avi Kivity wrote:
 On 09/20/2011 07:30 PM, Marcelo Tosatti wrote:
  
 We do have a small issue.  If we exit during
 NMI-blocked-by-STI and
 nmi_pending == 2, then we lose the second interrupt.
 Should rarely
 happen, since external interrupts never exit in that
 condition, but
 it's a wart.

Actually exits in the window between 

- increase of nmi_queued 
and 
- transfer to nmi_pending/nmi_injected

Lose all nmi_queued values, no?

  
  And the above system reset case, you should be able to handle it by
  saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).
 
   We could just add a KVM_CAP (and flag) that extends nmi_pending from
   a bool to a counter.
 
 Or just add a new field to the pad.
 
 
 Okay; I'll address this in a follow-on patch (my preference is
 making nmi_pending a counter).
 
 
 Yet another way to do this is to redefine .injected (just in the
 API) to mean: inject immediately, unless blocked by interrupt
 shadow; in this case inject in the next instruction.  No KVM_CAP or
 anything.
 
 The drawback is that if we hit the corner case of two NMIs queued
 and held back by interrupt shadow, an older kvm live migration
 target will not run the guest (exit with invalid state).  The
 advantage is that no user space or API changes are necessary.
 
 Given that to get into this corner you need an NMI intensive load
 AND a sti; blah pair that spans two pages AND have the second page
 unavailable when those NMIs hit, I think it's better to avoid the
 API change.  Opinions?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Marcelo Tosatti
On Wed, Sep 21, 2011 at 07:11:08AM -0700, Dave Hansen wrote:
 On Tue, 2011-09-20 at 16:55 -0300, Marcelo Tosatti wrote:
   and the wall clock stays behind my host wall clock by the amount of
   time it took to resume.
  
  This is expected, similar to savevm/loadvm. 
 
 That seems like pretty undesirable behavior to me.  It's too bad that it
 does that with savevm/loadvm, but is it really behavior that we want to
 spread?

Not really. savevm/loadvm has more widespread use, so a solution could
apply equally to both cases.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 09:03:51PM +0100, Jamie Lokier wrote:
 Marcelo Tosatti wrote:
  In case the VM stops for whatever reason, the host system is not
  supposed to adjust time related hardware state to compensate, in an
  attempt to present apparent continuous time.
  
  If you save a VM and then restore it later, it is the guest
  responsability to adjust its time representation.
 
 If the guest doesn't know it's been stopped, then its time
 representation will be wrong until it finds out, e.g. after a few
 minutes with NTP, or even a seconds can be too long.
 
 That is sad when it happens because it breaks the coherence of any
 timed-lease caching the guest is involved in.  I.e. where the guest
 acquires a lock on some data object (like a file in NFS) that it can
 efficiently access without network round trips (similar to MESI), with
 all nodes having agreed that it's coherent for, say, 5 seconds before
 renewing or breaking.  (It's just a way to reduce latency.)
 
 But we can't trust CLOCK_MONOTONIC when a VM is involved, it's just
 one of those facts of life.  So instead the effort is to try and
 detect when a VM is involved and then distrust the clock.
 
 (Non-VM) suspend/resume is similar, but there's usually a way to
 be notified about that as it happens.
 
 -- Jamie

Thats pretty bad.

Time leased caching cannot be enabled in conjunction with save/restore
without VM involvement (the hypervisor management application must
notify the guest so it can drop such cache/writeback pending data).

But, if paravirtual clock is available (Linux VMs), it should be
possible to add an offset similarly to how suspend/resume is performed,
via a high priority channel, before resuming operation. We should look
into that possibility.

Thanks

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Michael S. Tsirkin
I've rewritten the script in python. Seems to work but
I didn't have time to test - only compiled for now -
and needs to move to tools - but I hope this makes
review easier.

Thanks,

---
 src/find_ej0.py |  140 +++
 1 files changed, 140 insertions(+), 0 deletions(-)
 create mode 100755 src/find_ej0.py

diff --git a/src/find_ej0.py b/src/find_ej0.py
new file mode 100755
index 000..75e5491
--- /dev/null
+++ b/src/find_ej0.py
@@ -0,0 +1,140 @@
+#!/usr/bin/python
+
+# Process mixed ASL/AML listing (.lst file) produced by iasl -l
+# Locate all occurences of Name _ADR followed by Method EJ0_
+# Output slot info from _ADR and offset of method name in AML
+
+import re;
+import sys;
+import fileinput;
+
+aml = []
+asl = []
+output = []
+lineno = 0
+debug = 
+
+class asl_line:
+line = None
+lineno = None
+aml_offset = None
+
+def die(diag):
+sys.stderr.write(Error: %s; %s\n % (diag, debug))
+sys.exit(1)
+
+#Store an ASL command, matching AML offset, and input line (for debugging)
+def add_asl(line):
+l = asl_line()
+l.line = line
+l.lineno = lineno
+l.aml_offset = len(aml)
+asl.append(l)
+
+#Store an AML byte sequence
+#Verify that offset output by iasl matches # of bytes so far
+def add_aml(offset, line):
+o = int(offset, 16);
+# Sanity check: offset must match size of code so far
+if (o != len(aml)):
+die(Offset 0x%x != 0x%x % (o, len(aml)))
+# Strip any trailing dots and ASCII dump after 
+line = re.sub(r'\s*\.*\s*.*$',, line)
+# Strip traling whitespace
+line = re.sub(r'\s+$',, line)
+# Strip leading whitespace
+line = re.sub(r'^\s+',, line)
+# Split on whitespace
+code = re.split(r'\s+', line)
+for c in code:
+# Require a legal hex number, two digits
+if (not(re.search(r'^[0-9A-Fa-f][0-9A-Fa-f]$', c))):
+die(Unexpected octet %s % c);
+aml.append(int(c, 16));
+
+# Process aml bytecode array, decoding AML
+# Given method offset, find its name offset
+def aml_method_name(lineno, offset):
+#0x14 MethodOp PkgLength NameString MethodFlags TermList
+if (aml[offset] != 0x14):
+die( Method after input line $lineno offset $offset: 
+ expected 0x14 actual 0x%x % aml[offset]);
+offset += 1;
+# PkgLength can be multibyte. Bits 8-7 give the # of extra bytes.
+pkglenbytes = aml[offset]  6;
+offset += 1 + pkglenbytes;
+return offset;
+
+for line in fileinput.input():
+# Strip trailing newline
+line.rstrip();
+# line number and debug string to output in case of errors
+lineno = lineno + 1
+debug = input line %d: %s % (lineno, line)
+#ASL listing: space, then line#, then , then code
+pasl = re.compile('^\s+([0-9]+)\.\.\.\.\s*')
+m = pasl.search(line)
+if (m):
+add_asl(pasl.sub(, line));
+# AML listing: offset in hex, then , then code
+paml = re.compile('^([0-9A-Fa-f]+)\.\.\.\.\s*')
+m = paml.search(line)
+if (m):
+add_aml(m.group(1), paml.sub(, line))
+
+# Now go over code, look for EJ0_ methods
+# For each such method, output slot mask from the
+# preceding _ADR line, as well as the method name offset.
+
+for i in range(len(asl)):
+l = asl[i].line
+debug = input line %d: %s % (asl[i].lineno, asl[i].line)
+# match: Method (EJ0_,1)
+if (not(re.search(r'^Method\s*\(\s*EJ0_\s*[,)]', l))):
+# Make sure we do not miss any EJ0_:
+# die if EJ0_ is found anywhere else in source code
+if (re.search(r'EJ0_', l)):
+die(Stray EJ0_ detected);
+continue
+# EJ0_ found. Previous line must be _ADR
+p = asl[i - 1].line
+# match: Name (_ADR, 0xaddress)
+padr = re.compile('Name\s*\(\s*_ADR\s*,\s*0x([0-9A-Fa-f]+)\s*\)')
+m = padr.search(p);
+if (not m):
+die(_ADR not found before EJ0_ )
+
+adr = int(m.group(1), 16)
+slot = adr  16;
+if (slot  31):
+die(_ADR device out of range: actual %d expected 0 to 31 % slot)
+
+# We have offset of EJ0_ method in code
+# Now find EJ0_ itself
+ej0 = aml_method_name(asl[i].lineno, asl[i].aml_offset)
+# Verify AML: name must be EJ0_:
+if ((aml[ej0 + 0] != ord('E')) or
+(aml[ej0 + 1] != ord('J')) or
+(aml[ej0 + 2] != ord('0')) or
+(aml[ej0 + 3] != ord('_'))):
+die(AML offset 0x%x does not match EJ0_ % ej0)
+
+# OK we are done. Output slot mask and offset
+output.append({.slot_mask = 0x%x, .offset = 0x%x} %
+  (0x1  slot, ej0))
+
+debug = at end of file
+
+# Pretty print output
+if (not len(output)):
+die(No EJ0_ Method found!)
+
+print '''
+static struct aml_ej0_data {
+unsigned slot_mask;
+unsigned offset;
+} aml_ej0_data[] = {'''
+
+print ,\n.join(output)
+print '};\n'
+
-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to 

Help required in identifying a small task to begin contributing

2011-09-21 Thread Shashank Rachamalla
Hi

I am interested in kernel development and more inclined towards kvm.
However, I need some initial direction which can help me contribute.
Please let me know if I can take up some small task to begin with and
gain some acquaintance. Also, I have gone through the to do list @
http://www.linux-kvm.org/page/TODO but not sure of the current status of
listed items. Please throw some light. 

Regards,
Shashank

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pci passthrough fail on fedora 15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)

2011-09-21 Thread Alex Williamson
On Wed, 2011-09-21 at 10:23 -0700, Erik Flister wrote:
 i posted this to virt-tools but got no replies, hoping you guys can help me.  
 :)
 

 AMD
 phenom II X6 1075T proc
 ASUS M4A87TD mobo

Google says this is an AMD 870 chipset, so you do not have AMD IOMMU
(AMD-Vi) support :(

  lspci
 
 02:06.0 Unassigned class [ff00]: National Instruments PCI-6110
 02:07.0 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
 02:07.1 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
 02:07.2 Parallel controller: NetMos Technology PCI 9865 Multi-I/O 
 Controller
 
  lspci -n
 
 02:06.0 ff00: 1093:14e0
 02:07.0 0700: 9710:9865
 02:07.1 0700: 9710:9865
 02:07.2 0701: 9710:9865
 
  virsh nodedev-list | grep pci
 
 pci__02_06_0
 pci__02_07_0
 pci__02_07_1
 pci__02_07_2
 
  sudo virsh
 nodedev-dettach
 pci__02_06_0
 Device pci__02_06_0 dettached
 
 
 but then, after adding it to the guest and trying to boot it, i get:
 
 Error starting domain: this function is not supported by the
 connection driver: Unable to reset PCI device :02:06.0: this
 function is not supported by the connection driver: Active
 :02:07.0 devices on bus with :02:06.0, not doing bus reset

This is probably telling you that the device you want to assign doesn't
support a reset mechanism (FLR or D3hot), so it wants to do a secondary
bus reset, but there are other devices on the same bus.

 if i disable the other devices on bus 02 and add them to the guest:
  sudo virsh nodedev-dettach pci__02_07_0
 Device pci__02_07_0 dettached
 
  sudo virsh nodedev-dettach pci__02_07_1
 Device pci__02_07_1 dettached
 
  sudo virsh nodedev-dettach pci__02_07_2
 Device pci__02_07_2 dettached
 
 i then get this error on booting the guest (but with the same stack
 trace as above):
 
 Error starting domain: cannot send monitor command
 '{execute:qmp_capabilities}': Connection reset by peer
 
 and then networking fails on the host until i reboot, and fedora won't
 even shutdown all the way (i have to do a hard reboot by holding down
 the power button).
 
 any idea how i can fix this?  why isn't detaching the device
 automatically done when it is added to the guest?  even if i get this
 working, it seems i will
 have to detach stuff every time i reboot the
 host?  finally, in this case, i actually do want to give everything on
 bus 02 to the guest, but what if i didn't?

Getting it working seems like it's going to require new hardware since
you don't have a device assignment capable IOMMU.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Connection does not support host device enumeration

2011-09-21 Thread Kaushal Shriyan
Hi,

I have  20:04.0 Network controller: Sangoma Technologies Corp. A104d
QUAD T1/E1 AFT caHi,

I have  20:04.0 Network controller: Sangoma Technologies Corp. A104d
QUAD T1/E1 AFT card on Host OS, Its not visible on guest OS using
linux KVM application.

I did open the window for guest from virt-manager on my Ubuntu Linux
Desktop 11.04, shut down the guest, then select the Details view
from the menu on that window, and click Add Hardware at the bottom,
select PCI Host Device in the selections on the left, and find your
device in the list of host devices on the right. Then click Finish.
Finally, start your guest up again, and the device should appear.

I get Connection  does not support host device enumeration  Any clue ?

libvirt-0.8.2-22.el5 version running on CentOS Linux Server version 5.6

Please let me know if you need any additional information

Regards,

Kaushalrd on Host OS, Its not visible on guest OS using
linux KVM application.

I did open the window for guest from virt-manager on my Ubuntu Linux
Desktop 11.04, shut down the guest, then select the Details view
from the menu on that window, and click Add Hardware at the bottom,
select PCI Host Device in the selections on the left, and find your
device in the list of host devices on the right. Then click Finish.
Finally, start your guest up again, and the device should appear.

I get Connection  does not support host device enumeration  Any clue ?

libvirt-0.8.2-22.el5 version running on CentOS Linux Server version 5.6

Please let me know if you need any additional information

Regards,

Kaushal
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] pci-assign: Fix MSI-X support

2011-09-21 Thread Alex Williamson
Assigned device MSI-X support hasn't been working, this fixes
it.  I believe this should also fix:

https://bugs.launchpad.net/qemu/+bug/830558

Thanks,

Alex

---

Alex Williamson (2):
  pci-assign: Fix MSI-X registration
  pci-assign: Re-order initfn for memory API


 hw/device-assignment.c |   29 +
 1 files changed, 17 insertions(+), 12 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] pci-assign: Re-order initfn for memory API

2011-09-21 Thread Alex Williamson
We now need to scan PCI capabilities and setup an MSI-X page
before we walk the device resources since the overlay is now
setup during init instead of at the first mapping by the guest.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 288f80c..93913b3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1603,6 +1603,14 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 goto out;
 }
 
+if (assigned_device_pci_cap_init(pci_dev)  0)
+goto out;
+
+/* intercept MSI-X entry page in the MMIO */
+if (dev-cap.available  ASSIGNED_DEVICE_CAP_MSIX)
+if (assigned_dev_register_msix_mmio(dev))
+goto out;
+
 /* handle real device's MMIO/PIO BARs */
 if (assigned_dev_register_regions(dev-real_device.regions,
   dev-real_device.region_number,
@@ -1618,9 +1626,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 dev-h_busnr = dev-host.bus;
 dev-h_devfn = PCI_DEVFN(dev-host.dev, dev-host.func);
 
-if (assigned_device_pci_cap_init(pci_dev)  0)
-goto out;
-
 /* assign device to guest */
 r = assign_device(dev);
 if (r  0)
@@ -1631,11 +1636,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 if (r  0)
 goto assigned_out;
 
-/* intercept MSI-X entry page in the MMIO */
-if (dev-cap.available  ASSIGNED_DEVICE_CAP_MSIX)
-if (assigned_dev_register_msix_mmio(dev))
-goto assigned_out;
-
 assigned_dev_load_option_rom(dev);
 QLIST_INSERT_HEAD(devs, dev, next);
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] pci-assign: Fix MSI-X registration

2011-09-21 Thread Alex Williamson
Commit c4525754 added a capability check for KVM_CAP_DEVICE_MSIX,
which is unfortunately not exposed, resulting in MSIX never
being listed as a capability.  This breaks anything depending on
MSIX, such as igbvf.  Since we can't specifically check for MSIX
support and KVM_CAP_ASSIGN_DEV_IRQ indicates more than just MSI,
let's just revert c4525754 and replace it with a sanity check that
we need KVM_CAP_ASSIGN_DEV_IRQ if the device supports any kind of
interrupt (which is still mostly paranoia).

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 93913b3..b5bde68 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1189,8 +1189,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 
 /* Expose MSI capability
  * MSI capability is the 1st capability in capability config */
-pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0);
-if (pos != 0  kvm_check_extension(kvm_state, KVM_CAP_ASSIGN_DEV_IRQ)) {
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0))) {
 dev-cap.available |= ASSIGNED_DEVICE_CAP_MSI;
 /* Only 32-bit/no-mask currently supported */
 if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10))  0) {
@@ -1211,8 +1210,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 pci_set_word(pci_dev-wmask + pos + PCI_MSI_DATA_32, 0x);
 }
 /* Expose MSI-X capability */
-pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
-if (pos != 0  kvm_check_extension(kvm_state, KVM_CAP_DEVICE_MSIX)) {
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0))) {
 int bar_nr;
 uint32_t msix_table_entry;
 
@@ -1606,6 +1604,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 if (assigned_device_pci_cap_init(pci_dev)  0)
 goto out;
 
+if (!kvm_check_extension(kvm_state, KVM_CAP_ASSIGN_DEV_IRQ) 
+(dev-cap.available  ASSIGNED_DEVICE_CAP_MSIX ||
+ dev-cap.available  ASSIGNED_DEVICE_CAP_MSI ||
+ assigned_dev_pci_read_byte(pci_dev, PCI_INTERRUPT_PIN) != 0)) {
+goto out;
+}
+
 /* intercept MSI-X entry page in the MMIO */
 if (dev-cap.available  ASSIGNED_DEVICE_CAP_MSIX)
 if (assigned_dev_register_msix_mmio(dev))

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-21 Thread Kevin O'Connor
On Wed, Sep 21, 2011 at 03:44:13PM +0300, Michael S. Tsirkin wrote:
 The reason is that our acpi tables declare both _RMV with value 0,
 and _EJ0 method for these slots. What happens in this case
 is undocumented by ACPI spec, so linux ignores _RMV,
 and windows seems to ignore _EJ0.

Could the DSDT just not define _EJ0 for device 1  2 instead of
dynamically patching them?  (Would there ever be a case where we
wouldn't know at compile time which devices need _EJ0?)

 The correct way to suppress hotplug is not to have _EJ0,
 so this is what this patch does: it probes PIIX and
 modifies DSDT to match.

The code to generate basic SSDT code isn't that difficult (see
build_ssdt and src/ssdt-proc.dsl).  Is there a compelling reason to
patch the DSDT versus just generating the necessary blocks in an SSDT?

-Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Dave Hansen
On Tue, 2011-09-20 at 16:55 -0300, Marcelo Tosatti wrote:
  and the wall clock stays behind my host wall clock by the amount of
  time it took to resume.
 
 This is expected, similar to savevm/loadvm. 

That seems like pretty undesirable behavior to me.  It's too bad that it
does that with savevm/loadvm, but is it really behavior that we want to
spread?

-- Dave

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Marcelo Tosatti
On Wed, Sep 21, 2011 at 07:11:08AM -0700, Dave Hansen wrote:
 On Tue, 2011-09-20 at 16:55 -0300, Marcelo Tosatti wrote:
   and the wall clock stays behind my host wall clock by the amount of
   time it took to resume.
  
  This is expected, similar to savevm/loadvm. 
 
 That seems like pretty undesirable behavior to me.  It's too bad that it
 does that with savevm/loadvm, but is it really behavior that we want to
 spread?

Not really. savevm/loadvm has more widespread use, so a solution could
apply equally to both cases.

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html