RE: [PATCH] KVM: PPC: E500: Support hugetlbfs

2011-09-21 Thread Liu Yu-B13201
 

> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org 
> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
> Sent: Tuesday, September 20, 2011 7:36 AM
> To: kvm-...@vger.kernel.org
> Cc: kvm@vger.kernel.org
> Subject: [PATCH] KVM: PPC: E500: Support hugetlbfs
> 
> With hugetlbfs support emerging on e500, we should also support KVM
> backing its guest memory by it.
> 
> This patch adds support for hugetlbfs into the e500 shadow mmu code.
> 
> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/kvm/e500_tlb.c |   22 ++
>  1 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
> index ec17148..64f75eb 100644
> --- a/arch/powerpc/kvm/e500_tlb.c
> +++ b/arch/powerpc/kvm/e500_tlb.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -673,13 +674,34 @@ static inline void 
> kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>   pfn &= ~(tsize_pages - 1);
>   break;
>   }
> + } else if (vma && hva >= vma->vm_start &&
> +   (vma->vm_flags & VM_HUGETLB)) {

Why check (vma && hva >= vma->vm_start) twice?

Thanks,
Yu

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-21 Thread Michael S. Tsirkin
On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote:
> On Wed, Sep 21, 2011 at 03:44:13PM +0300, Michael S. Tsirkin wrote:
> > The reason is that our acpi tables declare both _RMV with value 0,
> > and _EJ0 method for these slots. What happens in this case
> > is undocumented by ACPI spec, so linux ignores _RMV,
> > and windows seems to ignore _EJ0.
> 
> Could the DSDT just not define _EJ0 for device 1 & 2 instead of
> dynamically patching them?  (Would there ever be a case where we
> wouldn't know at compile time which devices need _EJ0?)

Yes. in qemu we can make any slot non hotpluggable on command
line by requesting a non hotpluggable device be put there.

> > The correct way to suppress hotplug is not to have _EJ0,
> > so this is what this patch does: it probes PIIX and
> > modifies DSDT to match.
> 
> The code to generate basic SSDT code isn't that difficult (see
> build_ssdt and src/ssdt-proc.dsl).  Is there a compelling reason to
> patch the DSDT versus just generating the necessary blocks in an SSDT?
> 
> -Kevin

I don't really care whether the code is in DSDT or SSDT,
IMO there isn't much difference between build_ssdt and patching:
main reason is build_ssdt uses offsets hardcoded to a specific binary
(ssdt_proc and SD_OFFSET_* ) while I used
a script to extract offsets.

I think we should avoid relying on copy-pasted binary 
because I see the related ASL code changing in the near future
(with multifunction and bridge support among others).

I can generalize the approach though, so that
it can work for finding arbitrary names
without writing more scripts, hopefully with the
potential to address the hard-coded offsets in acpi.c
as well. Does that sound interesting?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-21 Thread Kevin O'Connor
On Wed, Sep 21, 2011 at 03:44:13PM +0300, Michael S. Tsirkin wrote:
> The reason is that our acpi tables declare both _RMV with value 0,
> and _EJ0 method for these slots. What happens in this case
> is undocumented by ACPI spec, so linux ignores _RMV,
> and windows seems to ignore _EJ0.

Could the DSDT just not define _EJ0 for device 1 & 2 instead of
dynamically patching them?  (Would there ever be a case where we
wouldn't know at compile time which devices need _EJ0?)

> The correct way to suppress hotplug is not to have _EJ0,
> so this is what this patch does: it probes PIIX and
> modifies DSDT to match.

The code to generate basic SSDT code isn't that difficult (see
build_ssdt and src/ssdt-proc.dsl).  Is there a compelling reason to
patch the DSDT versus just generating the necessary blocks in an SSDT?

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] pci-assign: Fix MSI-X registration

2011-09-21 Thread Alex Williamson
Commit c4525754 added a capability check for KVM_CAP_DEVICE_MSIX,
which is unfortunately not exposed, resulting in MSIX never
being listed as a capability.  This breaks anything depending on
MSIX, such as igbvf.  Since we can't specifically check for MSIX
support and KVM_CAP_ASSIGN_DEV_IRQ indicates more than just MSI,
let's just revert c4525754 and replace it with a sanity check that
we need KVM_CAP_ASSIGN_DEV_IRQ if the device supports any kind of
interrupt (which is still mostly paranoia).

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 93913b3..b5bde68 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1189,8 +1189,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 
 /* Expose MSI capability
  * MSI capability is the 1st capability in capability config */
-pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0);
-if (pos != 0 && kvm_check_extension(kvm_state, KVM_CAP_ASSIGN_DEV_IRQ)) {
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0))) {
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
 /* Only 32-bit/no-mask currently supported */
 if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10)) < 0) {
@@ -1211,8 +1210,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0x);
 }
 /* Expose MSI-X capability */
-pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
-if (pos != 0 && kvm_check_extension(kvm_state, KVM_CAP_DEVICE_MSIX)) {
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0))) {
 int bar_nr;
 uint32_t msix_table_entry;
 
@@ -1606,6 +1604,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 if (assigned_device_pci_cap_init(pci_dev) < 0)
 goto out;
 
+if (!kvm_check_extension(kvm_state, KVM_CAP_ASSIGN_DEV_IRQ) &&
+(dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX ||
+ dev->cap.available & ASSIGNED_DEVICE_CAP_MSI ||
+ assigned_dev_pci_read_byte(pci_dev, PCI_INTERRUPT_PIN) != 0)) {
+goto out;
+}
+
 /* intercept MSI-X entry page in the MMIO */
 if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX)
 if (assigned_dev_register_msix_mmio(dev))

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] pci-assign: Re-order initfn for memory API

2011-09-21 Thread Alex Williamson
We now need to scan PCI capabilities and setup an MSI-X page
before we walk the device resources since the overlay is now
setup during init instead of at the first mapping by the guest.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 288f80c..93913b3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1603,6 +1603,14 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 goto out;
 }
 
+if (assigned_device_pci_cap_init(pci_dev) < 0)
+goto out;
+
+/* intercept MSI-X entry page in the MMIO */
+if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX)
+if (assigned_dev_register_msix_mmio(dev))
+goto out;
+
 /* handle real device's MMIO/PIO BARs */
 if (assigned_dev_register_regions(dev->real_device.regions,
   dev->real_device.region_number,
@@ -1618,9 +1626,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 dev->h_busnr = dev->host.bus;
 dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
 
-if (assigned_device_pci_cap_init(pci_dev) < 0)
-goto out;
-
 /* assign device to guest */
 r = assign_device(dev);
 if (r < 0)
@@ -1631,11 +1636,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 if (r < 0)
 goto assigned_out;
 
-/* intercept MSI-X entry page in the MMIO */
-if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX)
-if (assigned_dev_register_msix_mmio(dev))
-goto assigned_out;
-
 assigned_dev_load_option_rom(dev);
 QLIST_INSERT_HEAD(&devs, dev, next);
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] pci-assign: Fix MSI-X support

2011-09-21 Thread Alex Williamson
Assigned device MSI-X support hasn't been working, this fixes
it.  I believe this should also fix:

https://bugs.launchpad.net/qemu/+bug/830558

Thanks,

Alex

---

Alex Williamson (2):
  pci-assign: Fix MSI-X registration
  pci-assign: Re-order initfn for memory API


 hw/device-assignment.c |   29 +
 1 files changed, 17 insertions(+), 12 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Connection does not support host device enumeration

2011-09-21 Thread Kaushal Shriyan
Hi,

I have  "20:04.0 Network controller: Sangoma Technologies Corp. A104d
QUAD T1/E1 AFT caHi,

I have  "20:04.0 Network controller: Sangoma Technologies Corp. A104d
QUAD T1/E1 AFT card" on Host OS, Its not visible on guest OS using
linux KVM application.

I did open the window for guest from virt-manager on my Ubuntu Linux
Desktop 11.04, shut down the guest, then select the "Details" view
from the menu on that window, and click "Add Hardware" at the bottom,
select "PCI Host Device" in the selections on the left, and find your
device in the list of host devices on the right. Then click "Finish".
Finally, start your guest up again, and the device should appear.

I get "Connection  does not support host device enumeration"  Any clue ?

libvirt-0.8.2-22.el5 version running on CentOS Linux Server version 5.6

Please let me know if you need any additional information

Regards,

Kaushalrd" on Host OS, Its not visible on guest OS using
linux KVM application.

I did open the window for guest from virt-manager on my Ubuntu Linux
Desktop 11.04, shut down the guest, then select the "Details" view
from the menu on that window, and click "Add Hardware" at the bottom,
select "PCI Host Device" in the selections on the left, and find your
device in the list of host devices on the right. Then click "Finish".
Finally, start your guest up again, and the device should appear.

I get "Connection  does not support host device enumeration"  Any clue ?

libvirt-0.8.2-22.el5 version running on CentOS Linux Server version 5.6

Please let me know if you need any additional information

Regards,

Kaushal
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pci passthrough fail on fedora 15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)

2011-09-21 Thread Alex Williamson
On Wed, 2011-09-21 at 10:23 -0700, Erik Flister wrote:
> i posted this to virt-tools but got no replies, hoping you guys can help me.  
> :)
> 

> >>>AMD
> phenom II X6 1075T proc
> >>>ASUS M4A87TD mobo

Google says this is an AMD 870 chipset, so you do not have AMD IOMMU
(AMD-Vi) support :(

>  lspci
> >>>
> >>>02:06.0 Unassigned class [ff00]: National Instruments PCI-6110
> >>>02:07.0 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
> >>>02:07.1 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
> >>>02:07.2 Parallel controller: NetMos Technology PCI 9865 Multi-I/O 
> >>>Controller
> >>>
>  lspci -n
> >>>
> >>>02:06.0 ff00: 1093:14e0
> >>>02:07.0 0700: 9710:9865
> >>>02:07.1 0700: 9710:9865
> >>>02:07.2 0701: 9710:9865
> >>>
>  virsh nodedev-list | grep pci
> >>>
> >>>pci__02_06_0
> >>>pci__02_07_0
> >>>pci__02_07_1
> >>>pci__02_07_2
> >>>
>  sudo virsh
> nodedev-dettach
> pci__02_06_0
> >>>Device pci__02_06_0 dettached
> >>>
> >>>
> >>>but then, after adding it to the guest and trying to boot it, i get:
> >>>
> >>>Error starting domain: this function is not supported by the
> >>>connection driver: Unable to reset PCI device :02:06.0: this
> >>>function is not supported by the connection driver: Active
> >>>:02:07.0 devices on bus with :02:06.0, not doing bus reset

This is probably telling you that the device you want to assign doesn't
support a reset mechanism (FLR or D3hot), so it wants to do a secondary
bus reset, but there are other devices on the same bus.

> >>>if i disable the other devices on bus 02 and add them to the guest:
>  sudo virsh nodedev-dettach pci__02_07_0
> >>>Device pci__02_07_0 dettached
> >>>
>  sudo virsh nodedev-dettach pci__02_07_1
> >>>Device pci__02_07_1 dettached
> >>>
>  sudo virsh nodedev-dettach pci__02_07_2
> >>>Device pci__02_07_2 dettached
> >>>
> >>>i then get this error on booting the guest (but with the same stack
> >>>trace as above):
> >>>
> >>>Error starting domain: cannot send monitor command
> >>>'{"execute":"qmp_capabilities"}': Connection reset by peer
> >>>
> >>>and then networking fails on the host until i reboot, and fedora won't
> >>>even shutdown all the way (i have to do a hard reboot by holding down
> >>>the power button).
> >>>
> >>>any idea how i can fix this?  why isn't detaching the device
> >>>automatically done when it is added to the guest?  even if i get this
> >>>working, it seems i will
> have to detach stuff every time i reboot the
> >>>host?  finally, in this case, i actually do want to give everything on
> >>>bus 02 to the guest, but what if i didn't?

Getting it working seems like it's going to require new hardware since
you don't have a device assignment capable IOMMU.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Help required in identifying a small task to begin contributing

2011-09-21 Thread Shashank Rachamalla
Hi

I am interested in kernel development and more inclined towards kvm.
However, I need some initial direction which can help me contribute.
Please let me know if I can take up some small task to begin with and
gain some acquaintance. Also, I have gone through the to do list @
http://www.linux-kvm.org/page/TODO but not sure of the current status of
listed items. Please throw some light. 

Regards,
Shashank

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Michael S. Tsirkin
I've rewritten the script in python. Seems to work but
I didn't have time to test - only compiled for now -
and needs to move to tools - but I hope this makes
review easier.

Thanks,

---
 src/find_ej0.py |  140 +++
 1 files changed, 140 insertions(+), 0 deletions(-)
 create mode 100755 src/find_ej0.py

diff --git a/src/find_ej0.py b/src/find_ej0.py
new file mode 100755
index 000..75e5491
--- /dev/null
+++ b/src/find_ej0.py
@@ -0,0 +1,140 @@
+#!/usr/bin/python
+
+# Process mixed ASL/AML listing (.lst file) produced by iasl -l
+# Locate all occurences of Name _ADR followed by Method EJ0_
+# Output slot info from _ADR and offset of method name in AML
+
+import re;
+import sys;
+import fileinput;
+
+aml = []
+asl = []
+output = []
+lineno = 0
+debug = ""
+
+class asl_line:
+line = None
+lineno = None
+aml_offset = None
+
+def die(diag):
+sys.stderr.write("Error: %s; %s\n" % (diag, debug))
+sys.exit(1)
+
+#Store an ASL command, matching AML offset, and input line (for debugging)
+def add_asl(line):
+l = asl_line()
+l.line = line
+l.lineno = lineno
+l.aml_offset = len(aml)
+asl.append(l)
+
+#Store an AML byte sequence
+#Verify that offset output by iasl matches # of bytes so far
+def add_aml(offset, line):
+o = int(offset, 16);
+# Sanity check: offset must match size of code so far
+if (o != len(aml)):
+die("Offset 0x%x != 0x%x" % (o, len(aml)))
+# Strip any trailing dots and ASCII dump after "
+line = re.sub(r'\s*\.*\s*".*$',"", line)
+# Strip traling whitespace
+line = re.sub(r'\s+$',"", line)
+# Strip leading whitespace
+line = re.sub(r'^\s+',"", line)
+# Split on whitespace
+code = re.split(r'\s+', line)
+for c in code:
+# Require a legal hex number, two digits
+if (not(re.search(r'^[0-9A-Fa-f][0-9A-Fa-f]$', c))):
+die("Unexpected octet %s" % c);
+aml.append(int(c, 16));
+
+# Process aml bytecode array, decoding AML
+# Given method offset, find its name offset
+def aml_method_name(lineno, offset):
+#0x14 MethodOp PkgLength NameString MethodFlags TermList
+if (aml[offset] != 0x14):
+die( "Method after input line $lineno offset $offset: "
+" expected 0x14 actual 0x%x" % aml[offset]);
+offset += 1;
+# PkgLength can be multibyte. Bits 8-7 give the # of extra bytes.
+pkglenbytes = aml[offset] >> 6;
+offset += 1 + pkglenbytes;
+return offset;
+
+for line in fileinput.input():
+# Strip trailing newline
+line.rstrip();
+# line number and debug string to output in case of errors
+lineno = lineno + 1
+debug = "input line %d: %s" % (lineno, line)
+#ASL listing: space, then line#, then , then code
+pasl = re.compile('^\s+([0-9]+)\.\.\.\.\s*')
+m = pasl.search(line)
+if (m):
+add_asl(pasl.sub("", line));
+# AML listing: offset in hex, then , then code
+paml = re.compile('^([0-9A-Fa-f]+)\.\.\.\.\s*')
+m = paml.search(line)
+if (m):
+add_aml(m.group(1), paml.sub("", line))
+
+# Now go over code, look for EJ0_ methods
+# For each such method, output slot mask from the
+# preceding _ADR line, as well as the method name offset.
+
+for i in range(len(asl)):
+l = asl[i].line
+debug = "input line %d: %s" % (asl[i].lineno, asl[i].line)
+# match: Method (EJ0_,1)
+if (not(re.search(r'^Method\s*\(\s*EJ0_\s*[,)]', l))):
+# Make sure we do not miss any EJ0_:
+# die if EJ0_ is found anywhere else in source code
+if (re.search(r'EJ0_', l)):
+die("Stray EJ0_ detected");
+continue
+# EJ0_ found. Previous line must be _ADR
+p = asl[i - 1].line
+# match: Name (_ADR, 0x)
+padr = re.compile('Name\s*\(\s*_ADR\s*,\s*0x([0-9A-Fa-f]+)\s*\)')
+m = padr.search(p);
+if (not m):
+die("_ADR not found before EJ0_ ")
+
+adr = int(m.group(1), 16)
+slot = adr >> 16;
+if (slot > 31):
+die("_ADR device out of range: actual %d expected 0 to 31" % slot)
+
+# We have offset of EJ0_ method in code
+# Now find EJ0_ itself
+ej0 = aml_method_name(asl[i].lineno, asl[i].aml_offset)
+# Verify AML: name must be EJ0_:
+if ((aml[ej0 + 0] != ord('E')) or
+(aml[ej0 + 1] != ord('J')) or
+(aml[ej0 + 2] != ord('0')) or
+(aml[ej0 + 3] != ord('_'))):
+die("AML offset 0x%x does not match EJ0_" % ej0)
+
+# OK we are done. Output slot mask and offset
+output.append("{.slot_mask = 0x%x, .offset = 0x%x}" %
+  (0x1 << slot, ej0))
+
+debug = "at end of file"
+
+# Pretty print output
+if (not len(output)):
+die("No EJ0_ Method found!")
+
+print '''
+static struct aml_ej0_data {
+unsigned slot_mask;
+unsigned offset;
+} aml_ej0_data[] = {'''
+
+print ",\n".join(output)
+print '};\n'
+
-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
th

Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 09:03:51PM +0100, Jamie Lokier wrote:
> Marcelo Tosatti wrote:
> > In case the VM stops for whatever reason, the host system is not
> > supposed to adjust time related hardware state to compensate, in an
> > attempt to present apparent continuous time.
> > 
> > If you save a VM and then restore it later, it is the guest
> > responsability to adjust its time representation.
> 
> If the guest doesn't know it's been stopped, then its time
> representation will be wrong until it finds out, e.g. after a few
> minutes with NTP, or even a seconds can be too long.
> 
> That is sad when it happens because it breaks the coherence of any
> timed-lease caching the guest is involved in.  I.e. where the guest
> acquires a lock on some data object (like a file in NFS) that it can
> efficiently access without network round trips (similar to MESI), with
> all nodes having agreed that it's coherent for, say, 5 seconds before
> renewing or breaking.  (It's just a way to reduce latency.)
> 
> But we can't trust CLOCK_MONOTONIC when a VM is involved, it's just
> one of those facts of life.  So instead the effort is to try and
> detect when a VM is involved and then distrust the clock.
> 
> (Non-VM) suspend/resume is similar, but there's usually a way to
> be notified about that as it happens.
> 
> -- Jamie

Thats pretty bad.

Time leased caching cannot be enabled in conjunction with save/restore
without VM involvement (the hypervisor management application must
notify the guest so it can drop such cache/writeback pending data).

But, if paravirtual clock is available (Linux VMs), it should be
possible to add an offset similarly to how suspend/resume is performed,
via a high priority channel, before resuming operation. We should look
into that possibility.

Thanks

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Marcelo Tosatti
On Wed, Sep 21, 2011 at 07:11:08AM -0700, Dave Hansen wrote:
> On Tue, 2011-09-20 at 16:55 -0300, Marcelo Tosatti wrote:
> > > and the wall clock stays behind my host wall clock by the amount of
> > > time it took to resume.
> > 
> > This is expected, similar to savevm/loadvm. 
> 
> That seems like pretty undesirable behavior to me.  It's too bad that it
> does that with savevm/loadvm, but is it really behavior that we want to
> spread?

Not really. savevm/loadvm has more widespread use, so a solution could
apply equally to both cases.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-21 Thread Marcelo Tosatti
On Wed, Sep 21, 2011 at 11:46:03AM +0300, Avi Kivity wrote:
> On 09/20/2011 08:28 PM, Avi Kivity wrote:
> >On 09/20/2011 07:30 PM, Marcelo Tosatti wrote:
> >>> >
> >>> >>   We do have a small issue.  If we exit during
> >>NMI-blocked-by-STI and
> >>> >>   nmi_pending == 2, then we lose the second interrupt.
> >>Should rarely
> >>> >>   happen, since external interrupts never exit in that
> >>condition, but
> >>> >>   it's a wart.

Actually exits in the window between 

- increase of nmi_queued 
and 
- transfer to nmi_pending/nmi_injected

Lose all nmi_queued values, no?

> >>> >
> >>> >And the above system reset case, you should be able to handle it by
> >>> >saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).
> >>>
> >>>  We could just add a KVM_CAP (and flag) that extends nmi_pending from
> >>>  a bool to a counter.
> >>
> >>Or just add a new field to the pad.
> >>
> >
> >Okay; I'll address this in a follow-on patch (my preference is
> >making nmi_pending a counter).
> >
> 
> Yet another way to do this is to redefine .injected (just in the
> API) to mean: inject immediately, unless blocked by interrupt
> shadow; in this case inject in the next instruction.  No KVM_CAP or
> anything.
> 
> The drawback is that if we hit the corner case of two NMIs queued
> and held back by interrupt shadow, an older kvm live migration
> target will not run the guest (exit with invalid state).  The
> advantage is that no user space or API changes are necessary.
> 
> Given that to get into this corner you need an NMI intensive load
> AND a sti; blah pair that spans two pages AND have the second page
> unavailable when those NMIs hit, I think it's better to avoid the
> API change.  Opinions?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pci passthrough fail on fedora 15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)

2011-09-21 Thread Erik Flister
i posted this to virt-tools but got no replies, hoping you guys can help me.  :)


>- Forwarded Message -
>>From: Erik Flister 
>>To: erik flister ; "virt-tools-l...@redhat.com" 
>>
>>Sent: Tuesday, September 20, 2011 9:36 AM
>>Subject: Re: [virt-tools-list] pci passthrough fail on fedora 
>>15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)
>>
>>
>>i found this discussion of the same error:
>>
>>http://www.linux-kvm.com/ content/pci-passthrough-error
>>> "I've solved my problem by deleting the '.save' file in
>>> /var/lib/libvirt/qemu/save"
>>
>>but i don't have that file:
>>
>>> sudo ls -al
/var/lib/libvirt/qemu/save
>>total 8
>>drwxr-xr-x 2 qemu qemu 4096 Sep 16 03:22 .
>>drwxr-x--- 5 qemu qemu 4096 Sep 19 10:13 ..
>>
>>
>>-erik
>>
>>
>>
>>>
>>>From: erik flister 
>>>To: virt-tools-l...@redhat.com
>>>Sent: Monday, September 19, 2011 9:19 PM
>>>Subject: [virt-tools-list] pci passthrough fail on fedora 
>>>15/kvm/qemu/virt-manager hosting win 7 (all 64 bit)
>>>
>>>AMD
phenom II X6 1075T proc
>>>ASUS M4A87TD mobo
>>>BIOS ver 2001 (built 3/8/11)
>>>
>>>advanced/cpu config/secure virtual machine mode enabled in BIOS
>>>
>>>not sure if this turns on iommu, how do i verify?
>>>
>>>/proc/cpuinfo has hits for svm but not iommu or vmx.
>>>
 dmesg | grep -i iommu
>>>[    0.00] Please enable the IOMMU option in the BIOS setup
>>>[    1.515596] PCI-DMA: using GART IOMMU.
>>>[    1.515599] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
>>>
>>>note my pci cards have windows drivers, but not linux drivers as far as i 
>>>know.
>>>
>>>i have latest yummed kvm/qemu/virt-manager.
>>>
>>>i am using this virtualization method:
>>>http://www.techotopia.com/index.php/Running_Windows_on_Fedora_Using_KVM_Virtualization
>>>
>>>i didn't know to disconnect the pci
cards from the
host before adding
>>>them to the guest until finding:
>>>http://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/sect-Virtualization-PCding_a_PCI_device_to_a_host.html
>>>
 lspci
>>>
>>>02:06.0 Unassigned class [ff00]: National Instruments PCI-6110
>>>02:07.0 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
>>>02:07.1 Serial controller: NetMos Technology PCI 9865 Multi-I/O Controller
>>>02:07.2 Parallel controller: NetMos Technology PCI 9865 Multi-I/O Controller
>>>
 lspci -n
>>>
>>>02:06.0 ff00: 1093:14e0
>>>02:07.0 0700: 9710:9865
>>>02:07.1 0700: 9710:9865
>>>02:07.2 0701: 9710:9865
>>>
 virsh nodedev-list | grep pci
>>>
>>>pci__02_06_0
>>>pci__02_07_0
>>>pci__02_07_1
>>>pci__02_07_2
>>>
 sudo virsh
nodedev-dettach
pci__02_06_0
>>>Device pci__02_06_0 dettached
>>>
>>>
>>>but then, after adding it to the guest and trying to boot it, i get:
>>>
>>>Error starting domain: this function is not supported by the
>>>connection driver: Unable to reset PCI device :02:06.0: this
>>>function is not supported by the connection driver: Active
>>>:02:07.0 devices on bus with :02:06.0, not doing bus reset
>>>
>>>Traceback (most recent call last):
>>>  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 45, in 
>>>cb_wrapper
>>>    callback(asyncjob, *args, **kwargs)
>>>  File "/usr/share/virt-manager/virtManager/engine.py", line 959, in 
>>>asyncfunc
>>>    vm.startup()
>>>  File "/usr/share/virt-manager/virtManager/domain.py", line 1128, in startup
>>>    self._backend.create()
>>>  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 330, in create
>>>    if ret == -1: raise libvirtError
('virDomainCreate() failed', dom=self)
>>>
>>>
>>>
>>>if i disable the other devices on bus 02 and add them to the guest:
 sudo virsh nodedev-dettach pci__02_07_0
>>>Device pci__02_07_0 dettached
>>>
 sudo virsh nodedev-dettach pci__02_07_1
>>>Device pci__02_07_1 dettached
>>>
 sudo virsh nodedev-dettach pci__02_07_2
>>>Device pci__02_07_2 dettached
>>>
>>>i then get this error on booting the guest (but with the same stack
>>>trace as above):
>>>
>>>Error starting domain: cannot send monitor command
>>>'{"execute":"qmp_capabilities"}': Connection reset by peer
>>>
>>>and then networking fails on the host until i reboot, and fedora won't
>>>even shutdown all the way (i have to do a hard reboot by holding down
>>>the power button).
>>>
>>>any idea how i can fix this?  why isn't detaching the device
>>>automatically done when it is added to the guest?  even if i get this
>>>working, it seems i will
have to detach stuff every time i reboot the
>>>host?  finally, in this case, i actually do want to give everything on
>>>bus 02 to the guest, but what if i didn't?
>>>
>>>thanks for your help!
>>>-erik
>>>
>>>
>>>see also iommu/bios stuff here:
>>>http://docs.fedoraproject.org/en-US/Fedora/13/html-single/Virtualization_Guide/index.html
>>>
>>>___
>>>virt-tools-list mailing list
>>>virt-tools-l...@redhat.com
>>>https://www.redhat.com/mailman/listinfo/virt

[RFC/PATCH] virtio-console: wait for console ports

2011-09-21 Thread Christian Borntraeger
Amit,

can you have a look at the patch below and give feedback or apply
if appropriate?

---
On s390 I have seen some random "Warning: unable to open an initial
console" boot failure. Turns out that tty_open fails, because the
hvc_alloc was not yet done. In former times this could not happen,
since the probe function automatically called hvc_alloc. With newer
versions (multiport) some host<->guest interaction is required
before hvc_alloc is called. This might be too late, especially if
an initramfs is involved. Lets use a completion if we have
multiport and an early console.



Signed-off-by: Christian Borntraeger 

---
 drivers/char/virtio_console.c |   18 ++
 1 file changed, 18 insertions(+)

Index: b/drivers/char/virtio_console.c
===
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -19,6 +19,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -73,6 +74,7 @@ struct ports_driver_data {
 static struct ports_driver_data pdrvdata;
 
 DEFINE_SPINLOCK(pdrvdata_lock);
+DECLARE_COMPLETION(port_added);
 
 /* This struct holds information that's relevant only for console ports */
 struct console {
@@ -1352,6 +1354,7 @@ static void handle_control_message(struc
break;
 
init_port_console(port);
+   complete(&port_added);
/*
 * Could remove the port here in case init fails - but
 * have to notify the host first.
@@ -1648,6 +1651,10 @@ static int __devinit virtcons_probe(stru
struct ports_device *portdev;
int err;
bool multiport;
+   bool early = early_put_chars != 0;
+
+   /* Ensure to read early_put_chars now */
+   barrier();
 
portdev = kmalloc(sizeof(*portdev), GFP_KERNEL);
if (!portdev) {
@@ -1719,6 +1726,17 @@ static int __devinit virtcons_probe(stru
 
__send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
   VIRTIO_CONSOLE_DEVICE_READY, 1);
+
+   /* If there was an early virtio console, assume that there are no
+* other consoles. We need to wait until the hvc_alloc matches the
+* hvc_instantiate, otherwise tty_open will complain, resulting in
+* a "Warning: unable to open an initial console" boot failure.
+* Without multiport this is done in add_port above. With multiport
+* this might take some host<->guest communication - thus we have to
+* wait. */
+   if (multiport && early)
+   wait_for_completion(&port_added);
+
return 0;
 
 free_vqs:
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: percpu crash on NetBurst

2011-09-21 Thread Avi Kivity

On 09/21/2011 05:28 PM, Avi Kivity wrote:


Coming back to this, the trigger if cpuid family=6 and model>=13 
(model 12 works).  Looks like the code disables rep_good is some MSR 
doesn't have the expected value.  While we should configure the MSR 
correctly, it looks like the fallback code for !rep_good is broken.  
Will look further.




Ok, without rep_good, memcpy() sometimes copies backwards.  A stale copy 
of memmove_64.c I had around got picked up instead of memmove_64.S; that 
memmove() used memcpy() in the copy-forward case.


So, nothing to see, sorry about the noise.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 05:27:32PM +0300, Gleb Natapov wrote:
> On Wed, Sep 21, 2011 at 03:44:29PM +0300, Michael S. Tsirkin wrote:
> > script ./src/find_ej0.pl finds all instances of
> > method named EJ0_ and the matching _ADR information,
> > and outputs the AML offset and slot mask of each.
> > 
> There is tools/ directory for such kind of scripts. Most (if not all) of
> scripts there are in python though.

OK, rewriting that in python should be easy.
I'll wait a bit for more comments on the design though.

> Perl should die painful death.
> 
> This approach delivers nice result, but since the script does not really
> decodes AML, but tries to match ASL source code with regular expressions,
> it introduces some assumptions to the code that make DSDT code less
> hackable. I'll hate to be the one who will have to change PCI device
> definitions in DSDT next time.

There are three requirements now:
1. don't use the name EJ0_ anywhere if you don't want it patches
2. _ADR must be an integer constant
3. put _ADR name immediately before EJ0_ method
I tried to make it easy to obey these rules
by adding comments in source code.

I don't believe a generic mechanism that does not place
any restrictions on language use is possible without adding an
AML interpreter in bios.

> Generally speaking finding an offset of some scope in AML is useful not
> only for PCI hotplug. For instance we want to make S3/S4 capability
> configurable by a command line switch, but this also requires DSDT
> patching and having automatic way to find _S3_/_S4_ offset is required
> for that too (we do not what to find it by hand each time DSDT is
> recompiled).

Right. So that would be an easy extension.
We could also add some directives for the tool
(e.g. in C comments) so that you can e.g. find more names
just by adding such a directive in dsl.

I'll be happy to work on that preferably
after we merge a simple version of the tool first.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: percpu crash on NetBurst

2011-09-21 Thread Avi Kivity

On 08/08/2011 12:55 PM, Tejun Heo wrote:

Hello, Avi.

On Sun, Aug 07, 2011 at 06:32:35PM +0300, Avi Kivity wrote:
>  qemu, under some conditions (-cpu host or -cpu kvm64), erroneously
>  passes family=15 as the virtual cpuid.  This causes a BUG() in
>  percpu code during late boot:
>
>  [ cut here ]
>  kernel BUG at mm/percpu.c:577!






>  All this applies to v3.0; current upstream (c2f340a69ca) fails even
>  worse, haven't yet determined exactly why.
>
>  I'm surprised this hasn't been reported before; Ingo, don't you have
>  family=15 hosts in your test farm?

Hmmm... I can't trigger the problem w/ kvm64 (I tried mounting and
unmounting filesystems but it worked okay) and am quite skeptical this
is a wide spread problem given that the percpu core code is used very
widely and hasn't seen a lot of changes lately.  Is there anything
specific you need to do to trigger the condition?  Can you try to
print out the s_files addresses being allocated and freed?



Coming back to this, the trigger if cpuid family=6 and model>=13 (model 
12 works).  Looks like the code disables rep_good is some MSR doesn't 
have the expected value.  While we should configure the MSR correctly, 
it looks like the fallback code for !rep_good is broken.  Will look further.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Gleb Natapov
On Wed, Sep 21, 2011 at 03:44:29PM +0300, Michael S. Tsirkin wrote:
> script ./src/find_ej0.pl finds all instances of
> method named EJ0_ and the matching _ADR information,
> and outputs the AML offset and slot mask of each.
> 
There is tools/ directory for such kind of scripts. Most (if not all) of
scripts there are in python though. Perl should die painful death.

This approach delivers nice result, but since the script does not really
decodes AML, but tries to match ASL source code with regular expressions,
it introduces some assumptions to the code that make DSDT code less
hackable. I'll hate to be the one who will have to change PCI device
definitions in DSDT next time.

Generally speaking finding an offset of some scope in AML is useful not
only for PCI hotplug. For instance we want to make S3/S4 capability
configurable by a command line switch, but this also requires DSDT
patching and having automatic way to find _S3_/_S4_ offset is required
for that too (we do not what to find it by hand each time DSDT is
recompiled).

> Signed-off-by: Michael S. Tsirkin 
> ---
>  src/find_ej0.pl |  136 
> +++
>  1 files changed, 136 insertions(+), 0 deletions(-)
>  create mode 100755 src/find_ej0.pl
> 
> diff --git a/src/find_ej0.pl b/src/find_ej0.pl
> new file mode 100755
> index 000..37e8a8c
> --- /dev/null
> +++ b/src/find_ej0.pl
> @@ -0,0 +1,136 @@
> +#!/usr/bin/perl
> +
> +# Process mixed ASL/AML listing (.lst file) produced by iasl -l
> +# Locate all occurences of Name _ADR followed by Method EJ0_
> +# Output slot info from _ADR and offset of method name in AML
> +
> +use strict;
> +
> +my @aml = ();
> +my @asl = ();
> +my @asl_lineno = ();
> +my @asl_to_aml_offset = ();
> +my @output = ();
> +
> +#Store an ASL command, matching AML offset, and input line (for debugging)
> +sub add_asl {
> +my $srcline = shift(@_);
> +my $line = shift(@_);
> +push @asl, $line;
> +push @asl_lineno, $.;
> +push @asl_to_aml_offset, $#aml + 1;
> +}
> +
> +#Store an AML byte sequence
> +#Verify that offset output by iasl matches # of bytes so far
> +sub add_aml {
> +my $offset = shift(@_);
> +my $line = shift(@_);
> +my $o = hex($offset);
> +# Sanity check: offset must match
> +die "Offset $o != " .($#aml + 1) if ($o != $#aml + 1);
> +# Strip any traling dots and ASCII dump after "
> +$line =~ s/\s*\.*\s*".*//;
> +
> +my @code = split(' ', $line);
> +foreach my $c (@code) {
> +if (not($c =~ m/^[0-9A-Fa-f][0-9A-Fa-f]$/)) {
> +die "Unexpected octet $c";
> +}
> +push @aml, hex($c);
> +}
> +}
> +
> +# Process aml bytecode array, decoding AML
> +# Given method offset, find its name offset
> +sub aml_method_name {
> +my $lineno = shift(@_);
> +my $offset = shift(@_);
> +#0x14 MethodOp PkgLength NameString MethodFlags TermList
> +if ($aml[$offset] != 0x14) {
> +die "Method after input line $lineno offset $offset: " .
> +" expected 0x14 actual ". sprintf("0x%x", $aml[$offset]);
> +}
> +$offset += 1;
> +# PkgLength can be multibyte. Bits 8-7 give the # of extra bytes.
> +my $pkglenbytes = $aml[$offset] >> 6;
> +$offset += 1 + $pkglenbytes;
> +return $offset;
> +}
> +
> +while (<>) {
> +#Strip trailing newline
> +chomp;
> + #ASL listing: space, then line#, then , then code
> + if (s#^\s+([0-9]+)\.\.\.\.\s*##) {
> +add_asl($1, $_);
> + }
> +# AML listing: offset in hex, then , then code
> + if (s#^([0-9A-F]+)\.\.\.\.\s*##) {
> +add_aml($1, $_);
> +}
> + # Ignore everything else
> +}
> +
> +# Now go over code, look for EJ0_ methods
> +# For each such method, output slot mask from the
> +# preceding _ADR line, as well as the method name offset.
> +for (my $i = 0; $i <= $#asl; $i++) {
> +my $l = $asl[$i];
> +# match: Method (EJ0_,1)
> +if (not $l =~ m#^Method\s*\(\s*EJ0_\s*[,)]#) {
> +# Make sure we do not miss any EJ0_:
> +# die if EJ0_ is found anywhere else in source code
> +if ($l =~ m#EJ0_#) {
> +die "Stray EJ0_ detected at input line $asl_lineno[$i]: 
> $asl[$i]";
> +}
> +next;
> +}
> +# EJ0_ found. Previous line must be _ADR
> +my $p = $i > 0 ? $asl[$i - 1] : "";
> +# match: Name (_ADR, 0x)
> +if (not ($p =~ m#Name\s*\(\s*_ADR\s*,\s*0x([0-9A-Fa-f]+)\s*\)#)) {
> +die "_ADR not found before EJ0_ ".
> +"at input line $asl_lineno[$i]: $asl[$i]";
> +}
> +my $adr = hex($1);
> +my $slot = $adr >> 16;
> +if ($slot > 31) {
> +die "_ADR device out of range: actual $slot " .
> +"expected 0 to 31 at input line $asl_lineno[$i]: $asl[$i]";
> +}
> +
> +# We have offset of EJ0_ method in code
> +# Now find EJ0_ itself
> +my $offset = $asl_to_aml_offset[$i];
> +my $ej0 = aml_method_name($asl_lineno[$i], $o

Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-21 Thread Dave Hansen
On Tue, 2011-09-20 at 16:55 -0300, Marcelo Tosatti wrote:
> > and the wall clock stays behind my host wall clock by the amount of
> > time it took to resume.
> 
> This is expected, similar to savevm/loadvm. 

That seems like pretty undesirable behavior to me.  It's too bad that it
does that with savevm/loadvm, but is it really behavior that we want to
spread?

-- Dave

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 08:47:39AM -0400, Kevin O'Connor wrote:
> On Wed, Sep 21, 2011 at 02:09:08PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Sep 21, 2011 at 01:39:22AM -0400, Amos Kong wrote:
> > > - Original Message -
> > > > How about moving code into functions so that it isn't duplicated for
> > > > each PCI device.  See the patch below as an example (100% untested).
> > 
> > Hmm, I sent patches that did a similar thing but
> > in a slightly more compact way.
> > Message ids:
> > 20110919092932.gb4...@redhat.com
> > 20110919093644.gc4...@redhat.com
> > 20110919100434.ga6...@redhat.com
> > 
> > Did they not reach you or something's wrong with them?
> 
> I received them, but when I saw Amos' v2 patch I thought he included
> them.
> 
> > > > +/* Bulk generated PCI hotplug devices */
> > > > +#define hotplug_func(nr, fn)\
> > > > +Device (S##nr##fn) {\
> > > > +Name (_ADR, 0x##nr##000##fn)\
> > > > +Method (_EJ0, 1) { Return(PCEJ(0x##nr)) }   \
> > > > +Name (_SUN, 0x##nr) \
> > > > +}
> > 
> > The fundamental question is still why would
> > we have _EJ0 methods in functions >0 when they are
> > not individually hotpluggable.
> > I think only function 0 should have _EJ0.
> 
> I don't know the answer to this question.
> 
> Maybe we should just collapse the current definitions and then put the
> fixes and enhancements on top of the collapsed version.
> 
> -Kevin

OK, I'll refactor my patches.
I just sent a patchset which in particular removes
the gen_pci_device macro, probably best to work on top of that?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-21 Thread Kevin O'Connor
On Wed, Sep 21, 2011 at 02:09:08PM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 21, 2011 at 01:39:22AM -0400, Amos Kong wrote:
> > - Original Message -
> > > How about moving code into functions so that it isn't duplicated for
> > > each PCI device.  See the patch below as an example (100% untested).
> 
> Hmm, I sent patches that did a similar thing but
> in a slightly more compact way.
> Message ids:
> 20110919092932.gb4...@redhat.com
> 20110919093644.gc4...@redhat.com
> 20110919100434.ga6...@redhat.com
> 
> Did they not reach you or something's wrong with them?

I received them, but when I saw Amos' v2 patch I thought he included
them.

> > > +/* Bulk generated PCI hotplug devices */
> > > +#define hotplug_func(nr, fn)\
> > > +Device (S##nr##fn) {\
> > > +Name (_ADR, 0x##nr##000##fn)\
> > > +Method (_EJ0, 1) { Return(PCEJ(0x##nr)) }   \
> > > +Name (_SUN, 0x##nr) \
> > > +}
> 
> The fundamental question is still why would
> we have _EJ0 methods in functions >0 when they are
> not individually hotpluggable.
> I think only function 0 should have _EJ0.

I don't know the answer to this question.

Maybe we should just collapse the current definitions and then put the
fixes and enhancements on top of the collapsed version.

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] acpi: generate mixed asl/aml listing

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 03:44:21PM +0300, Michael S. Tsirkin wrote:
> Use iasl -l flag to produce a mixed listing, where a
> source line is followed by matching AML.
> Since we use a preprocessed input, this generates long lines with
> multiple commands per line. To make it possible to match
> AML to source exactly, split the input that we supply iasl,
> with each command on a separate line.
> 
> Signed-off-by: Michael S. Tsirkin 

I thought I'd add an example listing here:
 467 Device (S1) {

080B5B 82 25 53 31 5F 5F ..."[.%S1__"

 468 Name (_ADR, 0x0001)

081208 5F 41 44 52 0C 00 00 "._ADR..."
081A01 00 ..".."

 469 Method (EJ0_,1) {

081C14 0F 45 4A 30 5F 01 ..."..EJ0_."

 470 Store(ShiftLeft(1, 0x0001), B0EJ)

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] acpi: remove _RMV

2011-09-21 Thread Michael S. Tsirkin
The macro gen_pci_device is used to add _RMV
method to a slot device so it is no longer needed:
presence of _EJ0 now indicates that the slot is ejectable.
It is also placing two devices with the same _ADR
on the same bus, which isn't defined by the ACPI spec.
So let's remove it.

Signed-off-by: Michael S. Tsirkin 
---
 src/acpi-dsdt.dsl |   49 -
 1 files changed, 0 insertions(+), 49 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 3d43e4b..44a27e4 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -121,12 +121,6 @@ DefinitionBlock (
 {
 B0EJ, 32,
 }
-
-OperationRegion(RMVC, SystemIO, 0xae0c, 0x04)
-Field(RMVC, DWordAcc, NoLock, WriteAsZeros)
-{
-PCRM, 32,
-}
 // Method EJ0_ will be patched by BIOS to _EJ0
 // at runtime, if the slot is detected to support hotplug.
 // Must be immediately preceded by _ADR for this to work.
@@ -464,49 +458,6 @@ DefinitionBlock (
DRSJ, 32
}
}
-
-#define gen_pci_device(name, nr)\
-Device(SL##name) {  \
-Name (_ADR, nr##)   \
-Method (_RMV) { \
-If (And(\_SB.PCI0.PCRM, ShiftLeft(1, nr))) {\
-Return (0x1)\
-}   \
-Return (0x0)\
-}   \
-Name (_SUN, name)   \
-}
-
-/* VGA (slot 1) and ISA bus (slot 2) defined above */
-   gen_pci_device(3, 0x0003)
-   gen_pci_device(4, 0x0004)
-   gen_pci_device(5, 0x0005)
-   gen_pci_device(6, 0x0006)
-   gen_pci_device(7, 0x0007)
-   gen_pci_device(8, 0x0008)
-   gen_pci_device(9, 0x0009)
-   gen_pci_device(10, 0x000a)
-   gen_pci_device(11, 0x000b)
-   gen_pci_device(12, 0x000c)
-   gen_pci_device(13, 0x000d)
-   gen_pci_device(14, 0x000e)
-   gen_pci_device(15, 0x000f)
-   gen_pci_device(16, 0x0010)
-   gen_pci_device(17, 0x0011)
-   gen_pci_device(18, 0x0012)
-   gen_pci_device(19, 0x0013)
-   gen_pci_device(20, 0x0014)
-   gen_pci_device(21, 0x0015)
-   gen_pci_device(22, 0x0016)
-   gen_pci_device(23, 0x0017)
-   gen_pci_device(24, 0x0018)
-   gen_pci_device(25, 0x0019)
-   gen_pci_device(26, 0x001a)
-   gen_pci_device(27, 0x001b)
-   gen_pci_device(28, 0x001c)
-   gen_pci_device(29, 0x001d)
-   gen_pci_device(30, 0x001e)
-   gen_pci_device(31, 0x001f)
 }
 
 /* PCI IRQs */
-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] acpi: EJ0 method name patching

2011-09-21 Thread Michael S. Tsirkin
Modify ACPI to only supply _EJ0 methods for PCI
slots that support hotplug.

This is done by runtime patching:
- Rename _EJ0 methods for PCI slots in DSDT to EJ0_:
  note that this has the same checksum, but
  is ignored by OSPM.
- At compile time, look for these methods in ASL source,
  find the matching AML,  and store the offsets of these methods
  in a table named aml_ej0_data.
  Note that we are looking for EJ0_ in source code,
  so we'll be able to write EJ0 if we want to and the script
  will not match it.
- At run time, go over aml_ej0_data, check which slots
  support hotplug and patch the ACPI table, replacing EJ0_ with _EJ0.

Note: the method used is robust in that we don't need
to change any offsets manually in case of ASL code changes.
As all parsing is done at compile time, any unexpected input causes
build failure, not a runtime failure.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile  |3 ++-
 src/acpi-dsdt.dsl |9 +++--
 src/acpi.c|   11 +++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 5c011bb..dee93d6 100644
--- a/Makefile
+++ b/Makefile
@@ -197,7 +197,8 @@ src/%.hex: src/%.dsl src/splitdsl.pl src/find_ej0.pl
$(Q)cpp -P $< > $(OUT)$*.dsl.i.orig
$(Q)./src/splitdsl.pl $(OUT)$*.dsl.i.orig > $(OUT)$*.dsl.i
$(Q)iasl -l -tc -p $(OUT)$* $(OUT)$*.dsl.i
-   $(Q)cp $(OUT)$*.hex $@
+   $(Q)./src/find_ej0.pl $(OUT)$*.lst > $(OUT)$*.off
+   $(Q)cat $(OUT)$*.hex $(OUT)$*.off > $@
 
 $(OUT)ccode32flat.o: src/acpi-dsdt.hex
 
diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 08412e2..3d43e4b 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -127,11 +127,16 @@ DefinitionBlock (
 {
 PCRM, 32,
 }
-
+// Method EJ0_ will be patched by BIOS to _EJ0
+// at runtime, if the slot is detected to support hotplug.
+// Must be immediately preceded by _ADR for this to work.
+// EJ0_ is not allowed anywhere else in this file,
+// if you really want to use it, write EJ0 which
+// creates the same AML but isn't patched.
 #define hotplug_slot(name, nr) \
 Device (S##name) {\
Name (_ADR, nr##)  \
-   Method (_EJ0,1) {  \
+   Method (EJ0_,1) {  \
 Store(ShiftLeft(1, nr), B0EJ) \
 Return (0x0)  \
}  \
diff --git a/src/acpi.c b/src/acpi.c
index 6bb6ff6..cbb5143 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -198,6 +198,8 @@ struct srat_memory_affinity
 u32reserved3[2];
 } PACKED;
 
+#define PCI_RMV_BASE 0xae0c
+
 #include "acpi-dsdt.hex"
 
 static void
@@ -243,6 +245,8 @@ build_fadt(struct pci_device *pci)
 struct fadt_descriptor_rev1 *fadt = malloc_high(sizeof(*fadt));
 struct facs_descriptor_rev1 *facs = memalign_high(64, sizeof(*facs));
 void *dsdt = malloc_high(sizeof(AmlCode));
+u32 rmvc_pcrm;
+int i;
 
 if (!fadt || !facs || !dsdt) {
 warn_noalloc();
@@ -255,7 +259,13 @@ build_fadt(struct pci_device *pci)
 facs->length = cpu_to_le32(sizeof(*facs));
 
 /* DSDT */
 memcpy(dsdt, AmlCode, sizeof(AmlCode));
+rmvc_pcrm = inl(PCI_RMV_BASE);
+for (i = 0; i < sizeof(aml_ej0_data) / sizeof(*aml_ej0_data); ++i) {
+if (rmvc_pcrm & aml_ej0_data[i].slot_mask) {
+memcpy(dsdt + aml_ej0_data[i].offset, "_EJ0", 4);
+}
+}
 
 /* FADT */
 memset(fadt, 0, sizeof(*fadt));
-- 
1.7.5.53.gc233e

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] acpi: add aml/asl parsing script

2011-09-21 Thread Michael S. Tsirkin
script ./src/find_ej0.pl finds all instances of
method named EJ0_ and the matching _ADR information,
and outputs the AML offset and slot mask of each.

Signed-off-by: Michael S. Tsirkin 
---
 src/find_ej0.pl |  136 +++
 1 files changed, 136 insertions(+), 0 deletions(-)
 create mode 100755 src/find_ej0.pl

diff --git a/src/find_ej0.pl b/src/find_ej0.pl
new file mode 100755
index 000..37e8a8c
--- /dev/null
+++ b/src/find_ej0.pl
@@ -0,0 +1,136 @@
+#!/usr/bin/perl
+
+# Process mixed ASL/AML listing (.lst file) produced by iasl -l
+# Locate all occurences of Name _ADR followed by Method EJ0_
+# Output slot info from _ADR and offset of method name in AML
+
+use strict;
+
+my @aml = ();
+my @asl = ();
+my @asl_lineno = ();
+my @asl_to_aml_offset = ();
+my @output = ();
+
+#Store an ASL command, matching AML offset, and input line (for debugging)
+sub add_asl {
+my $srcline = shift(@_);
+my $line = shift(@_);
+push @asl, $line;
+push @asl_lineno, $.;
+push @asl_to_aml_offset, $#aml + 1;
+}
+
+#Store an AML byte sequence
+#Verify that offset output by iasl matches # of bytes so far
+sub add_aml {
+my $offset = shift(@_);
+my $line = shift(@_);
+my $o = hex($offset);
+# Sanity check: offset must match
+die "Offset $o != " .($#aml + 1) if ($o != $#aml + 1);
+# Strip any traling dots and ASCII dump after "
+$line =~ s/\s*\.*\s*".*//;
+
+my @code = split(' ', $line);
+foreach my $c (@code) {
+if (not($c =~ m/^[0-9A-Fa-f][0-9A-Fa-f]$/)) {
+die "Unexpected octet $c";
+}
+push @aml, hex($c);
+}
+}
+
+# Process aml bytecode array, decoding AML
+# Given method offset, find its name offset
+sub aml_method_name {
+my $lineno = shift(@_);
+my $offset = shift(@_);
+#0x14 MethodOp PkgLength NameString MethodFlags TermList
+if ($aml[$offset] != 0x14) {
+die "Method after input line $lineno offset $offset: " .
+" expected 0x14 actual ". sprintf("0x%x", $aml[$offset]);
+}
+$offset += 1;
+# PkgLength can be multibyte. Bits 8-7 give the # of extra bytes.
+my $pkglenbytes = $aml[$offset] >> 6;
+$offset += 1 + $pkglenbytes;
+return $offset;
+}
+
+while (<>) {
+#Strip trailing newline
+chomp;
+   #ASL listing: space, then line#, then , then code
+   if (s#^\s+([0-9]+)\.\.\.\.\s*##) {
+add_asl($1, $_);
+   }
+# AML listing: offset in hex, then , then code
+   if (s#^([0-9A-F]+)\.\.\.\.\s*##) {
+add_aml($1, $_);
+}
+   # Ignore everything else
+}
+
+# Now go over code, look for EJ0_ methods
+# For each such method, output slot mask from the
+# preceding _ADR line, as well as the method name offset.
+for (my $i = 0; $i <= $#asl; $i++) {
+my $l = $asl[$i];
+# match: Method (EJ0_,1)
+if (not $l =~ m#^Method\s*\(\s*EJ0_\s*[,)]#) {
+# Make sure we do not miss any EJ0_:
+# die if EJ0_ is found anywhere else in source code
+if ($l =~ m#EJ0_#) {
+die "Stray EJ0_ detected at input line $asl_lineno[$i]: $asl[$i]";
+}
+next;
+}
+# EJ0_ found. Previous line must be _ADR
+my $p = $i > 0 ? $asl[$i - 1] : "";
+# match: Name (_ADR, 0x)
+if (not ($p =~ m#Name\s*\(\s*_ADR\s*,\s*0x([0-9A-Fa-f]+)\s*\)#)) {
+die "_ADR not found before EJ0_ ".
+"at input line $asl_lineno[$i]: $asl[$i]";
+}
+my $adr = hex($1);
+my $slot = $adr >> 16;
+if ($slot > 31) {
+die "_ADR device out of range: actual $slot " .
+"expected 0 to 31 at input line $asl_lineno[$i]: $asl[$i]";
+}
+
+# We have offset of EJ0_ method in code
+# Now find EJ0_ itself
+my $offset = $asl_to_aml_offset[$i];
+my $ej0 = aml_method_name($asl_lineno[$i], $offset);
+# Verify AML: name must be EJ0_:
+if (($aml[$ej0 + 0] != ord('E')) or
+($aml[$ej0 + 1] != ord('J')) or
+($aml[$ej0 + 2] != ord('0')) or
+($aml[$ej0 + 3] != ord('_'))) {
+die "AML after input line $asl_lineno[$i] offset $ej0 " .
+"does not match EJ0_";
+}
+
+# OK we are done. Output slot mask and offset
+push @output, sprintf("{.slot_mask = 0x%x, .offset = 0x%x}",
+  0x1 << $slot, $ej0);
+}
+
+# Pretty print output
+if ($#output < 0) {
+die "No EJ0_ Method found!"
+}
+print 

[PATCH 1/4] acpi: generate mixed asl/aml listing

2011-09-21 Thread Michael S. Tsirkin
Use iasl -l flag to produce a mixed listing, where a
source line is followed by matching AML.
Since we use a preprocessed input, this generates long lines with
multiple commands per line. To make it possible to match
AML to source exactly, split the input that we supply iasl,
with each command on a separate line.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile|7 ---
 src/splitdsl.pl |   16 
 2 files changed, 20 insertions(+), 3 deletions(-)
 create mode 100755 src/splitdsl.pl

diff --git a/Makefile b/Makefile
index 109091b..5c011bb 100644
--- a/Makefile
+++ b/Makefile
@@ -192,10 +192,11 @@ $(OUT)vgabios.bin: $(OUT)vgabios.bin.raw tools/buildrom.py
$(Q)./tools/buildrom.py $< $@
 
 ### dsdt build rules
-src/%.hex: src/%.dsl
+src/%.hex: src/%.dsl src/splitdsl.pl src/find_ej0.pl
@echo "Compiling DSDT"
-   $(Q)cpp -P $< > $(OUT)$*.dsl.i
-   $(Q)iasl -tc -p $(OUT)$* $(OUT)$*.dsl.i
+   $(Q)cpp -P $< > $(OUT)$*.dsl.i.orig
+   $(Q)./src/splitdsl.pl $(OUT)$*.dsl.i.orig > $(OUT)$*.dsl.i
+   $(Q)iasl -l -tc -p $(OUT)$* $(OUT)$*.dsl.i
$(Q)cp $(OUT)$*.hex $@
 
 $(OUT)ccode32flat.o: src/acpi-dsdt.hex
diff --git a/src/splitdsl.pl b/src/splitdsl.pl
new file mode 100755
index 000..1caccba
--- /dev/null
+++ b/src/splitdsl.pl
@@ -0,0 +1,16 @@
+#!/usr/bin/perl
+
+while (<>)
+{
+   #Remove // comments if any as we can not split lines after them
+   s#//.*##;
+   #Code after { starts a new line
+   s/(\{)(\s*\S)/$1\n$2/g;
+   #Operator after ) starts a new line
+   s/(\))(\s*[_A-Za-z])/$1\n$2/g;
+   #Code after } starts a new line
+   s/(})(\s*\S)/$1\n$2/g;
+   #} starts a new line
+   s/(\S\s*)(\})/$1\n$2/g;
+   print;
+}
-- 
1.7.5.53.gc233e

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-21 Thread Michael S. Tsirkin
Here's a bug: guest thinks it can eject VGA device and ISA bridge.

[root@dhcp74-172 ~]#lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 PCI bridge: Red Hat, Inc. Device 0001
00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device
00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/
adapter  address  attention  latch  module  power
[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/
adapter  address  attention  latch  module  power

[root@dhcp74-172 ~]# echo 0 > /sys/bus/pci/slots/2/power 
[root@dhcp74-172 ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:03.0 PCI bridge: Red Hat, Inc. Device 0001
00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device
00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

This is wrong because slots 1 and 2 are marked as not hotpluggable
in qemu.

The reason is that our acpi tables declare both _RMV with value 0,
and _EJ0 method for these slots. What happens in this case
is undocumented by ACPI spec, so linux ignores _RMV,
and windows seems to ignore _EJ0.

The correct way to suppress hotplug is not to have _EJ0,
so this is what this patch does: it probes PIIX and
modifies DSDT to match.

With these patches applied, we get:

[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/
address
[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/
address

I had to add a bit of compile-time infrastructure to handle the
runtime patching in a simple way. I expect that
it will be possible to reuse it if we need to
patch other methods in the future.

Michael S. Tsirkin (4):
  acpi: generate mixed asl/aml listing
  acpi: add aml/asl parsing script
  acpi: EJ0 method name patching
  acpi: remove _RMV

 Makefile  |   10 ++--
 src/acpi-dsdt.dsl |   58 +++
 src/acpi.c|   11 
 src/find_ej0.pl   |  136 +
 src/splitdsl.pl   |   16 ++
 5 files changed, 176 insertions(+), 55 deletions(-)
 create mode 100755 src/find_ej0.pl
 create mode 100755 src/splitdsl.pl

-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] nVMX: Fix warning-causing idt-vectoring-info behavior

2011-09-21 Thread Avi Kivity

On 09/21/2011 01:48 PM, Nadav Har'El wrote:

This patch solves two outstanding nested-VMX issues:




Sorry, I missed an important point on the first review.


--- .before/arch/x86/kvm/vmx.c  2011-09-21 13:45:59.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-09-21 13:45:59.0 +0300
@@ -3858,12 +3858,17 @@ static bool nested_exit_on_intr(struct k
  static void enable_irq_window(struct kvm_vcpu *vcpu)
  {
u32 cpu_based_vm_exec_control;
-   if (is_guest_mode(vcpu)&&  nested_exit_on_intr(vcpu))
-   /* We can get here when nested_run_pending caused
-* vmx_interrupt_allowed() to return false. In this case, do
-* nothing - the interrupt will be injected later.
+   if (is_guest_mode(vcpu)&&  nested_exit_on_intr(vcpu)) {
+   /*
+* We get here if vmx_interrupt_allowed() returned 0 because
+* we must enter L2 now, so we can't inject to L1 now. If we
+* just do nothing, L2 will later exit and we can inject the
+* IRQ to L1 then. But to make L2 exit more promptly, we send
+* a self-IPI, causing L2 to exit right after entry.
 */
+   smp_send_reschedule(vcpu->cpu);
return;
+   }


->enable_irq_window() is called with interrupts enabled, so the self-IPI 
will be dispatched immediately and be lost.


The way to handle it is to kvm_make_request(KVM_REQ_IMMEDIATE_EXIT) 
here, and to add code to vcpu_enter_guest() to sample this into a local 
variable, and, after disabling interrupts, do the self-IPI.


A unit test would have caught this...

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-21 Thread Michael S. Tsirkin
On Wed, Sep 21, 2011 at 01:39:22AM -0400, Amos Kong wrote:
> - Original Message -
> > On Tue, Sep 20, 2011 at 06:45:57AM -0400, Amos Kong wrote:
> > > From 48ea1c9188334b89a60b4f9e853e86fc04fda4a5 Mon Sep 17 00:00:00
> > > 2001
> > > From: Amos Kong 
> > > Date: Tue, 20 Sep 2011 15:38:43 +0800
> > > Subject: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI
> > > DSDT tables
> > > 
> > > Only func 0 is registered to guest driver (we can
> > > only found func 0 in slot->funcs list of driver),
> > > the other functions could not be cleaned when
> > > hot-removing the whole slot. This patch adds
> > > device per function in ACPI DSDT tables.
> > > 
> > > Have tested with linux/winxp/win7, hot-adding/hot-remving,
> > > single/multiple function device, they are all fine.
> > > ---
> > > Changes from v1:
> > > - cleanup the macros, bios.bin gets back to 128K
> > > - notify only when func0 is added and removed
> > 
> > How about moving code into functions so that it isn't duplicated for
> > each PCI device.  See the patch below as an example (100% untested).

Hmm, I sent patches that did a similar thing but
in a slightly more compact way.
Message ids:
20110919092932.gb4...@redhat.com
20110919093644.gc4...@redhat.com
20110919100434.ga6...@redhat.com

Did they not reach you or something's wrong with them?

> 
> Tested, it works as my original patch-v2.
> 
> > The CPU hotplug stuff works this way, except it run-time generates
> > the
> > equivalent of "Device(S???)" and "PCNT".  (It may make sense to
> > run-time generate the PCI hotplug as well - it consumes about 10K of
> > space to statically generate the 31 devices.)
> 
> I'm ok with the new version.
> 
> Acked-by: Amos Kong 
> 
> > -Kevin
> > 
> > 
> > diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
> > index 08412e2..31ac5eb 100644
> > --- a/src/acpi-dsdt.dsl
> > +++ b/src/acpi-dsdt.dsl
> > @@ -128,48 +128,6 @@ DefinitionBlock (
> >  PCRM, 32,
> >  }
> >  
> > -#define hotplug_slot(name, nr) \
> > -Device (S##name) {\
> > -   Name (_ADR, nr##)  \
> > -   Method (_EJ0,1) {  \
> > -Store(ShiftLeft(1, nr), B0EJ) \
> > -Return (0x0)  \
> > -   }  \
> > -   Name (_SUN, name)  \
> > -}
> > -
> > -   hotplug_slot(1, 0x0001)
> > -   hotplug_slot(2, 0x0002)
> > -   hotplug_slot(3, 0x0003)
> > -   hotplug_slot(4, 0x0004)
> > -   hotplug_slot(5, 0x0005)
> > -   hotplug_slot(6, 0x0006)
> > -   hotplug_slot(7, 0x0007)
> > -   hotplug_slot(8, 0x0008)
> > -   hotplug_slot(9, 0x0009)
> > -   hotplug_slot(10, 0x000a)
> > -   hotplug_slot(11, 0x000b)
> > -   hotplug_slot(12, 0x000c)
> > -   hotplug_slot(13, 0x000d)
> > -   hotplug_slot(14, 0x000e)
> > -   hotplug_slot(15, 0x000f)
> > -   hotplug_slot(16, 0x0010)
> > -   hotplug_slot(17, 0x0011)
> > -   hotplug_slot(18, 0x0012)
> > -   hotplug_slot(19, 0x0013)
> > -   hotplug_slot(20, 0x0014)
> > -   hotplug_slot(21, 0x0015)
> > -   hotplug_slot(22, 0x0016)
> > -   hotplug_slot(23, 0x0017)
> > -   hotplug_slot(24, 0x0018)
> > -   hotplug_slot(25, 0x0019)
> > -   hotplug_slot(26, 0x001a)
> > -   hotplug_slot(27, 0x001b)
> > -   hotplug_slot(28, 0x001c)
> > -   hotplug_slot(29, 0x001d)
> > -   hotplug_slot(30, 0x001e)
> > -   hotplug_slot(31, 0x001f)
> > -
> >  Name (_CRS, ResourceTemplate ()
> >  {
> >  WordBusNumber (ResourceProducer, MinFixed, MaxFixed,
> >  PosDecode,
> > @@ -762,6 +720,119 @@ DefinitionBlock (
> >  Zero   /* reserved */
> >  })
> >  
> > +/* PCI hotplug */
> > +Scope(\_SB.PCI0) {
> > +/* Methods called by bulk generated PCI devices below */
> > +Method (PCEJ, 1, NotSerialized) {
> > +// _EJ0 method - eject callback
> > +Store(ShiftLeft(1, Arg0), B0EJ)
> > +Return (0x0)
> > +}
> > +
> > +/* Bulk generated PCI hotplug devices */
> > +#define hotplug_func(nr, fn)\
> > +Device (S##nr##fn) {\
> > +Name (_ADR, 0x##nr##000##fn)\
> > +Method (_EJ0, 1) { Return(PCEJ(0x##nr)) }   \
> > +Name (_SUN, 0x##nr) \
> > +}

The fundamental question is still why would
we have _EJ0 methods in functions >0 when they are
not individually hotpluggable.
I think only function 0 should have _EJ0.

> > +
> > +#define hotplug_slot(nr) \
> > +hotplug_func(nr, 0)  \
> > +hotplug_func(nr, 1)  \
> > +hotplug_func(nr, 2)  \
> > +hotplug_func(nr, 3)  \
> > +hotplug_func(nr, 4)  \
> > +hotplug_fun

[PATCH] nVMX: Fix warning-causing idt-vectoring-info behavior

2011-09-21 Thread Nadav Har'El
This patch solves two outstanding nested-VMX issues:

1. "unexpected, valid vectoring info" warnings appeared in L1.
These are fixed by correcting the emulation of concurrent L0->L1 and
L1->L2 injections.

2. When we must run L2 next (e.g., on L1's VMLAUNCH/VMRESUME), injection into
L1 was delayed for an unknown amount of time - until L2 exits.
We now force (using a self IPI) an exit immediately after entry to L2,
so that the injection into L1 happens promptly.

Some more details about these issues:

When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
nVMX patch 23, titled "Correct handling of interrupt injection".

Unfortunately, it is possible (though rare) that at this point there is valid
idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2,
and when L2 tried to run this interrupt's handler, it got a page fault - so
it returns the original interrupt vector in idt_vectoring_info. The problem
is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT
like we wished to, because the VMX spec guarantees that idt_vectoring_info
and exit_reason_external_interrupt can never happen together. This is not
just specified in the spec - a KVM L1 actually prints a kernel warning
"unexpected, valid vectoring info" if we violate this guarantee, and some
users noticed these warnings in L1's logs.

In order to better emulate a processor, which would never return the external
interrupt and the idt-vectoring-info together, we need to separate the two
injection steps: First, complete L1's injection into L2 (i.e., enter L2,
injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds
and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT.
Most of this is already in the code - the only change we need is to remain
in L2 (and not exit to L1) in this case.

However, to ensure prompt injection to L1, instead of letting L2 run for
a while after entering, we can send a self-IPI, which will ensure that
L2 exits immediately after the (necessary) entry, so we can inject into L1
as soon as possible.

Note that we added this self-IPI not only in the idt-vectoring-info case
above, but in every case where we are forced to enter L2 despite wishing to
inject to L1. This includes the case when L1 just VMLAUNCH/VMRESUMEed L2.

Note how we test vmcs12->idt_vectoring_info_field; This isn't really the
vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value
of this field. This was explained in patch 25 ("Correct handling of idt
vectoring info") of the original nVMX patch series.

Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
to Abel Gordon for helping me figure out the solution, and to Avi Kivity
for helping to improve it.

Signed-off-by: Nadav Har'El 
---
 arch/x86/kvm/vmx.c |   20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

--- .before/arch/x86/kvm/vmx.c  2011-09-21 13:45:59.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-09-21 13:45:59.0 +0300
@@ -3858,12 +3858,17 @@ static bool nested_exit_on_intr(struct k
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
u32 cpu_based_vm_exec_control;
-   if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
-   /* We can get here when nested_run_pending caused
-* vmx_interrupt_allowed() to return false. In this case, do
-* nothing - the interrupt will be injected later.
+   if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
+   /*
+* We get here if vmx_interrupt_allowed() returned 0 because
+* we must enter L2 now, so we can't inject to L1 now. If we
+* just do nothing, L2 will later exit and we can inject the
+* IRQ to L1 then. But to make L2 exit more promptly, we send
+* a self-IPI, causing L2 to exit right after entry.
 */
+   smp_send_reschedule(vcpu->cpu);
return;
+   }
 
cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
@@ -3990,11 +3995,12 @@ static void vmx_set_nmi_mask(struct kvm_
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
-   struct vmcs12 *vmcs12;
-   if (to_vmx(vcpu)->nested.nested_run_pending)
+   struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+   if (to_vmx(vcpu)->nested.nested_run_pending ||
+   (vmcs12->idt_vectoring_info_field &
+VECTORING_INFO_VALID_MASK))
return 0;
nested_vmx_vmexit(vcpu);
-   vmcs12 = get_vmcs12(vcpu);
vmcs12-

Using vmdk image with qemu-kvm

2011-09-21 Thread fuzzy_4711
Hi,

I am new to the list, so please guide me to the correct place to ask if
my issue is misplaced here.


My system is an openSuSE 11.4 box with
kvm-0.14.0.0-1.10.1.x86_64
libvirt-client-0.8.8-0.12.1.x86_64


libvirt-devel-0.8.8-0.12.1.x86_64


virt-manager-0.8.5-8.1.x86_64



   libvirt-0.8.8-0.12.1.x86_64


libvirt-doc-0.8.8-0.12.1.x86_64




libvirt-python-0.8.8-0.12.1.x86_64


virt-utils-1.1.5-1.2.1.x86_64

installed on it. Additional configuration info will be provided, if needed.

I have to run a vmware virtual machine, which runs smoothly under win xp
using vmware player (created and first run using vmware server), on my
linux box, using qemu-kvm.

This is the content of the vmx file:



.encoding = "windows-1252"
config.version = "8"
virtualHW.version = "4"
scsi0.present = "TRUE"
memsize = "512"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "Windows2000AdvancedServer.vmdk"
ide1:0.present = "TRUE"
ide1:0.fileName = "D:"
ide1:0.deviceType = "atapi-cdrom"
floppy0.fileName = "A:"
Ethernet0.present = "TRUE"
displayName = "Win2K Adv. Server"
guestOS = "win2000advserv"
priority.grabbed = "normal"
priority.ungrabbed = "normal"

scsi0:0.redo = ""
ethernet0.addressType = "generated"
uuid.location = "56 4d 5d 73 ac 49 1e a6-db 7f 09 c7 02 13 35 63"
uuid.bios = "56 4d 5d 73 ac 49 1e a6-db 7f 09 c7 02 13 35 63"
ide1:0.autodetect = "FALSE"
ethernet0.generatedAddress = "00:0c:29:13:35:63"
ethernet0.generatedAddressOffset = "0"

annotation = "for this virtual machine"

ide1:0.startConnected = "TRUE"
tools.syncTime = "TRUE"

extendedConfigFile = "Windows 2000 Advanced Server.vmxf"
virtualHW.productCompatibility = "hosted"
tools.upgrade.policy = "manual"

vmotion.checkpointFBSize = "16777216"

ide1:0.allowGuestConnectionControl = "FALSE"

usb.present = "TRUE"

usb.autoConnect.device0 = ""

usb.autoConnect.device1 = ""
usb.autoConnect.device2 = ""
checkpoint.vmState = ""
cleanShutdown = "TRUE"
replay.supported = "FALSE"
replay.filename = ""



When I use the vm-install script provided by SUSE, choosing the disk
like provided in the vmx file for use with kvm, in the pre flight check
it complains about "no valid boot sector found. Installation may have
failed".

I am not an expert on virtualisation as you can see. It is important to
me to be able to rerun this image in vmware player again if needed. So I
guess converting it is not an option afaics. Could you help?

Thanks.
fuz

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-21 Thread Avi Kivity

On 09/20/2011 08:28 PM, Avi Kivity wrote:

On 09/20/2011 07:30 PM, Marcelo Tosatti wrote:

> >
> >>   We do have a small issue.  If we exit during 
NMI-blocked-by-STI and
> >>   nmi_pending == 2, then we lose the second interrupt.  Should 
rarely
> >>   happen, since external interrupts never exit in that 
condition, but

> >>   it's a wart.
> >
> >And the above system reset case, you should be able to handle it by
> >saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).
>
>  We could just add a KVM_CAP (and flag) that extends nmi_pending from
>  a bool to a counter.

Or just add a new field to the pad.



Okay; I'll address this in a follow-on patch (my preference is making 
nmi_pending a counter).




Yet another way to do this is to redefine .injected (just in the API) to 
mean: inject immediately, unless blocked by interrupt shadow; in this 
case inject in the next instruction.  No KVM_CAP or anything.


The drawback is that if we hit the corner case of two NMIs queued and 
held back by interrupt shadow, an older kvm live migration target will 
not run the guest (exit with invalid state).  The advantage is that no 
user space or API changes are necessary.


Given that to get into this corner you need an NMI intensive load AND a 
sti; blah pair that spans two pages AND have the second page unavailable 
when those NMIs hit, I think it's better to avoid the API change.  Opinions?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-21 Thread Kevin Wolf
Am 20.09.2011 18:49, schrieb Jan Kiszka:
> Upstream's version is about to be signal-free and will stop handling
> SIGUSR2 specially. So it's time to adopt its implementation, ie. switch
> from signalfd to a pipe.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> This should help pulling upstream into qemu-kvm when "block: avoid
> SIGUSR2" is merged. And will help merging further cleanups of this code
> I'm working on.

Acked-by: Kevin Wolf 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-21 Thread Stefan Hajnoczi
On Tue, Sep 20, 2011 at 06:49:49PM +0200, Jan Kiszka wrote:
> Upstream's version is about to be signal-free and will stop handling
> SIGUSR2 specially. So it's time to adopt its implementation, ie. switch
> from signalfd to a pipe.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> This should help pulling upstream into qemu-kvm when "block: avoid
> SIGUSR2" is merged. And will help merging further cleanups of this code
> I'm working on.
> 
>  posix-aio-compat.c |   73 ++-
>  1 files changed, 32 insertions(+), 41 deletions(-)

Reviewed-by: Stefan Hajnoczi 

Perhaps qemu_eventfd() can be used in the future instead of an explicit
pipe.  Then Linux will do eventfd while other OSes will fall back to
pipes.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-21 Thread Zhi Yong Wu
On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti  wrote:
> On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
>> On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti  wrote:
>> > On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>> >> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti  
>> >> wrote:
>> >> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> >> >> Note:
>> >> >>      1.) When bps/iops limits are specified to a small value such as 
>> >> >> 511 bytes/s, this VM will hang up. We are considering how to handle 
>> >> >> this senario.
>> >> >
>> >> > You can increase the length of the slice, if the request is larger than
>> >> > slice_time * bps_limit.
>> >> Yeah, but it is a challenge for how to increase it. Do you have some nice 
>> >> idea?
>> >
>> > If the queue is empty, and the request being processed does not fit the
>> > queue, increase the slice so that the request fits.
>> Sorry for late reply. actually, do you think that this scenario is
>> meaningful for the user?
>> Since we implement this, if the user limits the bps below 512
>> bytes/second, the VM can also not run every task.
>> Can you let us know why we need to make such effort?
>
> It would be good to handle request larger than the slice.
Below is the code changes for your way. I used simple trace and did dd
test on guest, then found only the first rw req is handled, and
subsequent reqs are enqueued. After several minutes, guest prints the
info below on its terminal:
INFO: task kdmflush:326 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I don't make sure if it is correct. Do you have better way to verify it?

>
> It is not strictly necessary, but in case its not handled, a minimum
> should be in place, to reflect maximum request size known. Being able to
> specify something which crashes is not acceptable.
>
>

diff --git a/block.c b/block.c
index af19784..f88c22a 100644
--- a/block.c
+++ b/block.c
@@ -132,9 +132,10 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 bs->block_timer = NULL;
 }

-bs->slice_start = 0;
-
-bs->slice_end   = 0;
+bs->slice_time= 0;
+bs->slice_start   = 0;
+bs->slice_end = 0;
+bs->first_time_rw = false;
 }

 static void bdrv_block_timer(void *opaque)
@@ -151,9 +152,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
 bs->block_queue = qemu_new_block_queue();
 bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);

+bs->slice_time  = BLOCK_IO_SLICE_TIME;
 bs->slice_start = qemu_get_clock_ns(vm_clock);
-
-bs->slice_end   = bs->slice_start + BLOCK_IO_SLICE_TIME;
+bs->slice_end   = bs->slice_start + bs->slice_time;
+bs->first_time_rw = true;
 }

 bool bdrv_io_limits_enabled(BlockDriverState *bs)
@@ -2846,11 +2848,23 @@ static bool
bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
 /* Calc approx time to dispatch */
 wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;

-if (wait) {
-*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
-}
+if (!bs->first_time_rw
+|| !qemu_block_queue_is_empty(bs->block_queue)) {
+if (wait) {
+*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+}

-return true;
+return true;
+} else {
+bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+bs->slice_end += bs->slice_time - BLOCK_IO_SLICE_TIME;
+if (wait) {
+*wait = 0;
+}
+
+bs->first_time_rw = false;
+return false;
+}
 }

 static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
@@ -2895,11 +2909,23 @@ static bool
bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
 wait_time = 0;
 }

-if (wait) {
-*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
-}
+if (!bs->first_time_rw
+|| !qemu_block_queue_is_empty(bs->block_queue)) {
+if (wait) {
+*wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+}

-return true;
+return true;
+} else {
+bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+bs->slice_end += bs->slice_time - BLOCK_IO_SLICE_TIME;
+if (wait) {
+*wait = 0;
+}
+
+bs->first_time_rw = false;
+return false;
+}
 }

 static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
@@ -2912,10 +2938,10 @@ static bool
bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
 now = qemu_get_clock_ns(vm_clock);
 if ((bs->slice_start < now)
 && (bs->slice_end > now)) {
-bs->slice_end = now + BLOCK_IO_SLICE_TIME;
+bs->slice_end = now + bs->slice_time;
 } else {
 bs->slice_start = now;
-bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
+bs->slice_end   = now + bs->slice_time;

 bs->io_disps.bytes[is_write]  = 0;
 bs->io_disps.bytes[!is_write] = 0;
diff