[PATCH] VT-d: Support multiple device assignment to one guest

2008-09-27 Thread Han, Weidong
From f2f722515135d95016f2d2ab55cc2aaf23d2fd80 Mon Sep 17 00:00:00 2001
From: Weidong Han [EMAIL PROTECTED]
Date: Sat, 27 Sep 2008 14:28:07 +0800
Subject: [PATCH] Support multiple device assignment to one guest

The current VT-d patches in kvm support assigning only one device to a
guest, because dmar_domain is per device.

To support multiple device assignment, this patch wraps dmar_domain in
a reference-counted structure (kvm_vtd_domain) and adds a pointer in
kvm_assigned_dev_kernel that links to a kvm_vtd_domain.

Each dmar_domain owns one VT-d page table. To reduce the number of page
tables and improve IOTLB utilization, devices that are assigned to the
same guest and sit under the same IOMMU share one kvm_vtd_domain.
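
The sharing scheme can be sketched in user-space C roughly as follows.
This is only an illustration under stated assumptions, not code from the
patch: struct vtd_domain, struct guest, vtd_domain_get(), vtd_domain_put()
and the integer iommu_id are hypothetical stand-ins for kvm_vtd_domain and
its bookkeeping, and locking is left out entirely.

```c
#include <stdlib.h>

/* Hypothetical stand-in for kvm_vtd_domain: one per (guest, IOMMU) pair. */
struct vtd_domain {
        int iommu_id;                /* which IOMMU this domain sits under */
        int refcount;                /* number of assigned devices sharing it */
        struct vtd_domain *next;
};

/* Hypothetical per-guest list of domains. */
struct guest {
        struct vtd_domain *domains;
};

/* Reuse the guest's domain for this IOMMU, or create one on first use. */
static struct vtd_domain *vtd_domain_get(struct guest *g, int iommu_id)
{
        struct vtd_domain *d;

        for (d = g->domains; d; d = d->next) {
                if (d->iommu_id == iommu_id) {
                        d->refcount++;
                        return d;
                }
        }

        d = calloc(1, sizeof(*d));
        if (!d)
                return NULL;
        d->iommu_id = iommu_id;
        d->refcount = 1;
        d->next = g->domains;
        g->domains = d;
        return d;
}

/* Drop one reference; unlink and free when the last device goes away. */
static void vtd_domain_put(struct guest *g, struct vtd_domain *d)
{
        struct vtd_domain **pp;

        if (--d->refcount > 0)
                return;

        for (pp = &g->domains; *pp; pp = &(*pp)->next) {
                if (*pp == d) {
                        *pp = d->next;
                        break;
                }
        }
        free(d);
}
```

In this sketch, two devices under the same IOMMU get the same pointer back
and therefore share one page table; a device under a different IOMMU gets
a domain of its own.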

Signed-off-by: Weidong Han [EMAIL PROTECTED]
---
 arch/x86/kvm/vtd.c          |  196 +++
 arch/x86/kvm/x86.c          |   22 --
 drivers/pci/intel-iommu.c   |   16
 include/asm-x86/kvm_host.h  |    1 -
 include/linux/intel-iommu.h |    1 +
 include/linux/kvm_host.h    |   21 +
 6 files changed, 176 insertions(+), 81 deletions(-)

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index 667bf3f..b9c52c4 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -27,19 +27,40 @@
 #include <linux/dmar.h>
 #include <linux/intel-iommu.h>
 
-static int kvm_iommu_unmap_memslots(struct kvm *kvm);
+static void kvm_iommu_put_domain_pages(struct dmar_domain *domain,
+                                       gfn_t base_gfn, unsigned long npages)
+{
+        gfn_t gfn = base_gfn;
+        pfn_t pfn;
+        int i;
+
+        for (i = 0; i < npages; i++) {
+                pfn = (pfn_t)intel_iommu_iova_to_pfn(domain,
+                                                     gfn_to_gpa(gfn));
+                kvm_release_pfn_clean(pfn);
+                gfn++;
+        }
+}
+
 static void kvm_iommu_put_pages(struct kvm *kvm,
-                                gfn_t base_gfn, unsigned long npages);
+                                gfn_t base_gfn, unsigned long npages)
+{
+        struct kvm_assigned_dev_kernel *assigned_dev;
 
-int kvm_iommu_map_pages(struct kvm *kvm,
-                        gfn_t base_gfn, unsigned long npages)
+        list_for_each_entry(assigned_dev, &kvm->arch.assigned_dev_head, list) {
+                kvm_iommu_put_domain_pages(assigned_dev->vtd_domain->domain,
+                                           base_gfn, npages);
+        }
+}
+
+static int kvm_iommu_map_domain_pages(struct kvm *kvm,
+                                      struct dmar_domain *domain,
+                                      gfn_t base_gfn, unsigned long npages)
 {
gfn_t gfn = base_gfn;
pfn_t pfn;
int i, r;
-   struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
 
-   /* check if iommu exists and in use */
if (!domain)
return 0;
 
@@ -74,18 +95,40 @@ int kvm_iommu_map_pages(struct kvm *kvm,
return 0;
 
 unmap_pages:
-   kvm_iommu_put_pages(kvm, base_gfn, i);
+   kvm_iommu_put_domain_pages(domain, base_gfn, i);
return r;
 }
 
-static int kvm_iommu_map_memslots(struct kvm *kvm)
+int kvm_iommu_map_pages(struct kvm *kvm,
+   gfn_t base_gfn, unsigned long npages)
+{
+   int r = 0;
+struct kvm_assigned_dev_kernel *assigned_dev;
+
+   list_for_each_entry(assigned_dev, &kvm->arch.assigned_dev_head, list) {
+   r = kvm_iommu_map_domain_pages(kvm,
+   assigned_dev->vtd_domain->domain,
+   base_gfn, npages);
+   if (r)
+   goto unmap_pages;
+   }
+
+   return 0;
+
+unmap_pages:
+   kvm_iommu_put_pages(kvm, base_gfn, npages);
+   return r;
+}
+
+static int kvm_iommu_map_domain_memslots(struct kvm *kvm,
+struct dmar_domain *domain)
 {
int i, r;
 
 down_read(&kvm->slots_lock);
 for (i = 0; i < kvm->nmemslots; i++) {
-   r = kvm_iommu_map_pages(kvm, kvm->memslots[i].base_gfn,
-   kvm->memslots[i].npages);
+   r = kvm_iommu_map_domain_pages(kvm, domain,
+   kvm->memslots[i].base_gfn, kvm->memslots[i].npages);
if (r)
break;
}
@@ -93,10 +136,23 @@ static int kvm_iommu_map_memslots(struct kvm *kvm)
return r;
 }
 
+static void kvm_iommu_unmap_domain_memslots(struct kvm *kvm,
+   struct dmar_domain *domain)
+{
+   int i;
+   down_read(&kvm->slots_lock);
+   for (i = 0; i < kvm->nmemslots; i++) {
+   kvm_iommu_put_domain_pages(domain,
+   kvm->memslots[i].base_gfn, kvm->memslots[i].npages);
+   }
+   up_read(&kvm->slots_lock);
+}
+
 int kvm_iommu_map_guest(struct kvm *kvm,
struct kvm_assigned_dev_kernel *assigned_dev)
 {
struct pci_dev *pdev = NULL;
+   struct kvm_vtd_domain *vtd_dom = NULL;

RE: Status of pci passthrough work?

2008-09-27 Thread Han, Weidong
Hi Thomas,

The passthrough/VT-d patches are already checked in to kvm.git. With
Amit's userspace patches, you can assign a device to a guest. Give it a
try.

Randy (Weidong)

Thomas Fjellstrom wrote:
 I'm very interested in being able to pass a few devices through to
 kvm guests. I'm wondering what exactly is working now, and how I can
 start testing it? 
 
 the latest kvm release doesn't seem to include any support for it in
 userspace, so I can't test it with that...
 
 Basically what I want to do is assign a two or three physical nics
 (100mb and GiB) to one vm, some tv tuner cards to another.
 
 Also, I'm wondering if AMD's iommu in the SB750 southbridge is
 supported yet? Or if anyone is working on it?
 
 --
 Thomas Fjellstrom
 [EMAIL PROTECTED]

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of pci passthrough work?

2008-09-27 Thread Thomas Fjellstrom
On Saturday 27 September 2008, Han, Weidong wrote:
 Hi Thomas,

 the patches of passthrough/VT-d on kvm.git are already checked in. With
 Amit's userspace patches, you can assign device to guest. You can have a
 try.

Does that mean I need VT-d support in hardware? All I have to test with right 
now is an AMD Phenom X4  with a 780g+sb700 system. Don't think it has an 
iommu, and I'd find it odd if the intel VT-d code just worked on amd's 
hardware.

 Randy (Weidong)

 Thomas Fjellstrom wrote:
  I'm very interested in being able to pass a few devices through to
  kvm guests. I'm wondering what exactly is working now, and how I can
  start testing it?
 
  the latest kvm release doesn't seem to include any support for it in
  userspace, so I can't test it with that...
 
  Basically what I want to do is assign a two or three physical nics
  (100mb and GiB) to one vm, some tv tuner cards to another.
 
  Also, I'm wondering if AMD's iommu in the SB750 southbridge is
  supported yet? Or if anyone is working on it?
 
  --
  Thomas Fjellstrom
  [EMAIL PROTECTED]



-- 
Thomas Fjellstrom
[EMAIL PROTECTED]


RE: Status of pci passthrough work?

2008-09-27 Thread Han, Weidong
Thomas Fjellstrom wrote:
 On Saturday 27 September 2008, Han, Weidong wrote:
 Hi Thomas,
 
 the patches of passthrough/VT-d on kvm.git are already checked in.
 With Amit's userspace patches, you can assign device to guest. You
 can have a try.
 
 Does that mean I need VT-d support in hardware? All I have to test
 with right now is an AMD Phenom X4  with a 780g+sb700 system. Don't
 think it has an iommu, and I'd find it odd if the intel VT-d code
 just worked on amd's hardware.
 

Yes, currently you need VT-d support in hardware to assign a device.

Randy (Weidong)

 Randy (Weidong)
 
 Thomas Fjellstrom wrote:
 I'm very interested in being able to pass a few devices through to
 kvm guests. I'm wondering what exactly is working now, and how I
 can start testing it? 
 
 the latest kvm release doesn't seem to include any support for it in
 userspace, so I can't test it with that...
 
 Basically what I want to do is assign a two or three physical nics
 (100mb and GiB) to one vm, some tv tuner cards to another.
 
 Also, I'm wondering if AMD's iommu in the SB750 southbridge is
 supported yet? Or if anyone is working on it?
 
 --
 Thomas Fjellstrom
 [EMAIL PROTECTED]
 
 
 
 --
 Thomas Fjellstrom
 [EMAIL PROTECTED]



Re: Status of pci passthrough work?

2008-09-27 Thread Thomas Fjellstrom
On Saturday 27 September 2008, Han, Weidong wrote:
 Thomas Fjellstrom wrote:
  On Saturday 27 September 2008, Han, Weidong wrote:
  Hi Thomas,
 
  the patches of passthrough/VT-d on kvm.git are already checked in.
  With Amit's userspace patches, you can assign device to guest. You
  can have a try.
 
  Does that mean I need VT-d support in hardware? All I have to test
  with right now is an AMD Phenom X4  with a 780g+sb700 system. Don't
  think it has an iommu, and I'd find it odd if the intel VT-d code
  just worked on amd's hardware.

 Yes, currently you need VT-d support in hardware to assign device.

So I take it the PV-DMA (or pv-dma doesn't do what I think it does...) or the 
other 1:1 device pass through work isn't working right now?

It's something I'd really like to use, but I don't have access to a platform 
with a hardware iommu. Though I might be able to pick up a replacement board 
for my new server with the SB750 southbridge which supposedly has AMD's new 
iommu hardware in it, but I haven't seen any evidence that kvm or linux 
supports it.

 Randy (Weidong)

  Randy (Weidong)
 
  Thomas Fjellstrom wrote:
  I'm very interested in being able to pass a few devices through to
  kvm guests. I'm wondering what exactly is working now, and how I
  can start testing it?
 
  the latest kvm release doesn't seem to include any support for it in
  userspace, so I can't test it with that...
 
  Basically what I want to do is assign a two or three physical nics
  (100mb and GiB) to one vm, some tv tuner cards to another.
 
  Also, I'm wondering if AMD's iommu in the SB750 southbridge is
  supported yet? Or if anyone is working on it?
 
  --
  Thomas Fjellstrom
  [EMAIL PROTECTED]
 
 
  --
  Thomas Fjellstrom
  [EMAIL PROTECTED]


-- 
Thomas Fjellstrom
[EMAIL PROTECTED]


[PATCH 0/6 v3] PCI: Linux kernel SR-IOV support

2008-09-27 Thread Zhao, Yu
Greetings,

The following patches add support for the SR-IOV capability to the Linux
kernel. With them, a PCI device that has the capability can be presented
to software as multiple devices, which benefits KVM and serves other
purposes such as QoS and security.

[PATCH 1/6 v3] PCI: export some functions and macros
[PATCH 2/6 v3] PCI: add new general functions
[PATCH 3/6 v3] PCI: support ARI capability
[PATCH 4/6 v3] PCI: support SR-IOV capability
[PATCH 5/6 v3] PCI: reserve bus range for SR-IOV device
[PATCH 6/6 v3] PCI: document the change

 b/Documentation/DocBook/kernel-api.tmpl |    2
 b/Documentation/PCI/pci-iov-howto.txt   |  228
 b/drivers/pci/Kconfig                   |   12
 b/drivers/pci/Makefile                  |    2
 b/drivers/pci/iov.c                     |  850
 b/drivers/pci/pci-sysfs.c               |   10
 b/drivers/pci/pci.c                     |    2
 b/drivers/pci/pci.h                     |   20
 b/drivers/pci/probe.c                   |   50 -
 b/drivers/pci/proc.c                    |    7
 b/drivers/pci/setup-bus.c               |    4
 b/drivers/pci/setup-res.c               |    2
 b/include/linux/pci.h                   |    5
 b/include/linux/pci_regs.h              |   15
 drivers/pci/iov.c                       |   24
 drivers/pci/pci-sysfs.c                 |  123 ++--
 drivers/pci/pci.c                       |  113 +++-
 drivers/pci/pci.h                       |   75 ++
 drivers/pci/probe.c                     |   40 +
 drivers/pci/setup-bus.c                 |    4
 drivers/pci/setup-res.c                 |   27 -
 include/linux/pci.h                     |   98 +++
 include/linux/pci_regs.h                |   22
 23 files changed, 1580 insertions(+), 155 deletions(-)

---

The Single Root I/O Virtualization (SR-IOV) capability defined by the PCI-SIG
is intended to let multiple system software instances share PCI hardware
resources. A PCI device that supports this capability can be extended to one
Physical Function plus multiple Virtual Functions. The Physical Function,
which can be considered the real PCI device, reflects the hardware instance
and manages all physical resources. Virtual Functions are associated with a
Physical Function and share physical resources with it. Software can control
the allocation of Virtual Functions via registers encapsulated in the
capability structure.
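
To make the fan-out from one Physical Function to several Virtual
Functions more concrete, here is a small standalone sketch of the Routing
ID arithmetic that vf_rid() in patch 4/6 performs. It is an illustration
only: the function name vf_routing_id(), the fixed-width types, and
treating vfn as a zero-based index are assumptions, and offset and stride
stand for the values the patch keeps in dev->iov->offset and
dev->iov->stride.

```c
#include <stdint.h>

/*
 * Compute the bus number and devfn of Virtual Function vfn from the
 * Physical Function's own bus/devfn plus an offset and a stride.
 * The 16-bit Routing ID is bus in the upper byte, devfn in the lower.
 */
static void vf_routing_id(uint8_t pf_bus, uint8_t pf_devfn,
                          uint16_t offset, uint16_t stride, int vfn,
                          uint8_t *busnr, uint8_t *devfn)
{
        uint16_t rid = (uint16_t)((pf_bus << 8) + pf_devfn +
                                  offset + stride * vfn);

        *busnr = rid >> 8;      /* upper 8 bits: bus number */
        *devfn = rid & 0xff;    /* lower 8 bits: device/function */
}
```

With a Physical Function at bus 1, devfn 0, an offset of 128 and a stride
of 2, VF 0 lands on bus 1 while VF 64 already spills over to bus 2, which
is the situation patch 5/6 (bus range reservation) has to deal with.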

SR-IOV specification can be found at 
http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf

Devices that support SR-IOV are available from the following vendors:
http://download.intel.com/design/network/ProdBrf/320025.pdf
http://www.netxen.com/products/chipsolutions/NX3031.html
http://www.neterion.com/products/x3100.html


[PATCH 1/6 v3] PCI: export some functions and macros

2008-09-27 Thread Zhao, Yu
Export some functions and move some macros from c file to header file.

Cc: Jesse Barnes [EMAIL PROTECTED]
Cc: Randy Dunlap [EMAIL PROTECTED]
Cc: Grant Grundler [EMAIL PROTECTED]
Cc: Alex Chiang [EMAIL PROTECTED]
Cc: Matthew Wilcox [EMAIL PROTECTED]
Cc: Roland Dreier [EMAIL PROTECTED]
Cc: Greg KH [EMAIL PROTECTED]
Signed-off-by: Yu Zhao [EMAIL PROTECTED]

---
 drivers/pci/pci-sysfs.c |   10
 drivers/pci/pci.c       |    2 +-
 drivers/pci/pci.h       |   20 ++
 drivers/pci/probe.c     |   50 +-
 drivers/pci/setup-bus.c |    4 +-
 drivers/pci/setup-res.c |    2 +-
 include/linux/pci.h     |    4 +-
 7 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c71858..f99160d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -696,7 +696,7 @@ static struct bin_attribute pci_config_attr = {
 .name = "config",
.mode = S_IRUGO | S_IWUSR,
},
-   .size = 256,
+   .size = PCI_CFG_SPACE_SIZE,
.read = pci_read_config,
.write = pci_write_config,
 };
@@ -706,7 +706,7 @@ static struct bin_attribute pcie_config_attr = {
 .name = "config",
.mode = S_IRUGO | S_IWUSR,
},
-   .size = 4096,
+   .size = PCI_CFG_SPACE_EXP_SIZE,
.read = pci_read_config,
.write = pci_write_config,
 };
@@ -724,7 +724,7 @@ int __must_check pci_create_sysfs_dev_files (struct pci_dev *pdev)
 if (!sysfs_initialized)
 return -EACCES;

-   if (pdev->cfg_size < 4096)
+   if (pdev->cfg_size < PCI_CFG_SPACE_EXP_SIZE)
 retval = sysfs_create_bin_file(&pdev->dev.kobj, &pci_config_attr);
 else
 retval = sysfs_create_bin_file(&pdev->dev.kobj, &pcie_config_attr);
@@ -795,7 +795,7 @@ err_vpd:
 kfree(pdev->vpd->attr);
 }
 err_config_file:
-   if (pdev->cfg_size < 4096)
+   if (pdev->cfg_size < PCI_CFG_SPACE_EXP_SIZE)
 sysfs_remove_bin_file(&pdev->dev.kobj, &pci_config_attr);
 else
 sysfs_remove_bin_file(&pdev->dev.kobj, &pcie_config_attr);
@@ -820,7 +820,7 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev)
 sysfs_remove_bin_file(&pdev->dev.kobj, pdev->vpd->attr);
 kfree(pdev->vpd->attr);
 }
-   if (pdev->cfg_size < 4096)
+   if (pdev->cfg_size < PCI_CFG_SPACE_EXP_SIZE)
 sysfs_remove_bin_file(&pdev->dev.kobj, &pci_config_attr);
 else
 sysfs_remove_bin_file(&pdev->dev.kobj, &pcie_config_attr);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c9884bb..259eaff 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -216,7 +216,7 @@ int pci_find_ext_capability(struct pci_dev *dev, int cap)
int ttl = 480; /* 3840 bytes, minimum 8 bytes per capability */
int pos = 0x100;

-   if (dev->cfg_size <= 256)
+   if (dev->cfg_size <= PCI_CFG_SPACE_SIZE)
 return 0;

 if (pci_read_config_dword(dev, pos, &header) != PCIBIOS_SUCCESSFUL)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d807cd7..596efa6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1,3 +1,9 @@
+#ifndef DRIVERS_PCI_H
+#define DRIVERS_PCI_H
+
+#define PCI_CFG_SPACE_SIZE 256
+#define PCI_CFG_SPACE_EXP_SIZE 4096
+
 /* Functions internal to the PCI core code */

 extern int pci_uevent(struct device *dev, struct kobj_uevent_env *env);
@@ -144,3 +150,17 @@ struct pci_slot_attribute {
 };
 #define to_pci_slot_attr(s) container_of(s, struct pci_slot_attribute, attr)

+enum pci_bar_type {
+   pci_bar_unknown,/* Standard PCI BAR probe */
+   pci_bar_io, /* An io port BAR */
+   pci_bar_mem32,  /* A 32-bit memory BAR */
+   pci_bar_mem64,  /* A 64-bit memory BAR */
+   pci_bar_rom,/* A ROM BAR */
+};
+
+extern int pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
+   struct resource *res, unsigned int reg);
+extern struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
+   struct pci_dev *bridge, int busnr);
+
+#endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 36698e5..7cdb834 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -14,8 +14,6 @@

 #define CARDBUS_LATENCY_TIMER  176 /* secondary latency timer */
 #define CARDBUS_RESERVE_BUSNR  3
-#define PCI_CFG_SPACE_SIZE 256
-#define PCI_CFG_SPACE_EXP_SIZE 4096

 /* Ugh.  Need to stop exporting this to modules. */
 LIST_HEAD(pci_root_buses);
@@ -203,13 +201,6 @@ static u64 pci_size(u64 base, u64 maxbase, u64 mask)
return size;
 }

-enum pci_bar_type {
-   pci_bar_unknown,/* Standard PCI BAR probe */
-   pci_bar_io, /* An io port BAR */
-   pci_bar_mem32,  /* A 32-bit memory BAR */
-   pci_bar_mem64,  /* A 

[PATCH 4/6 v3] PCI: support SR-IOV capability

2008-09-27 Thread Zhao, Yu
Add Single Root I/O Virtualization (SR-IOV) support.

Cc: Jesse Barnes [EMAIL PROTECTED]
Cc: Randy Dunlap [EMAIL PROTECTED]
Cc: Grant Grundler [EMAIL PROTECTED]
Cc: Alex Chiang [EMAIL PROTECTED]
Cc: Matthew Wilcox [EMAIL PROTECTED]
Cc: Roland Dreier [EMAIL PROTECTED]
Cc: Greg KH [EMAIL PROTECTED]
Signed-off-by: Yu Zhao [EMAIL PROTECTED]

---
 drivers/pci/Kconfig      |   12 +
 drivers/pci/Makefile     |    2 +
 drivers/pci/iov.c        |  850 ++
 drivers/pci/pci-sysfs.c  |    4 +
 drivers/pci/pci.c        |   14 +-
 drivers/pci/pci.h        |   55 +++
 drivers/pci/probe.c      |    4 +
 include/linux/pci.h      |   57 +++
 include/linux/pci_regs.h |   21 ++
 9 files changed, 1018 insertions(+), 1 deletions(-)
 create mode 100644 drivers/pci/iov.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index e1ca425..e7c0836 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -50,3 +50,15 @@ config HT_IRQ
   This allows native hypertransport devices to use interrupts.

   If unsure say Y.
+
+config PCI_IOV
+   bool "PCI SR-IOV support"
+   depends on PCI
+   select PCI_MSI
+   default n
+   help
+ This option allows device drivers to enable Single Root I/O
+ Virtualization. Each Virtual Function's PCI configuration
+ space can be accessed using its own Bus, Device and Function
+ Number (Routing ID). Each Virtual Function also has PCI Memory
+ Space, which is used to map its own register set.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 7d63f8c..47bb456 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
 ifeq ($(CONFIG_PCI_DEBUG),y)
 EXTRA_CFLAGS += -DDEBUG
 endif
+
+obj-$(CONFIG_PCI_IOV) += iov.o
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
new file mode 100644
index 000..a2b2de9
--- /dev/null
+++ b/drivers/pci/iov.c
@@ -0,0 +1,850 @@
+/*
+ * drivers/pci/iov.c
+ *
+ * Copyright (C) 2008 Intel Corporation
+ *
+ * PCI Express Single Root I/O Virtualization capability support.
+ */
+
+#include <linux/ctype.h>
+#include <linux/string.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <asm/page.h>
+#include "pci.h"
+
+#define VF_NAME_LEN    8
+
+
+struct iov_attr {
+   struct attribute attr;
+   ssize_t (*show)(struct kobject *,
+   struct iov_attr *, char *);
+   ssize_t (*store)(struct kobject *,
+   struct iov_attr *, const char *, size_t);
+};
+
+#define iov_config_attr(field) \
+static ssize_t field##_show(struct kobject *kobj,  \
+   struct iov_attr *attr, char *buf)   \
+{  \
+   struct pci_iov *iov = container_of(kobj, struct pci_iov, kobj); \
+   \
+   return sprintf(buf, "%d\n", iov->field);\
+}
+
+iov_config_attr(is_enabled);
+iov_config_attr(totalvfs);
+iov_config_attr(initialvfs);
+iov_config_attr(numvfs);
+
+struct vf_entry {
+   int vfn;
+   struct kobject kobj;
+   struct pci_iov *iov;
+   struct iov_attr *attr;
+   char name[VF_NAME_LEN];
+   char (*param)[PCI_IOV_PARAM_LEN];
+};
+
+static ssize_t iov_attr_show(struct kobject *kobj,
+   struct attribute *attr, char *buf)
+{
+   struct iov_attr *ia = container_of(attr, struct iov_attr, attr);
+
+   return ia->show ? ia->show(kobj, ia, buf) : -EIO;
+}
+
+static ssize_t iov_attr_store(struct kobject *kobj,
+   struct attribute *attr, const char *buf, size_t len)
+{
+   struct iov_attr *ia = container_of(attr, struct iov_attr, attr);
+
+   return ia->store ? ia->store(kobj, ia, buf, len) : -EIO;
+}
+
+static struct sysfs_ops iov_attr_ops = {
+   .show = iov_attr_show,
+   .store = iov_attr_store,
+};
+
+static struct kobj_type iov_ktype = {
+   .sysfs_ops = &iov_attr_ops,
+};
+
+static inline void vf_rid(struct pci_dev *dev, int vfn, u8 *busnr, u8 *devfn)
+{
+   u16 rid;
+
+   rid = (dev->bus->number << 8) + dev->devfn +
+   dev->iov->offset + dev->iov->stride * vfn;
+   *busnr = rid >> 8;
+   *devfn = rid & 0xff;
+}
+
+static int vf_add(struct pci_dev *dev, int vfn)
+{
+   int i;
+   int rc;
+   u8 busnr, devfn;
+   unsigned long size;
+   struct pci_dev *new;
+   struct pci_bus *bus;
+   struct resource *res;
+
+   vf_rid(dev, vfn, &busnr, &devfn);
+
+   new = alloc_pci_dev();
+   if (!new)
+   return -ENOMEM;
+
+   list_for_each_entry(bus, &dev->bus->children, node)
+   if (bus->number == busnr) {
+   new->bus = bus;
+   break;
+   }
+
+   BUG_ON(!new->bus);
+   new->sysdata = bus->sysdata;
+   new->dev.parent = 

Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Yang, Sheng
On Wednesday 24 September 2008 16:38:35 Avi Kivity wrote:
 Yang, Sheng wrote:
  - Shared Interrupt support
 
  I still don't know who would do this. It's very important for VT-d real
  usable. If nobody interested in it, I would pick it up, but after Oct. 6
  (after National Holiday in China).

 Shared host interrupts?  What's your plan here?  The polarity trick?

Hi, Avi

After checking the host shared interrupt situation, I have a question:

If I understand correctly, the current solution doesn't block host shared
IRQs; it just comes with a performance penalty. The penalty comes from the
host keeping the IRQ line disabled for a period, since we have to wait for
the guest to write EOI. But I fail to see a correctness problem here (apart
from a lot of spurious interrupts in the guest).

I've checked the mail archive but can't find a clue about that. Can you
explain the situation?

Thanks!
--
regards
Yang, Sheng


Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity

Yang, Sheng wrote:

After check host shared interrupts situation, I got a question here:

If I understand correctly, current solution don't block host shared irq, just 
come with the performance pentry. The penalty come with host disabled irq 
line for a period. We have to wait guest to write EOI. But I fail to see the 
correctness problem here (except a lot of spurious interrupt in the guest).


I've checked mail, but can't find clue about that. Can you explain the 
situation?


  


If the guest fails to disable interrupts on a device that shares an 
interrupt line with the host, the host will experience an interrupt 
flood.  Eventually the host will disable the host device as well.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-27 Thread Avi Kivity

Muli Ben-Yehuda wrote:
>> MMIO isn't just a register window.  It may be an on-device buffer.
>
> Unlikely, but ok.

It's unlikely in the same ways graphics cards are unlikely :)

With a multi-card setup, perhaps it is even reasonable for one card to 
dma to another.

> I strongly disagree. You are advocating something that is potentially
> unsafe---for the sake of code simplicity?! I am advocating caution in
> what we let an *untrusted* guest do.

Why would it be unsafe?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM: MMU: fix largepage shadow accounting with oos

2008-09-27 Thread Avi Kivity

Marcelo Tosatti wrote:

There's no need to increase the largepage shadow count when syncing
since there's no count decrement on unsync, only on destruction.

  


Applied, thanks.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [patch] stop passing in global variable as argument to cmos_init()

2008-09-27 Thread Avi Kivity

Jes Sorensen wrote:

Hi,

Looking through the ia64 code I came across this little gem.

At some point someone added a new argument to hw/pc.c:cmos_init() named
'smp_cpus', and then passed in the global variable 'smp_cpus' as the
argument. This propagated through to the ia64 code as well.

I checked, this isn't present in the upstream QEMU code, so lets kill
it in the KVM branch. One small step to get closer to upstream :-)



Applied, thanks (though personally I prefer not depending on global 
variables and their initialization order, etc.)


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [patch] reindent ia64 code to match qemu code style

2008-09-27 Thread Avi Kivity

Jes Sorensen wrote:

Hi,

Xiantao and I have agreed to reformat the ia64 related code so it
better matches the QEMU formatting style.

This patch has zero code change, it is solely reformatting.

It goes on top of the cmos_init() tidyup patch I sent out earlier today.


Applied, thanks.  I got a reject in cpu.h which I didn't bother to fix, 
but nothing serious.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 10/11] VMX: work around lacking VNMI support

2008-09-27 Thread Avi Kivity

Jan Kiszka wrote:

As a workaround (or safety bag), is it imaginable to delay or deny VCPU
snapshots at not yet fully restorable points (like
GUEST_INTERRUPTIBILITY_INFO != 0)? Or stick-your-head-into-the-sand for now?

  


Head in sand.  The points where we have interrupt shadows should be 
extremely rare, and the points where it matters even rarer.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 4/9] Implement GIF, clgi and stgi v3

2008-09-27 Thread Avi Kivity

Joerg Roedel wrote:

I had another possible idea for performance improvement here. Since we
only inject normal interrupts and exceptions (and not NMI and such) we
can patch clgi to cli and stgi to sti to save these two intercepts in
the guests vmrun path.
Any objections/problems with this?

  


The sequence 'clgi; sti' will break.
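
A toy model of why that sequence breaks, written as plain C rather than
anything from the patch set. It assumes, as a deliberate simplification,
that an interrupt is deliverable only when both the global interrupt flag
(GIF) and EFLAGS.IF are set; the struct and function names are hypothetical
and the real delivery rules (V_INTR and friends) are ignored.

```c
#include <stdbool.h>

/* Simplified interrupt-masking state of a virtual CPU. */
struct cpu_flags {
        bool gif;        /* SVM global interrupt flag */
        bool intr_flag;  /* ordinary EFLAGS.IF */
};

/* In this model an interrupt gets through only if both flags are set. */
static bool intr_deliverable(const struct cpu_flags *f)
{
        return f->gif && f->intr_flag;
}

/* Architectural behaviour: clgi clears GIF, sti sets IF. */
static struct cpu_flags run_clgi_sti(struct cpu_flags f)
{
        f.gif = false;        /* clgi */
        f.intr_flag = true;   /* sti  */
        return f;
}

/* Proposed shortcut: clgi is patched into cli, so GIF is never touched. */
static struct cpu_flags run_patched_clgi_sti(struct cpu_flags f)
{
        f.intr_flag = false;  /* clgi rewritten as cli */
        f.intr_flag = true;   /* sti undoes it again   */
        return f;
}
```

After the real sequence interrupts stay blocked because GIF is clear; after
the patched sequence the sti has undone the only masking that was applied,
so the guest hypervisor can be interrupted where it expects not to be.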


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 4/9] Implement GIF, clgi and stgi v3

2008-09-27 Thread Avi Kivity

Alexander Graf wrote:
>> Hmm yes, this is a problem. So this optimization will not work. We need
>> other ways to optimize :)
>
> Well it would work for the KVM-in-KVM case, where we know that VMRUN 
> is always triggered with IF=1 and V_INTR=1. The only case that hack 
> fails is when we have IF=0 and V_INTR=1. Everything else should work 
> just fine. And in this case we would simply issue some VMEXITs 0x60, 
> so no big deal IMHO. It should be worth the tradeoff of making most 
> VMMs a lot faster.
>
> There should be a compile-option to enable the correct behavior 
> though. If we join that with the VMLOAD and VMSAVE hack there would be 
> only the VMRUN and DR exits left. That sounds like a really good 
> improvement where I wouldn't mind to break some specs :-).

Maybe a hypercall, so it can be enabled on a guest-by-guest basis.

I must say that if we do guest-specific hacking this way, a paravirt 
approach doesn't look so bad.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 7/9] Add VMRUN handler v3

2008-09-27 Thread Avi Kivity

Alexander Graf wrote:


Is copying one page really that expensive? Is there any accelerated 
function available for that that copies it with SSE or so? :-)




'rep movs' is supposed to be accelerated, doing cacheline-by-cacheline 
copies (at least on Intel).


In any case the kernel memcpy() should pick the fastest method.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 7/9] Add VMRUN handler v3

2008-09-27 Thread Avi Kivity

[EMAIL PROTECTED] wrote:

Copying data in memory is always expensive because the accesses may miss
in the caches and data must be fetched from memory. As far as I know
this can be around 150 cycles per cache line.
  


When the copy is sequential, the processor will prefetch the data ahead 
of time, so you don't get the full hit.
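
For scale, a back-of-the-envelope helper for the "full hit" case, i.e. the
cost if every cache line of the source page missed with no help from the
prefetcher. This is a rough upper bound on source-side misses only; the
4096-byte page and 64-byte cache line used in the note below are
assumptions, and the 150-cycle figure is the one quoted above.

```c
/* Worst-case cycles to read one page if every cache line misses. */
static unsigned long worst_case_copy_cycles(unsigned long page_bytes,
                                            unsigned long line_bytes,
                                            unsigned long miss_cycles)
{
        return (page_bytes / line_bytes) * miss_cycles;
}
```

worst_case_copy_cycles(4096, 64, 150) gives 64 lines times 150 cycles,
i.e. 9600 cycles; with sequential prefetching the real cost should be
well below that.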


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 1/6 v3] PCI: export some functions and macros

2008-09-27 Thread Matthew Wilcox
On Sat, Sep 27, 2008 at 04:27:44PM +0800, Zhao, Yu wrote:
 Export some functions and move some macros from c file to header file.

That's absolutely not everything this patch does.  You need to split
this into smaller pieces and explain what you're doing and why for each
of them.

 diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
 index d807cd7..596efa6 100644
 --- a/drivers/pci/pci.h
 +++ b/drivers/pci/pci.h
 @@ -1,3 +1,9 @@
 +#ifndef DRIVERS_PCI_H
 +#define DRIVERS_PCI_H

Do we really need header guards on this file?

 -/*
 - * If the type is not unknown, we assume that the lowest bit is 'enable'.
 - * Returns 1 if the BAR was 64-bit and 0 if it was 32-bit.
 +/**
 + * pci_read_base - read a PCI BAR
 + * @dev: the PCI device
 + * @type: type of the BAR
 + * @res: resource buffer to be filled in
 + * @pos: BAR position in the config space
 + *
 + * Returns 1 if the BAR is 64-bit, or 0 if 32-bit.
   */
 -static int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 +int pci_read_base(struct pci_dev *dev, enum pci_bar_type type,

The original intent here was to have a pci_read_base() that called
__pci_read_base() and then did things like translate physical BAR
addresses to virtual ones.  That patch is in the archives somewhere.
We ended up not including that patch because my user found out he could
get the address he wanted from elsewhere.  I'm not sure we want to
remove the __ at this point.

The eventual goal is to fix up the BARs at this point, but there's
several architectures that will break if we do this now.  It's on my
long-term todo list.

 struct resource *res, unsigned int pos)
  {
 u32 l, sz, mask;
 
 -   mask = type ? ~PCI_ROM_ADDRESS_ENABLE : ~0;
 +   mask = (type == pci_bar_rom) ? ~PCI_ROM_ADDRESS_ENABLE : ~0;

What's going on here?  Why are you adding pci_bar_rom?  For the rom we
use pci_bar_mem32.  Take a look at, for example, the MCHBAR in the 965
spec (313053.pdf).  That's something that uses the pci_bar_mem64 type
and definitely wants to use the PCI_ROM_ADDRESS_ENABLE mask.
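
To make the objection concrete, here is a minimal standalone sketch comparing the two mask expressions from the hunk above. The enum and the constant are local stand-ins rather than the kernel's own definitions; the position of the new enumerator (`bar_rom` here) is an assumption, as is the value 0x01 for the enable bit.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; bar_rom models the enumerator the patch adds (position assumed). */
enum bar_type { bar_unknown, bar_io, bar_mem32, bar_mem64, bar_rom };

#define ROM_ADDRESS_ENABLE 0x01u  /* assumed to mirror PCI_ROM_ADDRESS_ENABLE */

/* Original expression: every type other than bar_unknown masks off bit 0. */
static uint32_t mask_original(enum bar_type type)
{
    assert((unsigned int)type <= (unsigned int)bar_rom);
    return type ? ~ROM_ADDRESS_ENABLE : ~0u;
}

/* Patched expression: only bar_rom masks off bit 0. */
static uint32_t mask_patched(enum bar_type type)
{
    assert((unsigned int)type <= (unsigned int)bar_rom);
    return (type == bar_rom) ? ~ROM_ADDRESS_ENABLE : ~0u;
}
```

With the original expression a mem32 or mem64 probe still treats bit 0 as an enable bit; with the patched expression it no longer does, which is the behaviour change being questioned.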

 
 -   if (type == pci_bar_unknown) {
 +   if (type == pci_bar_rom) {
 +   res->flags |= (l & IORESOURCE_ROM_ENABLE);
 +   l &= PCI_ROM_ADDRESS_MASK;
 +   mask = (u32)PCI_ROM_ADDRESS_MASK;
 +   } else {

This looks wrong too.

 if (rom) {
 @@ -344,7 +340,7 @@ static void pci_read_bases(struct pci_dev *dev, unsigned 
 int howmany, int rom)
 res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH |
 IORESOURCE_READONLY | IORESOURCE_CACHEABLE |
 IORESOURCE_SIZEALIGN;
 -   __pci_read_base(dev, pci_bar_mem32, res, rom);
 +   pci_read_base(dev, pci_bar_mem32, res, rom);
 }

And you don't even change the type here ... have you tested this code on
a system which has a ROM?

 
 -   for(i=0; i<3; i++)
 -   child->resource[i] = &dev->resource[PCI_BRIDGE_RESOURCES+i];
 -

Er, this is rather important.  Why can you just delete it?

-- 
Matthew Wilcox  Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.


Re: [PATCH 4/4] kvm: bios: switch MTRRs to cover only the PCI range and default to WB

2008-09-27 Thread Avi Kivity

Yang, Sheng wrote:

I think we should do a little more than just write the MSR to update the MTRR.

Intel SDM section 10.11.8, "MTRR Considerations in MP Systems", defines the
procedure for modifying MTRR MSRs on MP systems. In particular, step 4 enters
no-fill cache mode (set the CR0.CD bit and clear the NW bit), and step 12
re-enables caching (clear both bits).

We can use this behavior to detect MTRR updates.

  


Why not simply flush the mmu on an mtrr write?

(though of course I have no objection to doing what the manual says)

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Status of pci passthrough work?

2008-09-27 Thread Jan C. Bernauer
Hi,

 I have about the same problem, so excuse me for hijacking this thread.

My hardware consists of a 780g/SB700 Mainboard and a 4850e AMD CPU, and
I'm interested in forwarding a DVB-C tuner card to the guest. Maybe
some NICs later.

I tried and 'sort of' got it working with Amit's kernel and userspace
tools.
First thing:
The dvb-c card has an interesting memory mapping, as reported by
lspci -v:
Memory at cfdff000 (32-bit, non-prefetchable) [size=512]

Size 512 doesn't fly with a check in kvm_main.c:
if (mem->memory_size & (PAGE_SIZE - 1))
goto out;

So I patched the userspace utilities to use 4096 instead.
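
For readers following along, a minimal sketch of why a 512-byte region trips that check and what rounding up to a page does. It assumes 4 KiB pages; the names are hypothetical and are not taken from kvm_main.c or the userspace tools.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed 4 KiB pages, as on x86 */

/* Nonzero when size is not a whole number of pages (the case the check rejects). */
static int size_is_misaligned(uint64_t size)
{
    return (size & (SKETCH_PAGE_SIZE - 1)) != 0;
}

/* Round size up to the next page boundary. */
static uint64_t round_up_to_page(uint64_t size)
{
    /* The mask trick only works for power-of-two page sizes. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
    return (size + SKETCH_PAGE_SIZE - 1) & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}
```

Note that rounding the size up maps the whole page containing the BAR into the guest, including whatever else happens to live in that page.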

With that patch, the guest saw the card, the driver got loaded,
and channel tuning works, but I get some i2c timeouts on the
guest side, and the host side has errors like:

[ cut here ]
Sep 22 02:28:54 [kernel] WARNING: at kernel/irq/manage.c:180
enable_irq+0x3a/0x55()
Sep 22 02:28:54 [kernel] Unbalanced enable for IRQ 20
Sep 22 02:28:54 [kernel] Modules linked in: sha256_generic cbc dm_crypt
crypto_blkcipher kvm_amd kvm bridge stp llc stv0297 budget_core dvb_core
saa7146 ttpci_eepr\
om ir_common k8temp i2c_core dm_snapshot dm_mirror dm_log scsi_wait_scan
[last unloaded: budget_ci]
Sep 22 02:28:54 [kernel] Pid: 5283, comm: qemu-system-x86 Tainted: G
W 2.6.27-rc5-11874-g19561b6 #11
Sep 22 02:28:54 [kernel] Call Trace:
Sep 22 02:28:54 [kernel]  [80238b04] warn_slowpath+0xb4/0xdc
Sep 22 02:28:54 [kernel]  [8026b099]
__alloc_pages_internal+0xde/0x419
Sep 22 02:28:54 [kernel]  [802758d0] get_user_pages+0x401/0x4ae
Sep 22 02:28:54 [kernel]  [80349269] __next_cpu+0x19/0x26
Sep 22 02:28:54 [kernel]  [80230ce2]
find_busiest_group+0x315/0x7c3
Sep 22 02:28:54 [kernel]  [a005de31] gfn_to_hva+0x9/0x5d [kvm]
- Last output repeated twice -
Sep 22 02:28:54 [kernel]  [a005e01b]
kvm_read_guest_page+0x34/0x46 [kvm]
Sep 22 02:28:54 [kernel]  [a005e06c] kvm_read_guest+0x3f/0x7c
[kvm]
Sep 22 02:28:54 [kernel]  [a0068bfe]
paging64_walk_addr+0xe0/0x2c1 [kvm]
Sep 22 02:28:54 [kernel]  [80260d59] enable_irq+0x3a/0x55
Sep 22 02:28:54 [kernel]  [a006df50]
kvm_notify_acked_irq+0x17/0x30 [kvm]
Sep 22 02:28:54 [kernel]  [a00701c5]
kvm_ioapic_update_eoi+0x2f/0x6e [kvm]
Sep 22 02:28:54 [kernel]  [a006f6da]
apic_mmio_write+0x24a/0x546 [kvm]
Sep 22 02:28:54 [kernel]  [a006498d]
emulator_write_emulated_onepage+0xa1/0xf3 [kvm]
Sep 22 02:28:54 [kernel]  [802206f8] paravirt_patch_call+0x13/0x2b
Sep 22 02:28:54 [kernel]  [a006c93e]
x86_emulate_insn+0x366a/0x41de [kvm]
Sep 22 02:28:54 [kernel]  [802206fa] paravirt_patch_call+0x15/0x2b
Sep 22 02:28:54 [kernel]  [a005f90b]
kvm_get_cs_db_l_bits+0x22/0x3a [kvm]
Sep 22 02:28:54 [kernel]  [a006167d]
emulate_instruction+0x198/0x25c [kvm]
Sep 22 02:28:54 [kernel]  [a0067dfe]
kvm_mmu_page_fault+0x46/0x83 [kvm]
Sep 22 02:28:54 [kernel]  [a00636a9]
kvm_arch_vcpu_ioctl_run+0x456/0x65c [kvm]
Sep 22 02:28:54 [kernel]  [8024d605] hrtimer_start+0x111/0x133
Sep 22 02:28:54 [kernel]  [a005d451] kvm_vcpu_ioctl+0xe0/0x459
[kvm]
Sep 22 02:28:54 [kernel]  [a005ee43] kvm_vm_ioctl+0x203/0x21b
[kvm]
Sep 22 02:28:54 [kernel]  [802353b6] finish_task_switch+0x2b/0xc4
Sep 22 02:28:54 [kernel]  [8029e3b5] vfs_ioctl+0x21/0x6c
Sep 22 02:28:54 [kernel]  [8029e627] do_vfs_ioctl+0x227/0x23d
Sep 22 02:28:54 [kernel]  [8029e67a] sys_ioctl+0x3d/0x5f
Sep 22 02:28:54 [kernel]  [8020b45a]
system_call_fastpath+0x16/0x1b
Sep 22 02:28:54 [kernel] ---[ end trace 7b8b990423985ddf ]---
Sep 22 02:28:54 [kernel] [ cut here ]


Xen works with that card, but Xen has other problems, and kvm is much
nicer :) So if you need a guinea pig with basic debugging knowledge, I'm
your man.

Best regards,
Jan C. Bernauer






Re: [PATCH 4/4] kvm: bios: switch MTRRs to cover only the PCI range and default to WB

2008-09-27 Thread Zwane Mwaikambo
On Sat, 27 Sep 2008, Avi Kivity wrote:

 Yang, Sheng wrote:
  I think we should do a little more than just write msr to update mtrr.
  
  Intel SDM 10.11.8 MTRR consideration in MP Systems define the procedure to
  modify MTRR msr in MP. Especially, step 4 enter no-fill cache mode(set
  CR0.CD bit and clean NW bit), step 12 re-enabled the caching(clear this two
  bits).
  
  We based on these behaviors to detect MTRR update.
  

 
 Why not simply flush the mmu on an mtrr write?
 
 (though of course I have no objection to doing what the manual says)

Detecting that condition is fine for operating systems which follow it, 
but some don't, including older Linux kernels :( Flushing on MTRR write, 
although being overzealous would be the most robust.
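
A rough sketch of the "flush on MTRR write" idea, for illustration only. The MSR indices are the architectural ones from the Intel SDM; the `sketch_vm` structure, the flag and the function names are hypothetical and do not correspond to existing kvm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MTRR MSR indices (Intel SDM). */
#define MSR_MTRR_PHYSBASE0   0x200u  /* variable-range base/mask pairs start here */
#define MSR_MTRR_FIX64K      0x250u
#define MSR_MTRR_FIX16K_8    0x258u
#define MSR_MTRR_FIX16K_A    0x259u
#define MSR_MTRR_FIX4K_FIRST 0x268u
#define MSR_MTRR_FIX4K_LAST  0x26fu
#define MSR_MTRR_DEF_TYPE    0x2ffu

/* True if msr is an MTRR register, given the number of variable-range pairs. */
static bool is_mtrr_msr(uint32_t msr, unsigned int var_pairs)
{
    /* Keep the variable range below the fixed-range MSRs. */
    assert(var_pairs <= 40);
    if (msr >= MSR_MTRR_PHYSBASE0 && msr < MSR_MTRR_PHYSBASE0 + 2 * var_pairs)
        return true;
    if (msr == MSR_MTRR_FIX64K || msr == MSR_MTRR_FIX16K_8 ||
        msr == MSR_MTRR_FIX16K_A || msr == MSR_MTRR_DEF_TYPE)
        return true;
    return msr >= MSR_MTRR_FIX4K_FIRST && msr <= MSR_MTRR_FIX4K_LAST;
}

/* Hypothetical per-VM state: just a "shadow MMU must be flushed" flag. */
struct sketch_vm {
    bool mmu_flush_pending;
};

/* Called on every guest WRMSR; any MTRR write requests a flush. */
static void sketch_on_wrmsr(struct sketch_vm *vm, uint32_t msr)
{
    assert(vm != NULL);
    if (is_mtrr_msr(msr, 8))
        vm->mmu_flush_pending = true;
}
```

This does not depend on the guest following the CR0.CD sequence at all, at the cost of a flush per MTRR write.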

Zwane


Re: Status of pci passthrough work?

2008-09-27 Thread Thomas Fjellstrom
On Saturday 27 September 2008, Jan C. Bernauer wrote:
 Hi,

  I have about the same problem, so excuse me for hijacking this thread.

 My hardware consists of a 780g/SB700 Mainboard and a 4850e AMD CPU, and
 I'm interested in forwarding a DVB-C tuner card to the guest. Maybe
 some NICs later.

 I tried and 'sort of' got it working with Amit's kernel and userspace
 tools.
 First thing:
 The dvb-c card has an interesting memory mapping, as reported by
 lspci -v:
 Memory at cfdff000 (32-bit, non-prefetchable) [size=512]

 Size 512 doesn't fly with a check in kvm_main.c:
 if (mem->memory_size & (PAGE_SIZE - 1))
 goto out;

 So I patched the userspace utilities to use 4096 instead.

 With that patch, the guest saw the card, the driver got loaded,
 and channel tuning works, but I get some i2c timeouts on the
 guest side, and the host side has errors like:

 [ cut here ]
 Sep 22 02:28:54 [kernel] WARNING: at kernel/irq/manage.c:180
 enable_irq+0x3a/0x55()
 Sep 22 02:28:54 [kernel] Unbalanced enable for IRQ 20
 Sep 22 02:28:54 [kernel] Modules linked in: sha256_generic cbc dm_crypt
 crypto_blkcipher kvm_amd kvm bridge stp llc stv0297 budget_core dvb_core
 saa7146 ttpci_eepr\
 om ir_common k8temp i2c_core dm_snapshot dm_mirror dm_log scsi_wait_scan
 [last unloaded: budget_ci]
 Sep 22 02:28:54 [kernel] Pid: 5283, comm: qemu-system-x86 Tainted: G
 W 2.6.27-rc5-11874-g19561b6 #11
 Sep 22 02:28:54 [kernel] Call Trace:
 Sep 22 02:28:54 [kernel]  [80238b04] warn_slowpath+0xb4/0xdc
 Sep 22 02:28:54 [kernel]  [8026b099]
 __alloc_pages_internal+0xde/0x419
 Sep 22 02:28:54 [kernel]  [802758d0] get_user_pages+0x401/0x4ae
 Sep 22 02:28:54 [kernel]  [80349269] __next_cpu+0x19/0x26
 Sep 22 02:28:54 [kernel]  [80230ce2]
 find_busiest_group+0x315/0x7c3
 Sep 22 02:28:54 [kernel]  [a005de31] gfn_to_hva+0x9/0x5d [kvm]
 - Last output repeated twice -
 Sep 22 02:28:54 [kernel]  [a005e01b]
 kvm_read_guest_page+0x34/0x46 [kvm]
 Sep 22 02:28:54 [kernel]  [a005e06c] kvm_read_guest+0x3f/0x7c
 [kvm]
 Sep 22 02:28:54 [kernel]  [a0068bfe]
 paging64_walk_addr+0xe0/0x2c1 [kvm]
 Sep 22 02:28:54 [kernel]  [80260d59] enable_irq+0x3a/0x55
 Sep 22 02:28:54 [kernel]  [a006df50]
 kvm_notify_acked_irq+0x17/0x30 [kvm]
 Sep 22 02:28:54 [kernel]  [a00701c5]
 kvm_ioapic_update_eoi+0x2f/0x6e [kvm]
 Sep 22 02:28:54 [kernel]  [a006f6da]
 apic_mmio_write+0x24a/0x546 [kvm]
 Sep 22 02:28:54 [kernel]  [a006498d]
 emulator_write_emulated_onepage+0xa1/0xf3 [kvm]
 Sep 22 02:28:54 [kernel]  [802206f8]
 paravirt_patch_call+0x13/0x2b Sep 22 02:28:54 [kernel] 
 [a006c93e]
 x86_emulate_insn+0x366a/0x41de [kvm]
 Sep 22 02:28:54 [kernel]  [802206fa]
 paravirt_patch_call+0x15/0x2b Sep 22 02:28:54 [kernel] 
 [a005f90b]
 kvm_get_cs_db_l_bits+0x22/0x3a [kvm]
 Sep 22 02:28:54 [kernel]  [a006167d]
 emulate_instruction+0x198/0x25c [kvm]
 Sep 22 02:28:54 [kernel]  [a0067dfe]
 kvm_mmu_page_fault+0x46/0x83 [kvm]
 Sep 22 02:28:54 [kernel]  [a00636a9]
 kvm_arch_vcpu_ioctl_run+0x456/0x65c [kvm]
 Sep 22 02:28:54 [kernel]  [8024d605] hrtimer_start+0x111/0x133
 Sep 22 02:28:54 [kernel]  [a005d451] kvm_vcpu_ioctl+0xe0/0x459
 [kvm]
 Sep 22 02:28:54 [kernel]  [a005ee43] kvm_vm_ioctl+0x203/0x21b
 [kvm]
 Sep 22 02:28:54 [kernel]  [802353b6] finish_task_switch+0x2b/0xc4
 Sep 22 02:28:54 [kernel]  [8029e3b5] vfs_ioctl+0x21/0x6c
 Sep 22 02:28:54 [kernel]  [8029e627] do_vfs_ioctl+0x227/0x23d
 Sep 22 02:28:54 [kernel]  [8029e67a] sys_ioctl+0x3d/0x5f
 Sep 22 02:28:54 [kernel]  [8020b45a]
 system_call_fastpath+0x16/0x1b
 Sep 22 02:28:54 [kernel] ---[ end trace 7b8b990423985ddf ]---
 Sep 22 02:28:54 [kernel] [ cut here ]


 Xen works with that card, but Xen has other problems, and kvm is much
 nicer :) So if you need a guinea pig with basic debugging knowledge, I'm
 your man.

How did you manage to pull together those patches? They all seem so old, and 
won't likely apply cleanly to git head :(

 Best regards,
 Jan C. Bernauer


-- 
Thomas Fjellstrom
[EMAIL PROTECTED]


Re: Status of pci passthrough work?

2008-09-27 Thread Jan C. Bernauer
Thomas Fjellstrom wrote:

 How did you manage to pull together those patches? They all seem so
old, and
 won't likely apply cleanly to git head :(


Which patches do you mean? The patches for kvm?
There is a nice repository managed by Amit Shah:
Linux source:
http://git.kernel.org/?p=linux/kernel/git/amit/kvm.git;a=summary

kvm-userspace:
http://git.kernel.org/?p=linux/kernel/git/amit/kvm-userspace.git;a=summary

I think this compiled without too many problems.

Best regards,
Jan





Re: Status of pci passthrough work?

2008-09-27 Thread Thomas Fjellstrom
On Saturday 27 September 2008, Jan C. Bernauer wrote:
 Thomas Fjellstrom wrote:
  So I've checked out both of those trees and used head, and kvm-userspace
  is erroring out:
 
  gcc -I. -I.. -I/root/kvm-amit-userspace/qemu/target-i386
  -I/root/kvm-amit- userspace/qemu -MMD -MT qemu-kvm-x86.o -MP -DNEED_CPU_H
  -D_GNU_SOURCE - D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D__user=
  -I/root/kvm-amit- userspace/qemu/tcg
  -I/root/kvm-amit-userspace/qemu/tcg/x86_64 -I/root/kvm-
  amit-userspace/qemu/fpu -DHAS_AUDIO -DHAS_AUDIO_CHOICE -I/root/kvm-amit-
  userspace/qemu/slirp -I /root/kvm-amit-userspace/qemu/../libkvm 
  -DCONFIG_X86 -Wall -O2 -g -fno-strict-aliasing  -m64 -I /root/kvm-amit-
  userspace/kernel/include -c -o qemu-kvm-x86.o /root/kvm-amit-
  userspace/qemu/qemu-kvm-x86.c
  /root/kvm-amit-userspace/qemu/qemu-kvm-x86.c:522: error:
  'KVM_FEATURE_CLOCKSOURCE' undeclared here (not in a function)
  /root/kvm-amit-userspace/qemu/qemu-kvm-x86.c:525: error:
  'KVM_FEATURE_NOP_IO_DELAY' undeclared here (not in a function)
  /root/kvm-amit-userspace/qemu/qemu-kvm-x86.c:528: error:
  'KVM_FEATURE_MMU_OP' undeclared here (not in a function)
 
  So I'm a little stuck now.

 Try running
   make sync LINUX=/root/kvm-amit
 (or whatever your kernel source dir is)
 in the kernel sub directory of your kvm-amit-userspace dir.
 Those KVM_FEATURE_* should be defined somewhere in kvm_para.h,
 which is in there.

that leaves me with:

/root/kvm-amit-userspace/qemu/../libkvm/libkvm.h:28: warning: 'struct
kvm_msr_entry' declared inside parameter list
  
/root/kvm-amit-userspace/qemu/../libkvm/libkvm.h:28: warning: its scope is 
only this definition or declaration, which is probably not what you want

and a bunch more errors.

 Best regards,
 Jan





-- 
Thomas Fjellstrom
[EMAIL PROTECTED]


Re: Status of pci passthrough work?

2008-09-27 Thread Jan C. Bernauer
Thomas Fjellstrom wrote:
 
 that leaves me with:
 
 /root/kvm-amit-userspace/qemu/../libkvm/libkvm.h:28: warning: 'struct
 kvm_msr_entry' declared inside parameter list
 
 /root/kvm-amit-userspace/qemu/../libkvm/libkvm.h:28: warning: its scope is 
 only this definition or declaration, which is probably not what you want
 
 and a bunch more errors.
 
 

Well, these are warnings, and I might have ignored them :)
What are the errors?

Anyway, I'll be off now, so I won't respond till tomorrow.

Best regards,
Jan


Re: Status of pci passthrough work?

2008-09-27 Thread Thomas Fjellstrom
On Saturday 27 September 2008, Jan C. Bernauer wrote:
 Thomas Fjellstrom wrote:
  that leaves me with:
 
  /root/kvm-amit-userspace/qemu/../libkvm/libkvm.h:28: warning: 'struct
  kvm_msr_entry' declared inside parameter list
  /root/kvm-amit-userspace/qemu/../libkvm/libkvm.h:28: warning: its scope
  is only this definition or declaration, which is probably not what you
  want
 
  and a bunch more errors.

 Well, these are warning, and I might have ignored them :)
 What are the errors?

 Anyway, I'll be off now, so I won't respond till tomorrow.

libkvm.h:404: error: expected '=', ',' ';', 'asm' or '__attribute__' before 
'kvm_get_cr8'   

libkvm.c:145: error: expected declaration specifiers or '...' before '__u32'  


and quite a few more after that.
 Best regards,
 Jan

-- 
Thomas Fjellstrom
[EMAIL PROTECTED]


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Tian, Kevin
From:Avi Kivity
Sent: 27 September 2008 17:50

Yang, Sheng wrote:
 After checking the host shared-interrupt situation, I have a question:

 If I understand correctly, the current solution doesn't block the host
 shared irq; it just comes with a performance penalty. The penalty comes
 from the host irq line being disabled for a period, because we have to
 wait for the guest to write EOI. But I fail to see the correctness
 problem here (except a lot of spurious interrupts in the guest).

 I've checked the mail, but can't find a clue about that. Can you
 explain the situation?



If the guest fails to disable interrupts on a device that shares an
interrupt line with the host, the host will experience an interrupt
flood.  Eventually the host will disable the host device as well.


This issue also exists on the host side: one misbehaving driver can
hurt all other drivers sharing the same irq line. But there seems to be
no good way to avoid it. Since not all devices support MSI, we still
need to support irq sharing, possibly with the above caveats stated.

The existing approach at least works with a sane guest driver, with
some performance penalty.

Or do you have a better alternative?

Thanks,
Kevin


Re: [PATCH 4/4] kvm: bios: switch MTRRs to cover only the PCI range and default to WB

2008-09-27 Thread Sheng Yang
On Saturday 27 September 2008 21:55:33 Zwane Mwaikambo wrote:
 On Sat, 27 Sep 2008, Avi Kivity wrote:
  Yang, Sheng wrote:
   I think we should do a little more than just write msr to update mtrr.
  
   Intel SDM 10.11.8 MTRR consideration in MP Systems define the
   procedure to modify MTRR msr in MP. Especially, step 4 enter no-fill
   cache mode(set CR0.CD bit and clean NW bit), step 12 re-enabled the
   caching(clear this two bits).
  
   We based on these behaviors to detect MTRR update.
 
  Why not simply flush the mmu on an mtrr write?
 
  (though of course I have no objection to doing what the manual says)

 Detecting that condition is fine for operating systems which follow it,
 but some don't, including older Linux kernels :( Flushing on MTRR write,
 although being overzealous would be the most robust.

OK, this trade-off is reasonable; I will update the mtrr patch.

Hope we won't run into problems at such an early stage. :)
--
regards
Yang, Sheng


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Dong, Eddie
Tian, Kevin wrote:
 From:Avi Kivity
 Sent: 27 September 2008 17:50
 
 Yang, Sheng wrote:
 After check host shared interrupts situation, I got a
 question here: 
 
 If I understand correctly, current solution don't block
 host shared irq, just come with the performance penalty.
 The penalty come with host disabled irq line for a
 period. We have to wait guest to write EOI. But I fail
 to see the correctness problem here (except a lot of
 spurious interrupt in the guest).  
 
 I've checked mail, but can't find clue about that. Can
 you explain the situation? 
 
 
 
 If the guest fails to disable interrupts on a device
 that shares an interrupt line with the host, the host
 will experience an interrupt flood.  Eventually the host
 will disable the host device as well. 
 
 
 This issue also exists on host side, that one misbehaved
 driver can hurt all other drivers sharing same irq line.
 But it seems no good way to avoid it. Since not all
 devices support MSI, we still need support irq sharing
 possibly with above caveats given. 
 
 Existing approach at least works with a sane guest
 driver, with some performance penalty there.
 
 Or do you have better alternative?
 
 Thanks,
 Kevin


MSI is always the first choice. That includes using host MSI even when the
guest uses IOAPIC interrupts, because we don't know whether the guest OS has
MSI support, but we are sure host Linux does.

When MSI is impossible, I recommend we disable device assignment for devices
that share an interrupt, or assign all devices with the same interrupt to the
same guest. Yes, the issue is the same natively, but natively the whole OS
(kernel) is in one isolation domain, whereas different guests are now in
different isolation domains :(

In a word, MSI is pretty important for direct I/O, and SR-IOV is the #1 usage
in future. I just want to advocate more, and wish more people would ack the
SR-IOV patch from Zhao Yu so that we can see 2.6.28 work for direct I/O
without sacrificing sharing :)

Eddie


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Dong, Eddie

 I don't see how this relates to shared guest interrupts. 
 Whatever you have on the host side, you still need to
 support shared guest interrupts.  The only way to avoid
 the issue is by using MSI for the guest, and even then we
 still have to support interrupt sharing since not all
 guests have MSI support. 

Yes, but guest sharing is easy to solve by emulating NAND gate of PCI
line. 
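
A minimal sketch of what such emulation could look like, reading the "NAND gate" remark as the wired-OR behaviour of an active-low PCI INTx line: the guest-visible line stays asserted while any assigned device asserts it. That reading is an assumption, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One emulated guest INTx line shared by up to 32 interrupt sources. */
struct shared_intx_line {
    uint32_t asserting;  /* bit n set: source n currently asserts the line */
};

/*
 * Record the new level for one source and return the resulting level of
 * the shared line (true = asserted, i.e. at least one source asserts it).
 */
static bool shared_line_set(struct shared_intx_line *line,
                            unsigned int src, bool level)
{
    assert(line != NULL && src < 32);
    if (level)
        line->asserting |= (uint32_t)1 << src;
    else
        line->asserting &= ~((uint32_t)1 << src);
    return line->asserting != 0;
}
```

The caller would inject or lower the guest interrupt based on the returned level rather than on the state of any single device.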


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Tian, Kevin
From: Dong, Eddie
Sent: 28 September 2008 10:04

Tian, Kevin wrote:
 From:Avi Kivity
 Sent: 27 September 2008 17:50

 Yang, Sheng wrote:
 After check host shared interrupts situation, I got a
 question here:

 If I understand correctly, current solution don't block
 host shared irq, just come with the performance penalty.
 The penalty come with host disabled irq line for a
 period. We have to wait guest to write EOI. But I fail
 to see the correctness problem here (except a lot of
 spurious interrupt in the guest).

 I've checked mail, but can't find clue about that. Can
 you explain the situation?



 If the guest fails to disable interrupts on a device
 that shares an interrupt line with the host, the host
 will experience an interrupt flood.  Eventually the host
 will disable the host device as well.


 This issue also exists on host side, that one misbehaved
 driver can hurt all other drivers sharing same irq line.
 But it seems no good way to avoid it. Since not all
 devices support MSI, we still need support irq sharing
 possibly with above caveats given.

 Existing approach at least works with a sane guest
 driver, with some performance penalty there.

 Or do you have better alternative?

 Thanks,
 Kevin


MSI is always the first choice. That includes using host MSI even
when the guest uses IOAPIC interrupts, because we don't know whether
the guest OS has MSI support, but we are sure host Linux does.

When MSI is impossible, I recommend we disable device
assignment for devices that share an interrupt, or assign all
devices with the same interrupt to the same guest. Yes, the issue
is the same natively, but natively the whole OS (kernel) is in one
isolation domain, whereas different guests are now in different
isolation domains :(

In a word, MSI is pretty important for direct I/O, and
SR-IOV is the #1 usage in future. I just want to advocate more, and
wish more people would ack the SR-IOV patch from Zhao Yu so that we
can see 2.6.28 work for direct I/O without sacrificing sharing :)


Yes, irq sharing is the trickiest part, and it is hard to make it
architecturally clean. Besides the irq storm mentioned by Avi,
driver timeouts and device buffer overflows are subtle problems that
another guest sharing the irq can cause. Guest inter-dependency
can impact shared irq handling too. If people do care about those
issues, which known irq sharing approaches can't address,
your recommendation makes sense.

Thanks
Kevin

Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity
Tian, Kevin wrote:

 If the guest fails to disable interrupts on a device that shares an
 interrupt line with the host, the host will experience an interrupt
 flood.  Eventually the host will disable the host device as well.

 

 This issue also exists on host side, that one misbehaved driver
 can hurt all other drivers sharing same irq line. 

There is no issue on the host, since all drivers operate on the same
trust level. A misbehaving driver on the host will take down the entire
system even without shared interrupts, by corrupting memory, not
releasing a lock, etc.

But if you move a driver to the guest, you expect it will be isolated
from the rest of the system, and if there are shared interrupts, it isn't.

 But it seems no
 good way to avoid it. Since not all devices support MSI, we still
 need support irq sharing possibly with above caveats given.

 Existing approach at least works with a sane guest driver, with
 some performance penalty there.

   

How can we recommend it to users? We tell them, your guests are isolated
and secure as long as they don't misbehave?

 Or do you have better alternative?
   

No. Maybe the Neocleus polarity trick (which also reduces performance).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity

Dong, Eddie wrote:
I don't see how this relates to shared guest interrupts. 
Whatever you have on the host side, you still need to

support shared guest interrupts.  The only way to avoid
the issue is by using MSI for the guest, and even then we
still have to support interrupt sharing since not all
guests have MSI support. 



Yes, but guest sharing is easy to solve by emulating NAND gate of PCI
line. 
  


Certainly, it isn't difficult.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Status of pci passthrough work?

2008-09-27 Thread Avi Kivity

Jan C. Bernauer wrote:

Hi,

 I have about the same problem, so excuse me for hijacking this thread.

My hardware consists of a 780g/SB700 Mainboard and a 4850e AMD CPU, and
I'm interested in forwarding a DVB-C tuner card to the guest. Maybe
some NICs later.

I tried and 'sort of' got it working with Amit's kernel and userspace
tools.
First thing:
The dvb-c card has an interesting memory mapping, as reported by
lspci -v:
Memory at cfdff000 (32-bit, non-prefetchable) [size=512]

Size 512 doesn't fly with a check in kvm_main.c:
if (mem->memory_size & (PAGE_SIZE - 1))
goto out;

So I patched the userspace utilities to use 4096 instead.

With that patch, the guest saw the card, the driver got loaded,
and channel tuning works, but I get some i2c timeouts on the
guest side, and the host side has errors like:

[ cut here ]
Sep 22 02:28:54 [kernel] WARNING: at kernel/irq/manage.c:180
enable_irq+0x3a/0x55()
Sep 22 02:28:54 [kernel] Unbalanced enable for IRQ 20
  


That looks like bad error handling, triggered by the failures you had before.

Try rebooting the host and starting again with the patched userspace.

Amit, can you take a look at the error handling paths?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Tian, Kevin
From: Avi Kivity [mailto:[EMAIL PROTECTED]
Sent: 28 September 2008 12:23

There is no issue on the host, since all drivers operate on the same
trust level. A misbehaving driver on the host will take down the entire
system even without shared interrupts, by corrupting memory, not
releasing a lock, etc.

But if you move a driver to the guest, you expect it will be isolated
from the rest of the system, and if there are shared
interrupts, it isn't.


Yes, you're right

 Or do you have better alternative?


No. Maybe the Neocleus polarity trick (which also reduces performance).


To my knowledge, the Neocleus polarity trick can't solve this isolation
issue; it just provides an efficient way to track assertion/deassertion
transitions on the irq line. For example, reverse the polarity when receiving
an interrupt instance; a new irq instance then occurs when all devices
deassert the shared irq line, at which point the polarity is restored. In the
case you are concerned about, where a guest driver misbehaves, the polarity
trick can't work either, as one device always asserts the line.
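
A small model of the polarity trick as described above, as a sketch only. It assumes the interrupt controller can be switched between firing on an asserted and on a deasserted line; the types and names are hypothetical and the code simulates the logic rather than touching any hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum trigger { FIRE_WHEN_ASSERTED, FIRE_WHEN_DEASSERTED };
enum event { EV_NONE, EV_ASSERTED, EV_DEASSERTED };

struct polarity_tracker {
    enum trigger trigger;  /* which line level currently raises an interrupt */
};

/*
 * Feed the current level of the shared line (true = at least one device
 * asserts it). Returns the transition an interrupt would report, and flips
 * the trigger polarity so that the opposite level fires next.
 */
static enum event tracker_sample(struct polarity_tracker *t, bool line_asserted)
{
    assert(t != NULL);
    if (t->trigger == FIRE_WHEN_ASSERTED && line_asserted) {
        t->trigger = FIRE_WHEN_DEASSERTED;
        return EV_ASSERTED;
    }
    if (t->trigger == FIRE_WHEN_DEASSERTED && !line_asserted) {
        t->trigger = FIRE_WHEN_ASSERTED;
        return EV_DEASSERTED;
    }
    return EV_NONE;
}
```

If a misbehaving guest device never deasserts, `tracker_sample()` keeps returning `EV_NONE` after the first `EV_ASSERTED`, so the other devices on the line are never serviced again; that is the isolation gap.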

Thanks,
Kevin

Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity
Tian, Kevin wrote:

 No. Maybe the Neocleus polarity trick (which also reduces performance).
 

 To my knowledge, Neocleus polarity trick can't solve this isolation
 issue, which just provides one efficient way to track assertion/deassertion
 transition on the irq line. For example, reverse polarity when receiving an
 instance, and then a new irq instance would occur when all devices de-
 assert on shared irq line, and then recover the polarity. In your concerned
 case where guest driver misbehaves, this polarity trick can't work neither
 as one device always asserts the line.
   

You're right, I didn't think it through.

If there was a standard way to mask pci irqs, it might have worked, but
there isn't, unfortunately.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Yang, Sheng
On Sunday 28 September 2008 13:04:06 Avi Kivity wrote:
 Tian, Kevin wrote:
  No. Maybe the Neocleus polarity trick (which also reduces performance).
 
  To my knowledge, Neocleus polarity trick can't solve this isolation
  issue, which just provides one efficient way to track
  assertion/deassertion transition on the irq line. For example, reverse
  polarity when receiving an instance, and then a new irq instance would
  occur when all devices de- assert on shared irq line, and then recover
  the polarity. In your concerned case where guest driver misbehaves, this
  polarity trick can't work neither as one device always asserts the line.

 You're right, I didn't think it through.

 If there was a standard way to mask pci irqs, it might have worked, but
 there isn't, unfortunately.

What if we had a way to mask PCI irqs? We would also have to unmask the PCI
irq when the guest writes EOI to the vlapic (or at some other time). I think
this would still cause a problem: we don't know whether the guest will
deassert the line. Maybe adding some time-based detection here would work?

As for masking the PCI irq, how about disabling the device's interrupt using
bit 10 (Interrupt Disable) of the PCI Command register? I'm not sure whether
it would affect pending transactions, and also not sure that all devices
support it (though they should).
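
A minimal sketch of the masking idea, assuming the bit meant here is
Interrupt Disable, bit 10 of the PCI Command register as defined by PCI 2.3,
and that the Interrupt Status bit (bit 3 of the Status register) is used to
see whether the device still wants service while masked. The helper names
are hypothetical, and the config-space read/write itself is left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by PCI 2.3. */
#define CMD_INTX_DISABLE    (1u << 10) /* Command register, bit 10 */
#define STATUS_INTX_PENDING (1u << 3)  /* Status register, bit 3 */

/* Hypothetical helper: return the Command register value with INTx
 * masked (bit 10 set) or unmasked (bit 10 cleared). The caller is
 * expected to write the result back to config space. */
static uint16_t cmd_set_intx_masked(uint16_t cmd, bool masked)
{
    return masked ? (uint16_t)(cmd | CMD_INTX_DISABLE)
                  : (uint16_t)(cmd & ~CMD_INTX_DISABLE);
}

/* Hypothetical helper: true if the Status register value reports that
 * the device has an INTx interrupt pending. */
static bool status_intx_pending(uint16_t status)
{
    return (status & STATUS_INTX_PENDING) != 0;
}
```

Only devices that implement PCI 2.3 or later are required to have these
bits, which matches the concern that not every device may support this.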

--
regards
Yang, Sheng


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Dong, Eddie
Avi Kivity wrote:
 Dong, Eddie wrote:
 I don't see how this relates to shared guest interrupts.
 Whatever you have on the host side, you still need to
 support shared guest interrupts.  The only way to avoid
 the issue is by using MSI for the guest, and even then
 we still have to support interrupt sharing since not
 all guests have MSI support. 
 
 
 Yes, but guest sharing is easy to solve by emulating
 NAND gate of PCI line. 
 
 
 Certainly, it isn't difficult.

BTW, did you have a look at the SR-IOV patch? It addresses both Xen and KVM.
Thx, eddie


Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Yang, Sheng
On Sunday 28 September 2008 13:04:06 Avi Kivity wrote:
 Tian, Kevin wrote:
  No. Maybe the Neocleus polarity trick (which also reduces performance).
 
  To my knowledge, Neocleus polarity trick can't solve this isolation
  issue, which just provides one effecient way to track
  assertion/deassertion transition on the irq line. For example, reverse
  polarity when receiving an instance, and then a new irq instance would
  occur when all devices de- assert on shared irq line, and then recover
  the polarity. In your concerned case where guest driver misbehaves, this
  polarity trick can't work neither as one device always asserts the line.

 You're right, I didn't think it through.

 If there was a standard way to mask pci irqs, it might have worked, but
 there isn't, unfortunately.

One proposal:

If we suffer from an IRQ storm on a level-triggered irq line, there are two
possibilities: a host issue or a guest issue.

If it's a host issue, the host should try to stop it. If it can't, the IRQ
line will be disabled, and the guest device won't be functional either.

If it's a guest issue, the guest should try to stop it and prevent it from
causing trouble in the host. KVM should do its best to achieve this,
including disabling the guest device. So the guest device won't be
functional in this case either.

Based on the above, we can assume that the IRQ storm is caused by the
assigned guest device and try to stop the device from doing this. (Yeah,
either way, the guest device won't survive.)

I think we can bring in a little QoS concept here (stolen from Eddie :) ).
The assumption is that the normal rate at which a device delivers interrupts
is much lower than that of a continuously asserted level-triggered line when
the EOI is written immediately. So we can do something with the gap.

Measure the calling rate of our irq handler; if it exceeds some reasonable
threshold, KVM would try to stop the guest device for a while (even though
it doesn't know whether the guest device is the cause).

First, try setting the Interrupt Disable bit in the PCI Command register,
wait for a period of time, then check again.

If the irq storm can't be stopped, KVM tries a more aggressive way: do a
Function Level Reset. That should be the end of the device's life...

Oh, of course, if even FLR doesn't stop the IRQ storm, it's the host's
issue. Let's wait for the host to disable the IRQ line. Of course, the guest
device can't be recovered in that case either.

It's just an initial proposal, but I think it may work. The problem is
whether the gap is easy to catch... But at least, I think a physically
continuous assertion should look very different from any working device...
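
A rough sketch of the rate check and escalation described above, assuming a
fixed measuring window and a per-window threshold. Everything here (the
struct, the enum, the function names, the window and threshold parameters)
is hypothetical and not taken from any KVM patch; the actual masking and FLR
are left to the caller:

```c
#include <stdint.h>

/* Escalation steps, in the order proposed above. */
enum storm_action {
    STORM_NONE,       /* rate looks normal, nothing to do */
    STORM_MASK_INTX,  /* step 1: set the interrupt disable bit */
    STORM_FLR,        /* step 2: Function Level Reset */
    STORM_HOST_ISSUE  /* FLR did not help: leave it to the host */
};

struct storm_detector {
    uint64_t window_ns;      /* length of one measuring window */
    uint32_t threshold;      /* interrupts tolerated per window */
    uint64_t window_start;   /* time at which the current window began */
    uint32_t count;          /* interrupts seen in the current window */
    enum storm_action level; /* last escalation step taken */
};

static void storm_init(struct storm_detector *d,
                       uint64_t window_ns, uint32_t threshold)
{
    d->window_ns = window_ns;
    d->threshold = threshold;
    d->window_start = 0;
    d->count = 0;
    d->level = STORM_NONE;
}

/* Call once per irq handler invocation with the current time.
 * Returns the action the caller should take now. */
static enum storm_action storm_account_irq(struct storm_detector *d,
                                           uint64_t now_ns)
{
    if (now_ns - d->window_start >= d->window_ns) {
        /* The previous window expired: start counting afresh. */
        d->window_start = now_ns;
        d->count = 0;
    }

    if (++d->count <= d->threshold)
        return STORM_NONE;

    /* Threshold exceeded: escalate one step and open a new window,
     * so the next check happens only after more interrupts arrive. */
    d->count = 0;
    d->window_start = now_ns;
    if (d->level < STORM_HOST_ISSUE)
        d->level = (enum storm_action)(d->level + 1);
    return d->level;
}
```

The irq handler would call storm_account_irq() on every invocation and act
on any result other than STORM_NONE. Choosing the window and threshold is
exactly the "is the gap easy to catch" question, so the values are left as
parameters.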

--
regards
Yang, Sheng