Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-12 Thread Laurent Pinchart
Hello,

On Friday 12 December 2014 13:15:51 Tomasz Figa wrote:
> On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki wrote:
> > On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote:
> >> On 11 December 2014 at 16:31, Kevin Hilman  wrote:
> >> > [+ Laurent Pinchart]
> >> > 
> >> > Tomasz Figa  writes:
> >> >> On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson wrote:
> >> > [...]
> >> > 
> >> >>>> @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct
> >> >>>> platform_device *pdev)
> >> >>>>                 return -ENXIO;
> >> >>>>         }
> >> >>>> 
> >> >>>> +       pm_runtime_no_callbacks(dev);
> >> >>>> +       pm_runtime_enable(dev);
> >> >>>> +
> >> >>>> +       /* Synchronize state of the domain with driver data. */
> >> >>>> +       pm_runtime_get_sync(dev);
> >> >>>> +       iommu->is_powered = true;
> >> >>> 
> >> >>> Doesn't the runtime PM status reflect the value of "is_powered", thus
> >> >>> why do you need to have a copy of it? Could it perhaps be that you
> >> >>> try to cope with the case when CONFIG_PM is unset?
> >> >> 
> >> >> It's worth noting that this driver fully relies on status of other
> >> >> devices in the power domain the IOMMU is in and does not enforce the
> >> >> status on its own. So in general, as far as my understanding of PM
> >> >> runtime subsystem, the status of the IOMMU device will be always
> >> >> suspended, because nobody will call pm_runtime_get() on it (except the
> >> >> get and put pair in probe). So is_powered is here to track status of
> >> >> the domain, not the device. Feel free to suggest a better way, though.
> >> > 
> >> > I still don't like these notifiers.  I think they add ways to bypass
> >> > having proper runtime PM implemented for devices/subsystems.
> >> 
> >> I do agree, but I haven't found another good solution to the problem.
> > 
> > For the record, I'm not liking this mostly because it "fixes" a generic
> > problem in a way that's hidden in the genpd code and very indirect.
> 
> Well, that's true. This is indeed a generic problem of PM dependencies
> between devices (other than those represented by parent-child
> relation), which in fact doesn't have much to do with genpd, but
> rather with those devices directly. It is just that genpd is the most
> convenient location to solve this in current code and in a simple way.
> In other words, I see this solution as a reasonable way to get the
> problem solved quickly for now, so that we can start thinking about a
> more elegant solution.
> 
> >> > From a high-level, the IOMMU is just another device inside the PM
> >> > domain, so ideally it should be doing its own _get() and _put() calls
> >> > so the PM domain code would just do the right thing without the need
> >> > for notifiers.
> >> 
> >> As I understand it, the IOMMU (or for other similar cases) shouldn't
> >> be doing any get() and put() at all because there are no IO API to
> >> serve request from.

Speaking purely from an IOMMU point of view that's not entirely true. IOMMU 
drivers expose map and unmap operations, so they can track whether any memory 
is mapped. From a bus master point of view the IOMMU map and unmap operations 
are hidden by the DMA mapping API. The IOMMU can thus track the existence of 
mappings without any IOMMU awareness in the bus master driver.

If no mapping exists the IOMMU shouldn't receive any translation request. An 
IOMMU driver could thus call pm_runtime_get_sync() in the map handler when 
creating the first mapping, and pm_runtime_put() in the unmap handler when 
tearing the last mapping down.
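
To make the idea concrete, here is a minimal userspace sketch in plain C. It
is not kernel code: model_pm_get() and model_pm_put() are hypothetical
stand-ins for pm_runtime_get_sync() and pm_runtime_put(), and struct
model_iommu is an assumed structure used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IOMMU device under runtime PM. */
struct model_iommu {
	unsigned int usage_count;   /* models the runtime PM usage counter */
	bool powered;               /* models the resulting power state */
	unsigned int nr_mappings;   /* mappings currently installed */
};

/* Models pm_runtime_get_sync(): the first user powers the device up. */
static void model_pm_get(struct model_iommu *mmu)
{
	if (mmu->usage_count++ == 0)
		mmu->powered = true;
}

/*
 * Models pm_runtime_put(): the last user lets the device power down.
 * The real call is asynchronous; this model suspends immediately.
 */
static void model_pm_put(struct model_iommu *mmu)
{
	assert(mmu->usage_count > 0);
	if (--mmu->usage_count == 0)
		mmu->powered = false;
}

/* Map handler: take a PM reference only when creating the first mapping. */
static void model_iommu_map(struct model_iommu *mmu)
{
	if (mmu->nr_mappings++ == 0)
		model_pm_get(mmu);
}

/* Unmap handler: drop the PM reference when the last mapping goes away. */
static void model_iommu_unmap(struct model_iommu *mmu)
{
	assert(mmu->nr_mappings > 0);
	if (--mmu->nr_mappings == 0)
		model_pm_put(mmu);
}
```

With this scheme the bus master driver never touches the IOMMU's PM state
directly; it only maps and unmaps through the DMA mapping API.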

One could argue that the IOMMU would end up being powered more often than 
strictly needed, as bus master drivers, even when written properly, could 
keep mappings around at times they don't perform bus access. This is true, and 
that's an argument I've raised during the last kernel summit. The general 
response (including Linus Torvalds') was that micro-optimizing power 
management might not be worth it, and that measurements proving that the gain 
is worth it are required before introducing new APIs to solve the problem. I 
can't disagree with that argument.

> >> In principle we could consider these kinds of devices as "parent" devices
> >> to those other devices that need them. Then runtime PM core would
> >> take care of things for us, right!?
> >> 
> >> Now, I am not so sure using the "parent" approach is actually viable,
> >> since it will likely have other complications, but I haven't
> >> thoroughly thought it through yet.
> > 
> > That actually need not be a "parent".
> > 
> > What's needed in this case is to do a pm_runtime_get_sync() on a device
> > depended on every time a dependent device is runtime-resumed (and
> > analogously for suspending).
> > 
> > The core doesn't have a way to do that, but it looks like we'll need to
> > add it anyway for various reasons (ACPI _DEP is one of them as I mentioned
> > some time ago, but people dismissed it basically as not their probl

Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-12 Thread Kevin Hilman
[+ Laurent Pinchart]

Tomasz Figa  writes:

> On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson  wrote:

[...]

>>> @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct platform_device 
>>> *pdev)
>>> return -ENXIO;
>>> }
>>>
>>> +   pm_runtime_no_callbacks(dev);
>>> +   pm_runtime_enable(dev);
>>> +
>>> +   /* Synchronize state of the domain with driver data. */
>>> +   pm_runtime_get_sync(dev);
>>> +   iommu->is_powered = true;
>>
>> Doesn't the runtime PM status reflect the value of "is_powered", thus
>> why do you need to have a copy of it? Could it perhaps be that you
>> try to cope with the case when CONFIG_PM is unset?
>>
>
> It's worth noting that this driver fully relies on status of other
> devices in the power domain the IOMMU is in and does not enforce the
> status on its own. So in general, as far as my understanding of PM
> runtime subsystem, the status of the IOMMU device will be always
> suspended, because nobody will call pm_runtime_get() on it (except the
> get and put pair in probe). So is_powered is here to track status of
> the domain, not the device. Feel free to suggest a better way, though.

I still don't like these notifiers.  I think they add ways to bypass
having proper runtime PM implemented for devices/subsystems.

From a high level, the IOMMU is just another device inside the PM
domain, so ideally it should be doing its own _get() and _put() calls
so the PM domain code would just do the right thing without the need for
notifiers.

Not knowing a lot about the IOMMU API, I'm guessing the reason you're not
doing that is because the IOMMU API currently doesn't have an easy way
to keep track of *active* users so it's not obvious where to put those
_get and _put calls.  If that doesn't exist, perhaps a simple
iommu_get() and iommu_put() API needs to be introduced (which inside the
IOMMU core would just do runtime PM calls) so that users of the IOMMU
could inform the subsystem that the IOMMU is needed and it should not be
powered off.

I Cc'd Laurent because I know he's thought about this before from the
IOMMU side, and not sure if he came up with a solution.

Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-12 Thread Kevin Hilman
Tomasz Figa  writes:

[...]

> We have a power domain, which contains an IOMMU and an IP block, which
> can do bus transactions through that IOMMU. Of course the IP block is
> not aware of the IOMMU, because this is just an integration detail and
> on other platforms using the same IP block the IOMMU might not be
> there. This implies that the driver for this IP block should not be
> aware of the IOMMU either, which, on the buffer allocation and mapping
> side, is achieved by DMA mapping subsystem. We would also want the
> IOMMU to be fully transparent to that driver on PM side.

An even more exciting problem exists when a CPU is in the same domain as
other peripherals, those peripherals are all idle and the power domain
is gated. :)

> Now, for PM related details, they are located in the same power
> domain, which means that whenever the power domain is turned off, the
> CPU can't access their registers and all the hardware state is gone.
> To make the case more interesting, there is no point in programming
> the IOMMU unless the device using it is powered on. Similarly, there
> is no point in powering the domain on to just access the IOMMU. This
> implies that the power domain should be fully controlled by the
> stand-alone IP block, while the peripheral IOMMU shouldn't affect its
> state and its driver only respond to operations performed by driver of
> that stand-alone IP block.

Which implies that the IOMMU driver should have its own set of
runtime_suspend/runtime_resume callbacks to properly save/restore state
as well.

> A solution like below could work for the above:
>
> - There is a per-device list of peripheral devices, which need to be
> powered on for this device to work.
> - Whenever the IOMMU subsystem/driver binds an IOMMU to a device, it
> adds the IOMMU device to the list of peripheral devices of that device
> (and similarly for removal).
> - A runtime PM operation on a device will also perform the same
> operation on all its peripheral devices.

Yes, that is exactly what we need.

I'd prefer the term "dependent" devices over peripheral
devices, but otherwise it sounds like the right approach to me.
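
A minimal userspace sketch of that scheme, assuming an acyclic dependency
graph. Everything here is hypothetical (struct model_dev, the fixed-size
dependency array, the function names); it only models how a runtime PM get or
put on a device could be repeated on each device it depends on.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEPS 4

/* Hypothetical device model; not the kernel's struct device. */
struct model_dev {
	const char *name;
	unsigned int usage_count;                /* runtime PM usage counter */
	struct model_dev *deps[MODEL_MAX_DEPS];  /* devices this one needs */
	size_t nr_deps;
};

/* Called when, e.g., an IOMMU is bound to a bus master. Returns 0 or -1. */
static int model_add_dependency(struct model_dev *dev, struct model_dev *dep)
{
	if (dev->nr_deps == MODEL_MAX_DEPS)
		return -1;
	dev->deps[dev->nr_deps++] = dep;
	return 0;
}

/* Resume: bring up everything depended on first, then the device itself. */
static void model_runtime_get(struct model_dev *dev)
{
	for (size_t i = 0; i < dev->nr_deps; i++)
		model_runtime_get(dev->deps[i]);
	dev->usage_count++;
}

/* Suspend: release the device, then its dependencies in reverse order. */
static void model_runtime_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
	for (size_t i = dev->nr_deps; i-- > 0; )
		model_runtime_put(dev->deps[i]);
}

/* A device counts as active while anyone holds a reference to it. */
static int model_is_active(const struct model_dev *dev)
{
	return dev->usage_count > 0;
}
```

The driver of the bus master stays unaware of the IOMMU: it only calls get
and put on its own device, and the dependency list does the rest.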

Kevin

> Another way would be to extend what the PM runtime core does with
> parent-child relations to handle the whole list of peripheral devices
> instead of just the parent.



Re: [PATCH 0/5] iommu/vt-d: Fix crash dump failure caused by legacy DMA/IO

2014-12-12 Thread Joerg Roedel
On Fri, Dec 12, 2014 at 10:25:31AM +0800, Li, ZhenHua wrote:
> Sorry I have no plan yet.
> Could you send me your logs on your AMD system?

> On 12/10/2014 04:46 PM, Baoquan He wrote:
> >This issue happens on AMD iommu too, do you have any plans or
> >thoughts on that?

I think the best approach for now is to get a proof-of-concept on the
VT-d driver. If it works there the way we expect, we can implement the
same handling in the AMD driver. But I see no reason to hold back the
VT-d patches until it is also fixed for AMD systems.


Joerg




Re: [PATCH 2/2] iommu/vt-d: Only remove domain when device is removed

2014-12-12 Thread Joerg Roedel
Hi Jerry,

On Thu, Dec 11, 2014 at 09:35:34AM -0700, Jerry Hoemann wrote:
> On Tue, Dec 09, 2014 at 01:15:25PM +0100, Joerg Roedel wrote:
> > From d65b236d0f27fe3ef7ac4d12cceb0da67aec86ce Mon Sep 17 00:00:00 2001
> > From: Joerg Roedel 
> > Date: Tue, 9 Dec 2014 12:56:45 +0100
> > Subject: [PATCH] iommu/vt-d: Fix dmar_domain leak in iommu_attach_device
> > 
> > Since commit 1196c2f a domain is only destroyed in the
> > notifier path if it is hot-unplugged. This caused a
> > domain leakage in iommu_attach_device when a driver was
> > unbound from the device and bound to VFIO. In this case the
> > device is attached to a new domain and unlinked from the old
> > domain. At this point nothing points to the old domain
> > anymore and its memory is leaked.
> > Fix this by explicitly freeing the old domain in
> > iommu_attach_domain.
> > 
> > Fixes: 1196c2f 'iommu/vt-d: Only remove domain when device is removed'
> > Signed-off-by: Joerg Roedel 
> > ---
> >  drivers/iommu/intel-iommu.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 1232336..9ef8e89 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4424,10 +4424,13 @@ static int intel_iommu_attach_device(struct 
> > iommu_domain *domain,
> >  
> > old_domain = find_domain(dev);
> > if (old_domain) {
> > -   if (domain_type_is_vm_or_si(dmar_domain))
> > +   if (domain_type_is_vm_or_si(dmar_domain)) {
> 
> 
> JAH>  This path is executed when starting the VM.
> 
> 
> > domain_remove_one_dev_info(old_domain, dev);
> > -   else
> > +   } else {
> 
> 
> JAH>  I don't see this path being executed.
> 
> > domain_remove_dev_info(old_domain);
> > +   if (list_empty(&old_domain->devices))
> > +   domain_exit(old_domain);
> > +   }

You are right, thanks for testing. The reason is that the check for
domain_type_is_vm_or_si(dmar_domain) uses the new domain and not the old
one. I'll post a new patch.
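
As a rough userspace model of the point above (an assumption about what a
corrected check could look like, not the follow-up patch itself): the decision
to free a domain is taken from the type of the old domain, the one being left.
All names are hypothetical, and the rule that VM/SI domains are not freed here
is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_domain_type { MODEL_DOMAIN_DMA, MODEL_DOMAIN_VM_OR_SI };

/* Hypothetical stand-in for a dmar_domain. */
struct model_domain {
	enum model_domain_type type;
	unsigned int nr_devices;
	bool freed;                 /* set where domain_exit() would run */
};

static bool model_is_vm_or_si(const struct model_domain *d)
{
	return d->type == MODEL_DOMAIN_VM_OR_SI;
}

/*
 * Move one device from old_domain (may be NULL) to new_domain.
 * The cleanup decision looks at old_domain, not at new_domain.
 */
static void model_attach(struct model_domain *old_domain,
			 struct model_domain *new_domain)
{
	assert(new_domain != NULL);

	if (old_domain) {
		assert(old_domain->nr_devices > 0);
		old_domain->nr_devices--;
		if (!model_is_vm_or_si(old_domain) &&
		    old_domain->nr_devices == 0)
			old_domain->freed = true;
	}
	new_domain->nr_devices++;
}
```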


Joerg



[git pull] IOMMU Updates for Linux v3.19

2014-12-12 Thread Joerg Roedel
Hi Linus,

The following changes since commit fc14f9c1272f62c3e8d01300f52467c0d9af50f9:

  Linux 3.18-rc5 (2014-11-16 16:36:20 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-updates-v3.19

for you to fetch changes up to 76771c938e95ce4106c6e8092f4f614d4d1e0ecc:

  Merge branches 'arm/omap', 'arm/msm', 'arm/rockchip', 'arm/renesas', 
'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next (2014-12-02 13:07:13 
+0100)



IOMMU Updates for Linux v3.19

This time with:

* A new IOMMU-API call: iommu_map_sg() to map multiple
  non-contiguous pages into an IO address space with only one
  API call. This allows certain optimizations in the IOMMU
  driver.

* DMAR device hotplug in the Intel VT-d driver. It is now
  possible to hotplug the IOMMU itself.

* A new IOMMU driver for the Rockchip ARM platform.

* Couple of cleanups and improvements in the OMAP IOMMU driver.

* Nesting support for the ARM-SMMU driver.

* Various other small cleanups and improvements.

Please note that this time some branches were also pulled into other
trees, like the DRI and the Tegra tree. The VT-d branch was also pulled
into tip/x86/apic.
Some patches for the AMD IOMMUv2 driver are not in the IOMMU tree but
were merged by Andrew (or finally ended up in the DRI tree).


Antonios Motakis (3):
  iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC
  iommu: add capability IOMMU_CAP_NOEXEC
  iommu/arm-smmu: add IOMMU_CAP_NOEXEC to the ARM SMMU driver

Axel Lin (1):
  iommu/ipmmu-vmsa: Return proper error if devm_request_irq fails

Daniel Kurtz (2):
  iommu/rockchip: rk3288 iommu driver
  dt-bindings: iommu: Add documentation for rockchip iommu

Heiko Stübner (1):
  iommu: Improve error handling when setting bus iommu

Jiang Liu (9):
  iommu/vt-d: Introduce helper function dmar_walk_resources()
  iommu/vt-d: Dynamically allocate and free seq_id for DMAR units
  iommu/vt-d: Implement DMAR unit hotplug framework
  iommu/vt-d: Search for ACPI _DSM method for DMAR hotplug
  iommu/vt-d: Enhance intel_irq_remapping driver to support DMAR unit 
hotplug
  iommu/vt-d: Enhance error recovery in function 
intel_enable_irq_remapping()
  iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
  pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug
  iommu/vt-d: Fix an off-by-one bug in __domain_mapping()

Joerg Roedel (5):
  iommu: Do more input validation in iommu_map_sg()
  iommu/rockchip: Allow to compile with COMPILE_TEST
  powerpc/iommu: Rename iommu_[un]map_sg functions
  Merge branch 'for-joerg/arm-smmu/updates' of 
git://git.kernel.org/.../will/linux into arm/smmu
  Merge branches 'arm/omap', 'arm/msm', 'arm/rockchip', 'arm/renesas', 
'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next

Kiran Padwal (2):
  iommu/msm: Use dev_get_platdata()
  iommu/omap: Use dev_get_platdata()

Li, Zhen-Hua (1):
  x86/vt-d: Fix incorrect bit operations in setting values

Oded Gabbay (1):
  iommu/amd: Fix accounting of device_state

Olav Haugan (1):
  iommu: Add iommu_map_sg() function

Robin Murphy (1):
  iommu: Decouple iommu_map_sg from CPU page size

SF Markus Elfring (1):
  iommu/msm: Deletion of unnecessary checks before clk_disable()

Suman Anna (17):
  iommu/omap: Remove refcount field from omap_iommu object
  iommu/omap: Remove unused isr_priv field from omap_iommu
  iommu/omap: Remove duplicate declarations
  iommu/omap: Remove conditional definition of dev_to_omap_iommu()
  iommu/omap: Remove ver debugfs entry
  iommu/omap: Remove omap_iommu_arch_version() and version field
  iommu/omap: Remove bogus version check in context save/restore
  iommu/omap: Simplify omap2_iommu_fault_isr()
  iommu/omap: Consolidate OMAP IOMMU modules
  iommu/omap: Fix the permissions on nr_tlb_entries
  iommu/omap: Make pagetable debugfs entry read-only
  iommu/omap: Integrate omap-iommu-debug into omap-iommu
  iommu/omap: Remove couple of unused exported functions
  iommu/omap: Do not export unneeded functions
  iommu/omap: Reset the domain field upon detaching
  iommu/omap: Fix bus error on debugfs access of unattached IOMMU
  iommu/omap: Switch pagetable debugfs entry to use seq_file

Thierry Reding (1):
  iommu/arm-smmu: Play nice on non-ARM/SMMU systems

Will Deacon (2):
  iommu/amd: remove compiler warning due to IOMMU_CAP_NOEXEC
  iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute

 .../devicetree/bindings/iommu/rockchip,iommu.txt   |   26 +
 arch/powerpc/include/asm/iommu.h   |   17 +-
 arch/powerpc/kernel/dma-iommu.c|8 +-
 ar

[v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2014-12-12 Thread Feng Wu
This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Clear 'SN'
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list
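
The list handling in the steps above can be sketched in userspace C as
follows. This is only a model under stated assumptions: struct model_vcpu and
the function names are hypothetical, the per-CPU spinlock is omitted, the
descriptor updates ('SN', 'NV') are left out, and entries are added at the
head of the list rather than the tail.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical vCPU with intrusive list links. */
struct model_vcpu {
	int id;
	int wakeup_cpu;                  /* -1 when not on any blocked list */
	struct model_vcpu *prev, *next;  /* links within one per-CPU list */
};

/* One list head per physical CPU; locking is omitted in this model. */
static struct model_vcpu *model_blocked[MODEL_NR_CPUS];

/* pre-block: queue the vCPU on the list of the given CPU. */
static void model_pre_block(struct model_vcpu *v, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS && v->wakeup_cpu == -1);
	v->wakeup_cpu = cpu;
	v->prev = NULL;
	v->next = model_blocked[cpu];
	if (v->next)
		v->next->prev = v;
	model_blocked[cpu] = v;
}

/* post-block: unlink the vCPU from the list it was queued on. */
static void model_post_block(struct model_vcpu *v)
{
	assert(v->wakeup_cpu != -1);
	if (v->prev)
		v->prev->next = v->next;
	else
		model_blocked[v->wakeup_cpu] = v->next;
	if (v->next)
		v->next->prev = v->prev;
	v->prev = v->next = NULL;
	v->wakeup_cpu = -1;
}

/* Wakeup-handler side: walk the list of one CPU and count its entries. */
static size_t model_count_blocked(int cpu)
{
	size_t n = 0;

	for (struct model_vcpu *v = model_blocked[cpu]; v; v = v->next)
		n++;
	return n;
}
```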

Signed-off-by: Feng Wu 
---
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/vmx.c  | 96 +
 arch/x86/kvm/x86.c  | 22 +++---
 include/linux/kvm_host.h|  4 ++
 virt/kvm/kvm_main.c |  6 +++
 5 files changed, 123 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 13e3e40..32c110a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, 
int level)
 
 #define ASYNC_PF_PER_VCPU 64
 
+extern void (*wakeup_handler_callback)(void);
+
 enum kvm_reg {
VCPU_REGS_RAX = 0,
VCPU_REGS_RCX = 1,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bf2e6cd..a1c83a2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
+/*
+ * We maintain a per-CPU linked list of vCPUs, so in wakeup_handler() we
+ * can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
 static unsigned long *vmx_msr_bitmap_legacy;
@@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
struct pi_desc old, new;
unsigned int dest;
+   unsigned long flags;
 
memset(&old, 0, sizeof(old));
memset(&new, 0, sizeof(new));
@@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
new.nv = POSTED_INTR_VECTOR;
} while (cmpxchg(&pi_desc->control, old.control,
new.control) != old.control);
+
+   /*
+* Delete the vCPU from the related wakeup queue
+* if we are resuming from blocked state
+*/
+   if (vcpu->blocked) {
+   vcpu->blocked = false;
+   spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+   vcpu->wakeup_cpu), flags);
+   list_del(&vcpu->blocked_vcpu_list);
+   spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+   vcpu->wakeup_cpu), flags);
+   vcpu->wakeup_cpu = -1;
+   }
}
 }
 
@@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
if (irq_remapping_cap(IRQ_POSTING_CAP)) {
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
struct pi_desc old, new;
+   unsigned long flags;
+   int cpu;
+   struct cpumask cpu_others_mask;
 
memset(&old, 0, sizeof(old));
memset(&new, 0, sizeof(new));
@@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
pi_set_sn(&new);
} while (cmpxchg(&pi_desc->control, old.control,
new.control) != old.control);
+   } else if (vcpu->blocked) {
+   /*
+* The vcpu is blocked on the wait queue.
+* Store the blocked vCPU on the list of the
+* vcpu->wakeup_cpu, which is the destination
+* of the wake-up notification event.
+*/
+   vcpu->wakeup_cpu = vcpu->cpu;
+   spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->wakeup_cpu), flags);
+   list_add_tail(&vcpu->blocked_vcpu_list,
+ &per_cpu(blocked_vcpu_on_cpu,
+ vcpu->wakeup_cpu));
+   spin_unlock_irqrestore(
+   &per_cpu(blocked_vcpu_on_cpu_lock,
+   vcpu->wakeup_cpu), flags);
+
+   do {
+   old.control = new.control = pi_desc->control;
+
+   /*
+* We should not block the vCPU if
+* an interrupt is posted for it.
+*/
+   if (pi_test_on(pi_desc) == 1) {
+  

[v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

2014-12-12 Thread Feng Wu
Enable VT-d Posted-Interrupts and add a command line
parameter for it.
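
For illustration, the option parsing in this patch can be modelled in
userspace C as below. The struct and function names are hypothetical; the
matching itself mirrors the strncmp()/strcspn() loop in setup_irqremap(),
with the global flags collected into one structure so the sketch is testable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the flags the real code keeps as globals. */
struct model_irqremap_opts {
	int disable_irq_remap;
	int disable_irq_post;
	int disable_sourceid_checking;
	int no_x2apic_optout;
};

/* Parse a comma-separated option string such as "on,nopost". */
static int model_setup_irqremap(const char *str,
				struct model_irqremap_opts *opts)
{
	assert(opts != NULL);
	if (!str)
		return -1;      /* the kernel code returns -EINVAL here */

	while (*str) {
		if (!strncmp(str, "on", 2)) {
			opts->disable_irq_remap = 0;
			opts->disable_irq_post = 0;
		} else if (!strncmp(str, "off", 3)) {
			opts->disable_irq_remap = 1;
			opts->disable_irq_post = 1;
		} else if (!strncmp(str, "nosid", 5)) {
			opts->disable_sourceid_checking = 1;
		} else if (!strncmp(str, "no_x2apic_optout", 16)) {
			opts->no_x2apic_optout = 1;
		} else if (!strncmp(str, "nopost", 6)) {
			opts->disable_irq_post = 1;
		}

		/* Skip to the next option, tolerating repeated commas. */
		str += strcspn(str, ",");
		while (*str == ',')
			str++;
	}
	return 0;
}
```

Note that options are matched by prefix and applied left to right, so
"on,nopost" keeps interrupt remapping enabled while disabling posting.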

Signed-off-by: Feng Wu 
---
 Documentation/kernel-parameters.txt |  1 +
 drivers/iommu/irq_remapping.c   | 12 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 838f377..324b790 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1453,6 +1453,7 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
nosid   disable Source ID checking
no_x2apic_optout
BIOS x2APIC opt-out request will be ignored
+   nopost  disable Interrupt Posting
 
iomem=  Disable strict checking of access to MMIO memory
strict  regions from userspace.
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index b008663..aa3cd23 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -24,7 +24,7 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
-int disable_irq_post = 1;
+int disable_irq_post = 0;
 
 static struct irq_remap_ops *remap_ops;
 
@@ -59,14 +59,18 @@ static __init int setup_irqremap(char *str)
return -EINVAL;
 
while (*str) {
-   if (!strncmp(str, "on", 2))
+   if (!strncmp(str, "on", 2)) {
disable_irq_remap = 0;
-   else if (!strncmp(str, "off", 3))
+   disable_irq_post = 0;
+   } else if (!strncmp(str, "off", 3)) {
disable_irq_remap = 1;
-   else if (!strncmp(str, "nosid", 5))
+   disable_irq_post = 1;
+   } else if (!strncmp(str, "nosid", 5))
disable_sourceid_checking = 1;
else if (!strncmp(str, "no_x2apic_optout", 16))
no_x2apic_optout = 1;
+   else if (!strncmp(str, "nopost", 6))
+   disable_irq_post = 1;
 
str += strcspn(str, ",");
while (*str == ',')
-- 
1.9.1



[v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted

2014-12-12 Thread Feng Wu
This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress future non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu 
---
 arch/x86/kvm/vmx.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ee3b735..bf2e6cd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1916,10 +1916,54 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int 
cpu)
vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
vmx->loaded_vmcs->cpu = cpu;
}
+
+   if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+   struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+   struct pi_desc old, new;
+   unsigned int dest;
+
+   memset(&old, 0, sizeof(old));
+   memset(&new, 0, sizeof(new));
+
+   do {
+   old.control = new.control = pi_desc->control;
+   if (vcpu->cpu != cpu) {
+   dest = cpu_physical_id(cpu);
+
+   if (x2apic_enabled())
+   new.ndst = dest;
+   else
+   new.ndst = (dest << 8) & 0xFF00;
+   }
+
+   pi_clear_sn(&new);
+
+   /* set 'NV' to 'notification vector' */
+   new.nv = POSTED_INTR_VECTOR;
+   } while (cmpxchg(&pi_desc->control, old.control,
+   new.control) != old.control);
+   }
 }
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+   struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+   struct pi_desc old, new;
+
+   memset(&old, 0, sizeof(old));
+   memset(&new, 0, sizeof(new));
+
+   /* Set SN when the vCPU is preempted */
+   if (vcpu->preempted) {
+   do {
+   old.control = new.control = pi_desc->control;
+   pi_set_sn(&new);
+   } while (cmpxchg(&pi_desc->control, old.control,
+   new.control) != old.control);
+   }
+   }
+
__vmx_load_host_state(to_vmx(vcpu));
if (!vmm_exclusive) {
__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
-- 
1.9.1



[v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-12 Thread Feng Wu
Currently, urgent interrupts are not supported: all interrupts
are treated as non-urgent, so we cannot send a posted interrupt
when 'SN' is set.

Signed-off-by: Feng Wu 
---
 arch/x86/kvm/vmx.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1c83a2..0aee151 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4401,15 +4401,22 @@ static int vmx_vm_has_apicv(struct kvm *kvm)
 static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   int r;
+   int r, sn;
 
if (pi_test_and_set_pir(vector, &vmx->pi_desc))
return;
 
+   /*
+* Currently, we don't support urgent interrupt, all interrupts
+* are recognized as non-urgent interrupt, so we cannot send
+* posted-interrupt when 'SN' is set.
+*/
+   sn = pi_test_sn(&vmx->pi_desc);
+
r = pi_test_and_set_on(&vmx->pi_desc);
kvm_make_request(KVM_REQ_EVENT, vcpu);
 #ifdef CONFIG_SMP
-   if (!r && (vcpu->mode == IN_GUEST_MODE))
+   if (!r && !sn && (vcpu->mode == IN_GUEST_MODE))
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
POSTED_INTR_VECTOR);
else
-- 
1.9.1



[v3 22/26] KVM: Define a wakeup worker thread for vCPU

2014-12-12 Thread Feng Wu
Define a wakeup worker thread for a vCPU.

Signed-off-by: Feng Wu 
---
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ca9a393..3d7242c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -249,6 +249,7 @@ struct kvm_vcpu {
int sigset_active;
sigset_t sigset;
struct kvm_vcpu_stat stat;
+   struct work_struct wakeup_worker;
 
 #ifdef CONFIG_HAS_IOMEM
int mmio_needed;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25ffac9..ba53fd6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -211,6 +211,13 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
+static void wakeup_thread(struct work_struct *work)
+{
+   struct kvm_vcpu *vcpu = container_of(work, struct kvm_vcpu,
+   wakeup_worker);
+   kvm_vcpu_kick(vcpu);
+}
+
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
struct page *page;
@@ -224,6 +231,8 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, 
unsigned id)
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);
 
+   INIT_WORK(&vcpu->wakeup_worker, wakeup_thread);
+
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
-- 
1.9.1



[v3 18/26] KVM: kvm-vfio: User API for VT-d Posted-Interrupts

2014-12-12 Thread Feng Wu
This patch adds and documents two new attributes
KVM_DEV_VFIO_DEVICE_POST_IRQ and KVM_DEV_VFIO_DEVICE_UNPOST_IRQ
in KVM_DEV_VFIO_DEVICE group. The new attributes are used for
VT-d Posted-Interrupts.

When the guest OS changes the interrupt configuration of an
assigned device, such as the MSI/MSI-X data/address fields,
QEMU will use this IRQ attribute to tell KVM to update the
related IRTE according to the VT-d Posted-Interrupts Specification;
for example, the guest vector should be updated in the related IRTE.
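
As a sketch of how userspace might build the argument, the helper below
allocates and fills a structure with the same layout as the
kvm_vfio_dev_irq proposed in this patch. The model_* names are hypothetical,
and treating argsz as the total size including the gsi[] array is an
assumption borrowed from the usual VFIO convention, not something this patch
states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copy of the proposed layout (the patch uses __u32 fields). */
struct model_kvm_vfio_dev_irq {
	uint32_t argsz;
	uint32_t fd;      /* file descriptor of the VFIO device */
	uint32_t index;   /* VFIO device IRQ index */
	uint32_t start;
	uint32_t count;
	uint32_t gsi[];   /* one virtual IRQ number per entry */
};

/* Allocate and fill the argument; the caller frees it. NULL on failure. */
static struct model_kvm_vfio_dev_irq *
model_alloc_dev_irq(uint32_t fd, uint32_t index, uint32_t start,
		    const uint32_t *gsi, uint32_t count)
{
	size_t size = sizeof(struct model_kvm_vfio_dev_irq) +
		      (size_t)count * sizeof(uint32_t);
	struct model_kvm_vfio_dev_irq *irq;

	assert(gsi != NULL || count == 0);

	irq = calloc(1, size);
	if (!irq)
		return NULL;

	irq->argsz = (uint32_t)size;   /* assumed: header plus gsi[] */
	irq->fd = fd;
	irq->index = index;
	irq->start = start;
	irq->count = count;
	for (uint32_t i = 0; i < count; i++)
		irq->gsi[i] = gsi[i];
	return irq;
}
```

The resulting pointer would then be passed in kvm_device_attr.addr with the
KVM_DEV_VFIO_DEVICE_POST_IRQ or KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute.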

Signed-off-by: Feng Wu 
---
 Documentation/virtual/kvm/devices/vfio.txt |  9 +
 include/uapi/linux/kvm.h   | 11 +++
 2 files changed, 20 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
b/Documentation/virtual/kvm/devices/vfio.txt
index f7aff29..ecfbf61 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -42,3 +42,12 @@ activated before VFIO_DEVICE_SET_IRQS has been called to 
trigger the IRQ
 or associate an eventfd to it. Unforwarding can only be called while the
 signaling has been disabled with VFIO_DEVICE_SET_IRQS. If this condition is
 not satisfied, the command returns an -EBUSY.
+
+  KVM_DEV_VFIO_DEVICE_POST_IRQ: set a VFIO device IRQ as posted
+  KVM_DEV_VFIO_DEVICE_UNPOST_IRQ: set a VFIO device IRQ as remapped
+For this attribute, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When the guest OS changes the interrupt configuration of an assigned device,
+such as the MSI/MSI-X data/address fields, QEMU will use this IRQ attribute
+to tell KVM to update the related IRTE according to the VT-d Posted-Interrupts
+Specification; for example, the guest vector should be updated in the IRTE.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a269a42..8f51487 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -949,6 +949,8 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_DEVICE   2
 #define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ  1
 #define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ2
+#define   KVM_DEV_VFIO_DEVICE_POST_IRQ 3
+#define   KVM_DEV_VFIO_DEVICE_UNPOST_IRQ   4
 
 enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20= 1,
@@ -973,6 +975,15 @@ struct kvm_arch_forwarded_irq {
__u32 gsi; /* gsi, ie. virtual IRQ number */
 };
 
+struct kvm_vfio_dev_irq {
+   __u32   argsz;
+   __u32   fd; /* file descriptor of the VFIO device */
+   __u32   index;  /* VFIO device IRQ index */
+   __u32   start;
+   __u32   count;
+   __u32   gsi[];  /* gsi, ie. virtual IRQ number */
+};
+
 /*
  * ioctls for VM fds
  */
-- 
1.9.1



[v3 19/26] KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts

2014-12-12 Thread Feng Wu
This patch adds the kvm-vfio interface for VT-d Posted-Interrupts.
When a guest updates the MSI/MSI-X information of an assigned device,
QEMU will use the KVM_DEV_VFIO_DEVICE_POST_IRQ attribute to set up the
IRTE for VT-d PI. Userspace programs can also use
KVM_DEV_VFIO_DEVICE_UNPOST_IRQ to change back to IRQ remapping mode.
This patch implements these IRQ attributes.

Signed-off-by: Feng Wu 
---
 include/linux/kvm_host.h |  20 +
 virt/kvm/vfio.c  | 107 +++
 2 files changed, 127 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5cd4420..ca9a393 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1134,6 +1134,26 @@ static inline int kvm_arch_vfio_set_forward(struct 
kvm_fwd_irq *fwd_irq,
 }
 #endif
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
+/*
+ * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+uint32_t guest_irq, bool set);
+#else
+static inline int kvm_arch_vfio_update_pi_irte(struct kvm *kvm,
+   unsigned int host_irq, uint32_t guest_irq, bool set)
+{
+   return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 6bc7001..dbc6c3b 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -446,6 +446,99 @@ out:
return ret;
 }
 
+static int kvm_vfio_pci_get_irq_count(struct pci_dev *pdev, int irq_type)
+{
+   if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
+   u8 pin;
+
+   pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
+   if (pin)
+   return 1;
+   } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX)
+   return pci_msi_vec_count(pdev);
+   else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX)
+   return pci_msix_vec_count(pdev);
+
+   return 0;
+}
+
+static int kvm_vfio_control_pi(struct kvm_device *kdev,
+  int32_t __user *argp, bool set)
+{
+   struct kvm_vfio_dev_irq pi_info;
+   uint32_t *gsi;
+   unsigned long minsz;
+   struct vfio_device *vdev;
+   struct msi_desc *entry;
+   struct device *dev;
+   struct pci_dev *pdev;
+   int i, max, ret;
+
+   minsz = offsetofend(struct kvm_vfio_dev_irq, count);
+
+   if (copy_from_user(&pi_info, (void __user *)argp, minsz))
+   return -EFAULT;
+
+   if (pi_info.argsz < minsz || pi_info.index >= VFIO_PCI_NUM_IRQS)
+   return -EINVAL;
+
+   vdev = kvm_vfio_get_vfio_device(pi_info.fd);
+   if (IS_ERR(vdev))
+   return PTR_ERR(vdev);
+
+   dev = kvm_vfio_external_base_device(vdev);
+   if (!dev || !dev_is_pci(dev)) {
+   ret = -EFAULT;
+   goto put_vfio_device;
+   }
+
+   pdev = to_pci_dev(dev);
+
+   max = kvm_vfio_pci_get_irq_count(pdev, pi_info.index);
+   if (max <= 0) {
+   ret = -EFAULT;
+   goto put_vfio_device;
+   }
+
+   if (pi_info.argsz - minsz < pi_info.count * sizeof(u32) ||
+   pi_info.start >= max || pi_info.start + pi_info.count > max) {
+   ret = -EINVAL;
+   goto put_vfio_device;
+   }
+
+   gsi = memdup_user((void __user *)((unsigned long)argp + minsz),
+  pi_info.count * sizeof(u32));
+   if (IS_ERR(gsi)) {
+   ret = PTR_ERR(gsi);
+   goto put_vfio_device;
+   }
+
+#ifdef CONFIG_PCI_MSI
+   for (i = 0; i < pi_info.count; i++) {
+   list_for_each_entry(entry, &pdev->msi_list, list) {
+   if (entry->msi_attrib.entry_nr != pi_info.start+i)
+   continue;
+
+   ret = kvm_arch_vfio_update_pi_irte(kdev->kvm,
+  entry->irq,
+  gsi[i],
+  set);
+   if (ret)
+   goto free_gsi;
+   }
+   }
+#endif
+
+   ret = 0;
+
+free_gsi:
+   kfree(gsi);
+
+put_vfio_device:
+   kvm_vfio_put_vfio_device(vdev);
+   return ret;
+}
+
 static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
 {
int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
@@ -456,6 +549,14 @@ static int kvm_vfio_set_device(struct kvm_device *kdev, 
long attr, u64 arg)
case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
break;
+#ifd

[v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-12 Thread Feng Wu
We don't need to migrate the irqs for VT-d Posted-Interrupts here.
When 'pst' is set in the IRTE, the associated irq is posted to the
guest instead of being remapped. The destination of the interrupt is
set in the Posted-Interrupts Descriptor, and the migration happens
during vCPU scheduling.

However, we still update the cached irte here, so that it can be used
when changing back to remapping mode.
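
The behaviour described above can be modelled with a small standalone
sketch. All names prefixed with toy_ are hypothetical stand-ins (a plain
struct replaces the hardware table entry and a bool replaces the 'pst' bit);
this is not the driver code, only an illustration of "update the cache,
skip the hardware write while posting".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_irte {
	uint8_t vector;
	uint32_t dest_id;
};

struct toy_ir_data {
	struct toy_irte cached;  /* remapped-format entry, always kept current */
	struct toy_irte hw;      /* stands in for the hardware table entry */
	bool posted;             /* stands in for the 'pst' bit */
	unsigned int hw_writes;  /* counts writes to the "hardware" entry */
};

/* Affinity change: always refresh the cache, touch hardware only if remapped. */
static void toy_set_affinity(struct toy_ir_data *d, uint8_t vector,
			     uint32_t dest_id)
{
	assert(d != NULL);
	d->cached.vector = vector;
	d->cached.dest_id = dest_id;
	if (!d->posted) {
		d->hw = d->cached;
		d->hw_writes++;
	}
}

/* Back to remapping mode: the cached entry is written out as-is. */
static void toy_unpost(struct toy_ir_data *d)
{
	assert(d != NULL);
	d->posted = false;
	d->hw = d->cached;
	d->hw_writes++;
}
```

Because the cache is refreshed on every affinity change, switching back to
remapping mode picks up the most recent vector and destination.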

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 drivers/iommu/intel_irq_remapping.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 48c2051..ab9057a 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -977,6 +977,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct 
cpumask *mask,
 {
struct intel_ir_data *ir_data = data->chip_data;
struct irte *irte = &ir_data->irte_entry;
+   struct irte_pi *irte_pi = (struct irte_pi *)irte;
struct irq_cfg *cfg = irqd_cfg(data);
struct irq_data *parent = data->parent_data;
int ret;
@@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct 
cpumask *mask,
 */
irte->vector = cfg->vector;
irte->dest_id = IRTE_DEST(cfg->dest_apicid);
-   modify_irte(&ir_data->irq_2_iommu, irte);
+
+   /* We don't need to modify irte if the interrupt is for posting. */
+   if (irte_pi->pst != 1)
+   modify_irte(&ir_data->irq_2_iommu, irte);
 
/*
 * After this point, all the interrupts will start arriving
-- 
1.9.1



[v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts

2014-12-12 Thread Feng Wu
Currently, we use a global vector as the Posted-Interrupts
Notification Event for all the vCPUs in the system. We need
to introduce another global vector for VT-d Posted-Interrupts,
which will be used to wake up the sleeping vCPU when an external
interrupt from a directly assigned device arrives for that vCPU.
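
The wakeup path relies on an optional callback that is NULL until some
module installs it. A minimal userspace sketch of that pattern follows; the
toy_ names are hypothetical, and the real handler additionally acknowledges
the APIC and enters and exits interrupt context.

```c
#include <assert.h>
#include <stddef.h>

/* NULL until a hypervisor module installs its wakeup routine. */
static void (*toy_wakeup_cb)(void);
static unsigned int toy_wakeup_ipis;  /* per-vector statistics counter */

/* Models the body of the wakeup vector handler. */
static void toy_wakeup_ipi(void)
{
	toy_wakeup_ipis++;
	if (toy_wakeup_cb)
		toy_wakeup_cb();
}

/* Example callback: counts how often it was asked to wake vCPUs. */
static unsigned int toy_woken;

static void toy_wake_vcpus(void)
{
	toy_woken++;
}
```

Checking the pointer before the call keeps the vector harmless when no
handler has been registered.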

Signed-off-by: Feng Wu 
---
 arch/x86/include/asm/entry_arch.h  |  2 ++
 arch/x86/include/asm/hardirq.h |  1 +
 arch/x86/include/asm/hw_irq.h  |  2 ++
 arch/x86/include/asm/irq_vectors.h |  1 +
 arch/x86/kernel/entry_64.S |  2 ++
 arch/x86/kernel/irq.c  | 27 +++
 arch/x86/kernel/irqinit.c  |  2 ++
 7 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/entry_arch.h 
b/arch/x86/include/asm/entry_arch.h
index dc5fa66..27ca0af 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -23,6 +23,8 @@ BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
 #ifdef CONFIG_HAVE_KVM
 BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
 smp_kvm_posted_intr_ipi)
+BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
+smp_kvm_posted_intr_wakeup_ipi)
 #endif
 
 /*
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 0f5fb6b..9866065 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -14,6 +14,7 @@ typedef struct {
 #endif
 #ifdef CONFIG_HAVE_KVM
unsigned int kvm_posted_intr_ipis;
+   unsigned int kvm_posted_intr_wakeup_ipis;
 #endif
unsigned int x86_platform_ipis; /* arch dependent */
unsigned int apic_perf_irqs;
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e7ae6eb..38fac9b 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -29,6 +29,7 @@
 extern asmlinkage void apic_timer_interrupt(void);
 extern asmlinkage void x86_platform_ipi(void);
 extern asmlinkage void kvm_posted_intr_ipi(void);
+extern asmlinkage void kvm_posted_intr_wakeup_ipi(void);
 extern asmlinkage void error_interrupt(void);
 extern asmlinkage void irq_work_interrupt(void);
 
@@ -92,6 +93,7 @@ extern void trace_call_function_single_interrupt(void);
 #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
 #define trace_reboot_interrupt  reboot_interrupt
 #define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
+#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
 #endif /* CONFIG_TRACING */
 
 struct irq_domain;
diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index b26cb12..dca94f2 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -105,6 +105,7 @@
 /* Vector for KVM to deliver posted interrupt IPI */
 #ifdef CONFIG_HAVE_KVM
 #define POSTED_INTR_VECTOR 0xf2
+#define POSTED_INTR_WAKEUP_VECTOR  0xf1
 #endif
 
 /*
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e61c14a..a598447 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -960,6 +960,8 @@ apicinterrupt X86_PLATFORM_IPI_VECTOR \
 #ifdef CONFIG_HAVE_KVM
 apicinterrupt3 POSTED_INTR_VECTOR \
kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
+apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
+   kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
 #endif
 
 #ifdef CONFIG_X86_MCE_THRESHOLD
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 922d285..47408c3 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_HAVE_KVM
+void (*wakeup_handler_callback)(void) = NULL;
+EXPORT_SYMBOL_GPL(wakeup_handler_callback);
+
 /*
  * Handler for POSTED_INTERRUPT_VECTOR.
  */
@@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct pt_regs 
*regs)
 
set_irq_regs(old_regs);
 }
+
+/*
+ * Handler for POSTED_INTR_WAKEUP_VECTOR.
+ */
+__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
+{
+   struct pt_regs *old_regs = set_irq_regs(regs);
+
+   ack_APIC_irq();
+
+   irq_enter();
+
+   exit_idle();
+
+   inc_irq_stat(kvm_posted_intr_wakeup_ipis);
+
+   if (wakeup_handler_callback)
+   wakeup_handler_callback();
+
+   irq_exit();
+
+   set_irq_regs(old_regs);
+}
+
 #endif
 
 __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs)
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 70e181e..844673c 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -144,6 +144,8 @@ static void __init apic_intr_init(void)
 #ifdef CONFIG_HAVE_KVM
/* IPI for KVM to deliver posted interrupt */
alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
+   /* IPI for KVM to deliver interrupt to wake up tasks */
+   alloc_intr_gate(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi);
 

[v3 11/26] KVM: Add some helper functions for Posted-Interrupts

2014-12-12 Thread Feng Wu
This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu 
---
 arch/x86/kvm/vmx.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index abdb84f..0b1383e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -408,6 +408,8 @@ struct nested_vmx {
 };
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
+
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
@@ -443,6 +445,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc 
*pi_desc)
return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+   clear_bit(POSTED_INTR_SN,
+   (unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+   set_bit(POSTED_INTR_SN,
+   (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+   return test_bit(POSTED_INTR_ON,
+   (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+   return test_bit(POSTED_INTR_SN,
+   (unsigned long *)&pi_desc->control);
+}
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
unsigned long host_rsp;
-- 
1.9.1



[v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip

2014-12-12 Thread Feng Wu
Implement irq_set_vcpu_affinity for intel_ir_chip.
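
For readers following the address arithmetic in this patch, here is a
standalone sketch of how a 64-byte aligned Posted-Interrupt Descriptor
address splits into the pda_l and pda_h fields. PDA_LOW_BIT and
PDA_HIGH_BIT are taken from this series; the helper names pda_low(),
pda_high() and pda_join() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PDA_LOW_BIT  26
#define PDA_HIGH_BIT 32

/* Bits 6..31 of the address; the low 6 bits are zero by alignment. */
static uint32_t pda_low(uint64_t pi_desc_addr)
{
	assert((pi_desc_addr & 0x3f) == 0);
	return (uint32_t)((pi_desc_addr >> (32 - PDA_LOW_BIT)) &
			  ((1ULL << PDA_LOW_BIT) - 1));
}

/* Bits 32..63 of the address. */
static uint32_t pda_high(uint64_t pi_desc_addr)
{
	return (uint32_t)(pi_desc_addr >> 32);
}

/* Inverse operation, useful for checking that no address bits are lost. */
static uint64_t pda_join(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | ((uint64_t)low << (32 - PDA_LOW_BIT));
}
```

The round trip through pda_join() only holds for 64-byte aligned addresses,
which is the alignment the descriptor is declared with in this series.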

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 arch/x86/include/asm/irq_remapping.h |  5 +
 drivers/iommu/intel_irq_remapping.c  | 35 +++
 2 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h 
b/arch/x86/include/asm/irq_remapping.h
index f67ae08..f87ac70 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -60,6 +60,11 @@ static inline struct irq_domain 
*arch_get_ir_parent_domain(void)
return x86_vector_domain;
 }
 
+struct vcpu_data {
+   u64 pi_desc_addr;   /* Physical address of PI Descriptor */
+   u32 vector; /* Guest vector of the interrupt */
+};
+
 #else  /* CONFIG_IRQ_REMAP */
 
 static inline void setup_irq_remapping_ops(void) { }
diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index f6da3b2..48c2051 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -42,6 +42,7 @@ struct irq_2_iommu {
 struct intel_ir_data {
struct irq_2_iommu  irq_2_iommu;
struct irte irte_entry;
+   struct irte_pi  irte_pi_entry;
union {
struct msi_msg  msi_entry;
};
@@ -1010,10 +1011,44 @@ static void intel_ir_compose_msi_msg(struct irq_data 
*irq_data,
*msg = ir_data->msi_entry;
 }
 
+static int intel_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
+{
+   struct intel_ir_data *ir_data = data->chip_data;
+   struct irte_pi *irte_pi = &ir_data->irte_pi_entry;
+   struct vcpu_data *vcpu_pi_info;
+
+   /* stop posting interrupts, back to remapping mode */
+   if (!vcpu_info)
+   modify_irte(&ir_data->irq_2_iommu, &ir_data->irte_entry);
+   else {
+   vcpu_pi_info = (struct vcpu_data *)vcpu_info;
+   memcpy(irte_pi, &ir_data->irte_entry, sizeof(struct irte));
+
+   irte_pi->urg = 0;
+   irte_pi->vector = vcpu_pi_info->vector;
+   irte_pi->pda_l = (vcpu_pi_info->pi_desc_addr >>
+(32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
+   irte_pi->pda_h = (vcpu_pi_info->pi_desc_addr >> 32) &
+~(-1UL << PDA_HIGH_BIT);
+
+   irte_pi->__reserved_1 = 0;
+   irte_pi->__reserved_2 = 0;
+   irte_pi->__reserved_3 = 0;
+   irte_pi->__reserved_4 = 0;
+
+   irte_pi->pst = 1;
+
+   modify_irte(&ir_data->irq_2_iommu, (struct irte *)irte_pi);
+   }
+
+   return 0;
+}
+
 static struct irq_chip intel_ir_chip = {
.irq_ack = ir_ack_apic_edge,
.irq_set_affinity = intel_ir_set_affinity,
.irq_compose_msi_msg = intel_ir_compose_msi_msg,
+   .irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity,
 };
 
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
-- 
1.9.1



[v3 10/26] KVM: change struct pi_desc for VT-d Posted-Interrupts

2014-12-12 Thread Feng Wu
Change struct pi_desc for VT-d Posted-Interrupts.
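
The sketch below mirrors the proposed descriptor layout in userspace to show
where each field lands in the 64-bit control word. It assumes GCC or Clang
on little-endian x86-64, where bit-fields are allocated from the least
significant bit; the names pi_desc_sketch and pi_control() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative copy of the proposed layout. 64-bit bit-fields and the
 * aligned attribute are compiler extensions (GCC/Clang assumed).
 */
struct pi_desc_sketch {
	uint32_t pir[8];                /* posted interrupt requests */
	union {
		struct {
			uint64_t on     : 1,    /* bit 0: outstanding notification */
				 sn     : 1,    /* bit 1: suppress notification */
				 rsvd_1 : 13,
				 ndm    : 1,    /* bit 15: notification destination mode */
				 nv     : 8,    /* bits 16..23: notification vector */
				 rsvd_2 : 8,
				 ndst   : 32;   /* bits 32..63: notification destination */
		};
		uint64_t control;
	};
	uint32_t rsvd[6];
} __attribute__((aligned(64)));

static_assert(sizeof(struct pi_desc_sketch) == 64,
	      "descriptor must stay 64 bytes");

/* Hypothetical helper: build a control word from individual fields. */
static uint64_t pi_control(uint8_t nv, uint32_t ndst, int sn)
{
	struct pi_desc_sketch d;

	memset(&d, 0, sizeof(d));
	d.nv = nv;
	d.ndst = ndst;
	d.sn = sn ? 1 : 0;
	return d.control;
}
```

Widening control to 64 bits is what forces rsvd to shrink from seven words
to six, so the overall size is unchanged.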

Signed-off-by: Feng Wu 
---
 arch/x86/kvm/vmx.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3e556c6..abdb84f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -411,8 +411,19 @@ struct nested_vmx {
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
-   u32 control;/* bit 0 of control is outstanding notification bit */
-   u32 rsvd[7];
+   union {
+   struct {
+   u64 on  : 1,
+   sn  : 1,
+   rsvd_1  : 13,
+   ndm : 1,
+   nv  : 8,
+   rsvd_2  : 8,
+   ndst: 32;
+   };
+   u64 control;
+   };
+   u32 rsvd[6];
 } __aligned(64);
 
 static bool pi_test_and_set_on(struct pi_desc *pi_desc)
-- 
1.9.1



[v3 09/26] iommu, x86: define irq_remapping_cap()

2014-12-12 Thread Feng Wu
This patch adds a new interface irq_remapping_cap() to detect
whether irq remapping supports new features, such as VT-d
Posted-Interrupts. We export this function out, so that KVM
code can check this and use this mechanism properly.
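
A minimal sketch of the query pattern, assuming a backend ops table that may
be absent or may not implement the capability hook. Every toy_ name is
hypothetical; only the order of the checks follows this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_cap { TOY_IRQ_POSTING_CAP = 0 };

struct toy_remap_ops {
	bool (*capability)(enum toy_cap cap);
};

static const struct toy_remap_ops *toy_ops;  /* NULL until a backend registers */
static bool toy_disable_post;                /* models a "disable posting" switch */

static void toy_register(const struct toy_remap_ops *ops)
{
	assert(ops != NULL);
	toy_ops = ops;
}

/* Returns false unless posting is allowed and the backend reports the cap. */
static bool toy_remapping_cap(enum toy_cap cap)
{
	if (toy_disable_post)
		return false;

	if (!toy_ops || !toy_ops->capability)
		return false;

	return toy_ops->capability(cap);
}

/* Example backend that reports posting support. */
static bool toy_backend_cap(enum toy_cap cap)
{
	return cap == TOY_IRQ_POSTING_CAP;
}

static const struct toy_remap_ops toy_backend = {
	.capability = toy_backend_cap,
};
```

Callers such as the KVM side can then treat a false result uniformly,
whether the feature is disabled, unregistered, or unsupported.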

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 arch/x86/include/asm/irq_remapping.h |  2 ++
 drivers/iommu/irq_remapping.c| 12 
 2 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h 
b/arch/x86/include/asm/irq_remapping.h
index f87ac70..b3ad067 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -37,6 +37,7 @@ enum irq_remap_cap {
 
 extern void setup_irq_remapping_ops(void);
 extern int irq_remapping_supported(void);
+extern bool irq_remapping_cap(enum irq_remap_cap cap);
 extern void set_irq_remapping_broken(void);
 extern int irq_remapping_prepare(void);
 extern int irq_remapping_enable(void);
@@ -69,6 +70,7 @@ struct vcpu_data {
 
 static inline void setup_irq_remapping_ops(void) { }
 static inline int irq_remapping_supported(void) { return 0; }
+static inline bool irq_remapping_cap(enum irq_remap_cap cap) { return false; }
 static inline void set_irq_remapping_broken(void) { }
 static inline int irq_remapping_prepare(void) { return -ENODEV; }
 static inline int irq_remapping_enable(void) { return -ENODEV; }
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index e63e969..b008663 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -103,6 +103,18 @@ int irq_remapping_supported(void)
return remap_ops->supported();
 }
 
+bool irq_remapping_cap(enum irq_remap_cap cap)
+{
+   if (disable_irq_post)
+   return 0;
+
+   if (!remap_ops || !remap_ops->capability)
+   return 0;
+
+   return remap_ops->capability(cap);
+}
+EXPORT_SYMBOL_GPL(irq_remapping_cap);
+
 int __init irq_remapping_prepare(void)
 {
if (!remap_ops || !remap_ops->prepare)
-- 
1.9.1



[v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts

2014-12-12 Thread Feng Wu
Add a new irte_pi structure for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 include/linux/dmar.h | 32 
 1 file changed, 32 insertions(+)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 8473756..c7f9cda 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -212,6 +212,38 @@ struct irte {
};
 };
 
+struct irte_pi {
+   union {
+   struct {
+   __u64   present : 1,
+   fpd : 1,
+   __reserved_1: 6,
+   avail   : 4,
+   __reserved_2: 2,
+   urg : 1,
+   pst : 1,
+   vector  : 8,
+   __reserved_3: 14,
+   pda_l   : 26;
+   };
+   __u64 low;
+   };
+
+   union {
+   struct {
+   __u64   sid : 16,
+   sq  : 2,
+   svt : 2,
+   __reserved_4: 12,
+   pda_h   : 32;
+   };
+   __u64 high;
+   };
+};
+
+#define PDA_LOW_BIT26
+#define PDA_HIGH_BIT   32
+
 enum {
IRQ_REMAP_XAPIC_MODE,
IRQ_REMAP_X2APIC_MODE,
-- 
1.9.1



[v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor

2014-12-12 Thread Feng Wu
This patch initializes the VT-d Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu 
---
 arch/x86/kvm/vmx.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b1383e..66ca275 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "trace.h"
 
@@ -4433,6 +4434,30 @@ static void ept_set_mmio_spte_mask(void)
kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
 }
 
+static void pi_desc_init(struct vcpu_vmx *vmx)
+{
+   unsigned int dest;
+
+   if (!irq_remapping_cap(IRQ_POSTING_CAP))
+   return;
+
+   /*
+* Initialize Posted-Interrupt Descriptor
+*/
+
+   pi_clear_sn(&vmx->pi_desc);
+   vmx->pi_desc.nv = POSTED_INTR_VECTOR;
+
+   /* Physical mode for Notification Event */
+   vmx->pi_desc.ndm = 0;
+   dest = cpu_physical_id(vmx->vcpu.cpu);
+
+   if (x2apic_enabled())
+   vmx->pi_desc.ndst = dest;
+   else
+   vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
+}
+
 /*
  * Sets up the vmcs for emulated real mode.
  */
@@ -4476,6 +4501,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
+
+   pi_desc_init(vmx);
}
 
if (ple_gap) {
-- 
1.9.1



[v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability

2014-12-12 Thread Feng Wu
Add helper function to detect VT-d Posted-Interrupts capability.
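
As a quick illustration, the macro added by this patch tests a single bit
(bit 59) of the 64-bit capability register value. The wrapper
toy_pi_supported() is hypothetical, and the register value would normally
come from the hardware rather than from a constant.

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as in the patch and its neighbouring context line. */
#define cap_pi_support(c)  (((c) >> 59) & 1)
#define cap_read_drain(c)  (((c) >> 55) & 1)

/* Compile-time sanity checks on the bit positions. */
static_assert(cap_pi_support(1ULL << 59) == 1, "bit 59 reports PI support");
static_assert(cap_pi_support(1ULL << 55) == 0, "bit 55 is read drain, not PI");

/* Hypothetical wrapper taking an already-read capability register value. */
static int toy_pi_supported(uint64_t cap_reg)
{
	return (int)cap_pi_support(cap_reg);
}
```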

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 include/linux/intel-iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ecaf3a9..8174ae8 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -87,6 +87,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 /*
  * Decoding Capability Register
  */
+#define cap_pi_support(c)  (((c) >> 59) & 1)
 #define cap_read_drain(c)  (((c) >> 55) & 1)
 #define cap_write_drain(c) (((c) >> 54) & 1)
 #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
-- 
1.9.1



[v3 17/26] KVM: make kvm_set_msi_irq() public

2014-12-12 Thread Feng Wu
Make kvm_set_msi_irq() public so that it can be used outside irq_comm.c.

Signed-off-by: Feng Wu 
---
 include/linux/kvm_host.h | 2 ++
 virt/kvm/irq_comm.c  | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cfa85ac..5cd4420 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -785,6 +785,8 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
 int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+struct kvm_lapic_irq *irq);
 
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index f3c5d69..231671a 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -106,7 +106,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
return r;
 }
 
-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
   struct kvm_lapic_irq *irq)
 {
trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
-- 
1.9.1



[v3 20/26] KVM: x86: kvm-vfio: VT-d posted-interrupts setup

2014-12-12 Thread Feng Wu
This patch defines the macro __KVM_HAVE_ARCH_KVM_VFIO_POST and
implements kvm_arch_vfio_update_pi_irte for the x86 architecture.

Signed-off-by: Feng Wu 
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/Makefile   |  2 +-
 arch/x86/kvm/kvm_vfio_x86.c | 77 +
 3 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/kvm_vfio_x86.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cd4b174..13e3e40 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -82,6 +82,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, 
int level)
(base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
 }
 
+#define __KVM_HAVE_ARCH_KVM_VFIO_POST
+
 #define SELECTOR_TI_MASK (1 << 2)
 #define SELECTOR_RPL_MASK 0x03
 
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 25d22b2..8809d58 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)   += 
$(KVM)/assigned-dev.o $(KVM)/iommu.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-  i8254.o cpuid.o pmu.o
+  i8254.o cpuid.o pmu.o kvm_vfio_x86.o
 kvm-intel-y+= vmx.o
 kvm-amd-y  += svm.o
 
diff --git a/arch/x86/kvm/kvm_vfio_x86.c b/arch/x86/kvm/kvm_vfio_x86.c
new file mode 100644
index 000..2ba618e
--- /dev/null
+++ b/arch/x86/kvm/kvm_vfio_x86.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright (C) 2014 Intel Corporation.
+ * Authors: Feng Wu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+/*
+ * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+uint32_t guest_irq, bool set)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   struct kvm_irq_routing_table *irq_rt;
+   struct kvm_lapic_irq irq;
+   struct kvm_vcpu *vcpu;
+   struct vcpu_data vcpu_info;
+   int idx, ret = -EINVAL;
+
+   idx = srcu_read_lock(&kvm->irq_srcu);
+   irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+   BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+   hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+   if (e->type != KVM_IRQ_ROUTING_MSI)
+   continue;
+   /*
+* VT-d PI cannot post multicast/broadcast interrupts to a
+* vCPU, so we still use interrupt remapping for these
+* kinds of interrupts.
+*/
+
+   kvm_set_msi_irq(e, &irq);
+   if (!kvm_find_dest_vcpu(kvm, &irq, &vcpu))
+   continue;
+
+   vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
+   vcpu_info.vector = irq.vector;
+
+   if (set)
+   ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+   else {
+   /* suppress notification event before unposting */
+   kvm_x86_ops->pi_set_sn(vcpu);
+   ret = irq_set_vcpu_affinity(host_irq, NULL);
+   kvm_x86_ops->pi_clear_sn(vcpu);
+   }
+
+   if (ret < 0) {
+   printk(KERN_INFO "%s: failed to update PI IRTE\n",
+   __func__);
+   goto out;
+   }
+   }
+
+   ret = 0;
+out:
+   srcu_read_unlock(&kvm->irq_srcu, idx);
+   return ret;
+}
-- 
1.9.1



[v3 16/26] KVM: Make struct kvm_irq_routing_table accessible

2014-12-12 Thread Feng Wu
Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu 
---
 include/linux/kvm_host.h | 19 +++
 virt/kvm/irqchip.c   | 11 ---
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b9659d..cfa85ac 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
struct hlist_node link;
 };
 
+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+   int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+   struct kvm_kernel_irq_routing_entry *rt_entries;
+   u32 nr_rt_entries;
+   /*
+* Array indexed by gsi. Each entry contains list of irq chips
+* the gsi is connected to.
+*/
+   struct hlist_head map[0];
+};
+
+#else
+
+struct kvm_irq_routing_table {};
+
+#endif
+
 #ifndef KVM_PRIVATE_MEM_SLOTS
 #define KVM_PRIVATE_MEM_SLOTS 0
 #endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 7f256f3..cdf29a6 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,17 +31,6 @@
 #include 
 #include "irq.h"
 
-struct kvm_irq_routing_table {
-   int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
-   struct kvm_kernel_irq_routing_entry *rt_entries;
-   u32 nr_rt_entries;
-   /*
-* Array indexed by gsi. Each entry contains list of irq chips
-* the gsi is connected to.
-*/
-   struct hlist_head map[0];
-};
-
 int kvm_irq_map_gsi(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *entries, int gsi)
 {
-- 
1.9.1



[v3 15/26] KVM: add interfaces to control PI outside vmx

2014-12-12 Thread Feng Wu
This patch adds pi_clear_sn and pi_set_sn to struct kvm_x86_ops,
so we can set/clear SN outside vmx.

Signed-off-by: Feng Wu 
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/vmx.c  | 13 +
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b45b78..cd4b174 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -773,6 +773,9 @@ struct kvm_x86_ops {
 
void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+
+   void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
+   void (*pi_set_sn)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81f239b..ee3b735 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -567,6 +567,16 @@ struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
return &(to_vmx(vcpu)->pi_desc);
 }
 
+static void vmx_pi_clear_sn(struct kvm_vcpu *vcpu)
+{
+   pi_clear_sn(vcpu_to_pi_desc(vcpu));
+}
+
+static void vmx_pi_set_sn(struct kvm_vcpu *vcpu)
+{
+   pi_set_sn(vcpu_to_pi_desc(vcpu));
+}
+
 #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
 #define FIELD(number, name)[number] = VMCS12_OFFSET(name)
 #define FIELD64(number, name)  [number] = VMCS12_OFFSET(name), \
@@ -9256,6 +9266,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
.sched_in = vmx_sched_in,
 
.get_pi_desc_addr = vmx_get_pi_desc_addr,
+
+   .pi_clear_sn = vmx_pi_clear_sn,
+   .pi_set_sn = vmx_pi_set_sn,
 };
 
 static int __init vmx_init(void)
-- 
1.9.1



[v3 05/26] x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller

2014-12-12 Thread Feng Wu
Implement irq_set_vcpu_affinity for pci_msi_ir_controller.

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 arch/x86/kernel/apic/msi.c | 1 +
 include/linux/irq.h| 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index da163da..b0ed073 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -152,6 +152,7 @@ static struct irq_chip pci_msi_ir_controller = {
.irq_mask   = pci_msi_mask_irq,
.irq_ack= irq_chip_ack_parent,
.irq_retrigger  = irq_chip_retrigger_hierarchy,
+   .irq_set_vcpu_affinity  = irq_chip_set_vcpu_affinity_parent,
.flags  = IRQCHIP_SKIP_SET_WAKE,
 };
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 83abafc..5dcaa7f 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -464,6 +464,9 @@ extern void irq_chip_eoi_parent(struct irq_data *data);
 extern int irq_chip_set_affinity_parent(struct irq_data *data,
const struct cpumask *dest,
bool force);
+extern int irq_chip_set_vcpu_affinity_parent(struct irq_data *data,
+void *vcpu_info);
+
 #endif
 
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
-- 
1.9.1



[v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-12 Thread Feng Wu
This patch defines a new interface, kvm_find_dest_vcpu(), for
VT-d PI, which returns the destination vCPU of a guest
interrupt.

Since VT-d PI cannot handle broadcast/multicast interrupts,
we only handle Fixed and Lowest-priority interrupts here.

The current method of handling guest lowest-priority interrupts
is to use a counter 'apic_arb_prio' for each vCPU: we choose the
vCPU with the smallest 'apic_arb_prio' and then increase it by 1.
However, for VT-d PI we cannot reuse this, since we no longer
have control over 'apic_arb_prio' when posted interrupts are
delivered directly by hardware.

Here, we introduce a scheme similar to 'apic_arb_prio' to handle
guest lowest-priority interrupts when VT-d PI is used. The idea
is as follows:
- Each vCPU has a counter 'round_robin_counter'.
- When a guest sets an interrupt to lowest priority, we choose
  the vCPU with the smallest 'round_robin_counter' as the destination,
  then increase it.
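
The selection rule in the list above can be sketched in isolation as
follows. The toy_vcpu type and toy_pick_lowest_prio() are hypothetical, the
APIC presence and destination-match filtering done by the real function is
omitted, and the counters are compared directly instead of subtracted, which
is a deliberate deviation that avoids signed overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vcpu {
	int id;
	int32_t round_robin_counter;
};

/*
 * Pick the candidate with the smallest counter (ties go to the first one),
 * then bump its counter so that the next pick moves on.
 */
static struct toy_vcpu *toy_pick_lowest_prio(struct toy_vcpu *vcpus, size_t n)
{
	struct toy_vcpu *dest = NULL;
	size_t i;

	assert(vcpus != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!dest ||
		    vcpus[i].round_robin_counter < dest->round_robin_counter)
			dest = &vcpus[i];
	}

	if (dest)
		dest->round_robin_counter++;
	return dest;
}
```

With all counters starting at zero, successive picks cycle through the
candidates in order, which is the round-robin effect the counter is named
after.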

Signed-off-by: Feng Wu 
---
 arch/x86/include/asm/kvm_host.h |  4 
 virt/kvm/irq_comm.c | 41 +
 2 files changed, 45 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ed0c30..7a41808 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
struct kvm_lapic *apic;/* kernel irqchip context */
unsigned long apic_attention;
int32_t apic_arb_prio;
+   int32_t round_robin_counter;
int mp_state;
u64 ia32_misc_enable_msr;
bool tpr_access_reporting;
@@ -1093,4 +1094,7 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
 
+bool kvm_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+   struct kvm_vcpu **dest_vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 963b899..f3c5d69 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -317,6 +317,47 @@ out:
return r;
 }
 
+int kvm_compare_rr_counter(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
+{
+   return vcpu1->arch.round_robin_counter -
+   vcpu2->arch.round_robin_counter;
+}
+
+bool kvm_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+   struct kvm_vcpu **dest_vcpu)
+{
+   int i, r = 0;
+   struct kvm_vcpu *vcpu, *dest = NULL;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (!kvm_apic_present(vcpu))
+   continue;
+
+   if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+   irq->dest_id, irq->dest_mode))
+   continue;
+
+   if (!kvm_is_dm_lowest_prio(irq)) {
+   r++;
+   *dest_vcpu = vcpu;
+   } else if (kvm_lapic_enabled(vcpu)) {
+   if (!dest)
+   dest = vcpu;
+   else if (kvm_compare_rr_counter(vcpu, dest) < 0)
+   dest = vcpu;
+   }
+   }
+
+   if (dest) {
+   dest->arch.round_robin_counter++;
+   *dest_vcpu = dest;
+   return true;
+   } else if (r == 1)
+   return true;
+
+   return false;
+}
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,  \
  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
-- 
1.9.1



[v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel

2014-12-12 Thread Feng Wu
Add the Intel side implementation for capability in
struct irq_remap_ops.

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 drivers/iommu/intel_irq_remapping.c | 27 +++
 drivers/iommu/irq_remapping.c   |  2 ++
 drivers/iommu/irq_remapping.h   |  4 
 3 files changed, 33 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index ab9057a..08a7c39 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -652,6 +652,32 @@ error:
return -1;
 }
 
+static bool intel_irq_remapping_capability(enum irq_remap_cap cap)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+
+   switch (cap) {
+   case IRQ_POSTING_CAP:
+   /*
+* If 1) posted-interrupts is disabled by user
+* or 2) irq remapping is disabled, posted-interrupts
+* is not supported.
+*/
+   if (disable_irq_post || !irq_remapping_enabled)
+   return 0;
+
+   for_each_iommu(iommu, drhd)
+   if (!cap_pi_support(iommu->cap))
+   return 0;
+
+   return 1;
+   default:
+   pr_warn("Unknown irq remapping capability.\n");
+   return 0;
+   }
+}
+
 static int ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
   struct intel_iommu *iommu,
   struct acpi_dmar_hardware_unit *drhd)
@@ -948,6 +974,7 @@ static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
 
 struct irq_remap_ops intel_irq_remap_ops = {
.supported  = intel_irq_remapping_supported,
+   .capability = intel_irq_remapping_capability,
.prepare= dmar_table_init,
.enable = intel_enable_irq_remapping,
.disable= disable_irq_remapping,
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 3c3da04d..e63e969 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -24,6 +24,8 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
+int disable_irq_post = 1;
+
 static struct irq_remap_ops *remap_ops;
 
 static void irq_remapping_disable_io_apic(void)
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 2d991b2..cb1f46d 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -36,6 +36,8 @@ extern int disable_sourceid_checking;
 extern int no_x2apic_optout;
 extern int irq_remapping_enabled;
 
+extern int disable_irq_post;
+
 struct irq_remap_ops {
/* Check whether Interrupt Remapping is supported */
int (*supported)(void);
@@ -76,6 +78,8 @@ extern void ir_ack_apic_edge(struct irq_data *data);
 #define disable_irq_remap 1
 #define irq_remap_broken  0
 
+#define disable_irq_post  1
+
 #endif /* CONFIG_IRQ_REMAP */
 
 #endif /* __IRQ_REMAPPING_H */
-- 
1.9.1



[v3 14/26] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu

2014-12-12 Thread Feng Wu
Define an interface to get the PI descriptor address from the vCPU structure.

Signed-off-by: Feng Wu 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c  | 12 
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7a41808..9b45b78 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -772,6 +772,7 @@ struct kvm_x86_ops {
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
 
void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
+   u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 66ca275..81f239b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -562,6 +562,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+   return &(to_vmx(vcpu)->pi_desc);
+}
+
 #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
 #define FIELD(number, name)[number] = VMCS12_OFFSET(name)
 #define FIELD64(number, name)  [number] = VMCS12_OFFSET(name), \
@@ -4298,6 +4303,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
return;
 }
 
+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+   return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -9244,6 +9254,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.check_nested_events = vmx_check_nested_events,
 
.sched_in = vmx_sched_in,
+
+   .get_pi_desc_addr = vmx_get_pi_desc_addr,
 };
 
 static int __init vmx_init(void)
-- 
1.9.1



[v3 02/26] iommu: Add new member capability to struct irq_remap_ops

2014-12-12 Thread Feng Wu
This patch adds a new member, capability, to struct irq_remap_ops.
This new op can be used to check whether some features are
supported, such as VT-d Posted-Interrupts.

Signed-off-by: Feng Wu 
Reviewed-by: Jiang Liu 
---
 arch/x86/include/asm/irq_remapping.h | 4 
 drivers/iommu/irq_remapping.h| 4 
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 6ba2431..f67ae08 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -31,6 +31,10 @@ struct irq_alloc_info;
 
 #ifdef CONFIG_IRQ_REMAP
 
+enum irq_remap_cap {
+   IRQ_POSTING_CAP = 0,
+};
+
 extern void setup_irq_remapping_ops(void);
 extern int irq_remapping_supported(void);
 extern void set_irq_remapping_broken(void);
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 4bd791d..2d991b2 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -28,6 +28,7 @@ struct irq_data;
 struct msi_msg;
 struct irq_domain;
 struct irq_alloc_info;
+enum irq_remap_cap;
 
 extern int disable_irq_remap;
 extern int irq_remap_broken;
@@ -39,6 +40,9 @@ struct irq_remap_ops {
/* Check whether Interrupt Remapping is supported */
int (*supported)(void);
 
+   /* Check some capability is supported */
+   bool (*capability)(enum irq_remap_cap);
+
/* Initializes hardware and makes it ready for remapping interrupts */
int  (*prepare)(void);
 
-- 
1.9.1



[v3 01/26] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU

2014-12-12 Thread Feng Wu
With Posted-Interrupts support in Intel CPU and IOMMU, an external
interrupt from assigned-devices could be directly delivered to a
virtual CPU in a virtual machine. Instead of hacking KVM and Intel
IOMMU drivers, we propose a platform independent interface to target
an interrupt to a specific virtual CPU in a virtual machine, or set
virtual CPU affinity for an interrupt.

By adopting this new interface and the hierarchy irqdomain, we could
easily support posted-interrupts on Intel platforms, and also provide
flexible enough interfaces for other platforms to support similar
features.

We may also cooperate between set_affinity() and set_vcpu_affinity()
in IRQ core or irq chip drivers.

Here is the usage scenario for this interface:
Guest updates MSI/MSI-X interrupt configuration
-->QEMU and KVM handle this
-->KVM calls this interface (passing the posted-interrupts descriptor
   and guest vector)
-->irq core transfers control to the IOMMU
-->IOMMU does the real work of updating the IRTE (the IRTE has a new
   format for VT-d Posted-Interrupts)
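
As an illustration only (not part of the patch), the hand-off in the
scenario above can be modelled in standalone C. All demo_* names are
hypothetical. The sketch assumes a two-level hierarchy in which the
child chip forwards the request to its parent, and the parent (standing
in for the IOMMU level) just records the pointer it was given.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_irq_data;

/* Hypothetical, reduced model of an irq chip with one optional callback. */
struct demo_irq_chip {
	const char *name;
	int (*set_vcpu_affinity)(struct demo_irq_data *data, void *vcpu_info);
};

struct demo_irq_data {
	struct demo_irq_chip *chip;
	struct demo_irq_data *parent_data;
	void *last_vcpu_info;	/* what the parent level last received */
};

/* Child-level callback: forward the request to the parent chip. */
static int demo_set_vcpu_affinity_parent(struct demo_irq_data *data,
					 void *vcpu_info)
{
	data = data->parent_data;
	if (data && data->chip && data->chip->set_vcpu_affinity)
		return data->chip->set_vcpu_affinity(data, vcpu_info);

	return -ENOSYS;
}

/* Parent-level callback: stands in for the IRTE update. */
static int demo_iommu_set_vcpu_affinity(struct demo_irq_data *data,
					void *vcpu_info)
{
	data->last_vcpu_info = vcpu_info;
	return 0;
}

/* Entry point: call the outermost chip's callback if it has one. */
static int demo_irq_set_vcpu_affinity(struct demo_irq_data *data,
				      void *vcpu_info)
{
	assert(data != NULL);

	if (data->chip && data->chip->set_vcpu_affinity)
		return data->chip->set_vcpu_affinity(data, vcpu_info);

	return -ENOSYS;
}
```

A chip that leaves the callback unset makes the entry point return
-ENOSYS, so callers can tell "not supported" apart from a failure
reported by the parent.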

Signed-off-by: Jiang Liu 
Signed-off-by: Feng Wu 
---
 include/linux/irq.h |  4 
 kernel/irq/chip.c   | 14 ++
 kernel/irq/manage.c | 20 
 3 files changed, 38 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index f26e736..83abafc 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -324,6 +324,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
  * irq_request_resources
  * @irq_compose_msi_msg:   optional to compose message content for MSI
  * @irq_write_msi_msg: optional to write message content for MSI
+ * @irq_set_vcpu_affinity: optional to target a virtual CPU in a virtual
+ * machine
  * @flags: chip specific flags
  */
 struct irq_chip {
@@ -362,6 +364,7 @@ struct irq_chip {
 
void(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
void(*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
+   int (*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);
 
unsigned long   flags;
 };
@@ -416,6 +419,7 @@ extern void irq_cpu_online(void);
 extern void irq_cpu_offline(void);
 extern int irq_set_affinity_locked(struct irq_data *data,
   const struct cpumask *cpumask, bool force);
+extern int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_PENDING_IRQ)
 void irq_move_irq(struct irq_data *data);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6f1c7a5..fe0908f 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -948,6 +948,20 @@ int irq_chip_retrigger_hierarchy(struct irq_data *data)
 
return -ENOSYS;
 }
+
+/**
+ * irq_chip_set_vcpu_affinity_parent - Set vcpu affinity on the parent interrupt
+ * @data:  Pointer to interrupt specific data
+ * @vcpu_info: The vcpu affinity information
+ */
+int irq_chip_set_vcpu_affinity_parent(struct irq_data *data, void *vcpu_info)
+{
+   data = data->parent_data;
+   if (data->chip->irq_set_vcpu_affinity)
+   return data->chip->irq_set_vcpu_affinity(data, vcpu_info);
+
+   return -ENOSYS;
+}
 #endif
 
 /**
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8069237..bd3a1ba 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -247,6 +247,26 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
 
+int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
+{
+   struct irq_desc *desc = irq_to_desc(irq);
+   struct irq_chip *chip;
+   unsigned long flags;
+   int ret = -ENOSYS;
+
+   if (!desc)
+   return -EINVAL;
+
+   raw_spin_lock_irqsave(&desc->lock, flags);
+   chip = desc->irq_data.chip;
+   if (chip && chip->irq_set_vcpu_affinity)
+   ret = chip->irq_set_vcpu_affinity(irq_desc_get_irq_data(desc),
+ vcpu_info);
+   raw_spin_unlock_irqrestore(&desc->lock, flags);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(irq_set_vcpu_affinity);
+
 static void irq_affinity_notify(struct work_struct *work)
 {
struct irq_affinity_notify *notify =
-- 
1.9.1



[v3 00/26] Add VT-d Posted-Interrupts support

2014-12-12 Thread Feng Wu
VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v1->v2:
* Use the VFIO framework to enable this feature. The VFIO part of this series
  is based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control".
* Rebase this patchset on
  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
  then revise some irq logic based on the new hierarchy irqdomain patches
  provided by Jiang Liu.

v2->v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is
  preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
  can be used to change back to remapping mode.
* Fix typo

This patch series is made of the following groups:
1-6: Some preparation changes in the iommu and irq components, based on the
  new hierarchy irqdomain logic.
7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as feature detection
  and a command line parameter.
10-17, 22-25: Changes related to KVM itself.
18-20: Changes in the VFIO component. This part was previously sent out as
  "[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
  Posted-Interrupts".
21: x86 irq related changes.

Feng Wu (26):
  genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
VCPU
  iommu: Add new member capability to struct irq_remap_ops
  iommu, x86: Define new irte structure for VT-d Posted-Interrupts
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
  iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Add intel_irq_remapping_capability() for Intel
  iommu, x86: define irq_remapping_cap()
  KVM: change struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Initialize VT-d Posted-Interrupts Descriptor
  KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
  KVM: add interfaces to control PI outside vmx
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  KVM: kvm-vfio: User API for VT-d Posted-Interrupts
  KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
  KVM: x86: kvm-vfio: VT-d posted-interrupts setup
  x86, irq: Define a global vector for VT-d Posted-Interrupts
  KVM: Define a wakeup worker thread for vCPU
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  KVM: Suppress posted-interrupt when 'SN' is set
  iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

 Documentation/kernel-parameters.txt|   1 +
 Documentation/virtual/kvm/devices/vfio.txt |   9 ++
 arch/x86/include/asm/entry_arch.h  |   2 +
 arch/x86/include/asm/hardirq.h |   1 +
 arch/x86/include/asm/hw_irq.h  |   2 +
 arch/x86/include/asm/irq_remapping.h   |  11 ++
 arch/x86/include/asm/irq_vectors.h |   1 +
 arch/x86/include/asm/kvm_host.h|  12 ++
 arch/x86/kernel/apic/msi.c |   1 +
 arch/x86/kernel/entry_64.S |   2 +
 arch/x86/kernel/irq.c  |  27 
 arch/x86/kernel/irqinit.c  |   2 +
 arch/x86/kvm/Makefile  |   2 +-
 arch/x86/kvm/kvm_vfio_x86.c|  77 +
 arch/x86/kvm/vmx.c | 244 -
 arch/x86/kvm/x86.c |  22 ++-
 drivers/iommu/intel_irq_remapping.c|  68 +++-
 drivers/iommu/irq_remapping.c  |  24 ++-
 drivers/iommu/irq_remapping.h  |   8 +
 include/linux/dmar.h   |  32 
 include/linux/intel-iommu.h|   1 +
 include/linux/irq.h|   7 +
 include/linux/kvm_host.h   |  46 ++
 include/uapi/linux/kvm.h   |  11 ++
 kernel/irq/chip.c  |  14 ++
 kernel/irq/manage.c|  20 +++
 virt/kvm/irq_comm.c|  43 -
 virt/kvm/irqchip.c |  11 --
 virt/kvm/kvm_main.c|  15 ++
 virt/kvm/vfio.c| 107 +
 30 files changed, 795 insertions(+), 28 deletions(-)
 create mode 100644 arch/x86/kvm/kvm_vfio_x86.c

-- 
1.9.1


Re: [PATCH] memory: Add NVIDIA SMMU suspend/resume support

2014-12-12 Thread Alexandre Courbot
Hi Mark,

On Mon, Dec 8, 2014 at 3:20 PM, Mark Zhang  wrote:
> This patch adds suspend/resume support for NVIDIA SMMU.


> This patch is created on top of Thierry Reding's patch set:
>
> "[PATCH v7 00/12] NVIDIA Tegra memory controller and IOMMU support"

You should have this comment under the "---" as we don't need it to
persist once this patch is merged.

>
> Signed-off-by: Mark Zhang 
> ---
>  drivers/iommu/tegra-smmu.c | 79 
> +-
>  drivers/memory/tegra/mc.c  | 21 
>  drivers/memory/tegra/mc.h  |  4 +++
>  3 files changed, 103 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 0909e0bae9ec..ab38805055a4 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>
> +struct tegra_smmu_as;
> +
>  struct tegra_smmu {
> void __iomem *regs;
> struct device *dev;
> @@ -25,9 +27,10 @@ struct tegra_smmu {
> const struct tegra_smmu_soc *soc;
>
> unsigned long *asids;
> +   struct tegra_smmu_as **as;
> struct mutex lock;
>
> -   struct list_head list;
> +   struct list_head swgroup_asid_list;
>  };
>
>  struct tegra_smmu_as {
> @@ -40,6 +43,12 @@ struct tegra_smmu_as {
> u32 attr;
>  };
>
> +struct tegra_smmu_swgroup_asid {
> +   struct list_head list;
> +   unsigned swgroup_id;
> +   unsigned asid;
> +};
> +
>  static inline void smmu_writel(struct tegra_smmu *smmu, u32 value,
>unsigned long offset)
>  {
> @@ -376,6 +385,7 @@ static int tegra_smmu_as_prepare(struct tegra_smmu *smmu,
> as->smmu = smmu;
> as->use_count++;
>
> +   smmu->as[as->id] = as;
> return 0;
>  }
>
> @@ -386,6 +396,7 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu,
> return;
>
> tegra_smmu_free_asid(smmu, as->id);
> +   smmu->as[as->id] = NULL;
> as->smmu = NULL;
>  }
>
> @@ -398,6 +409,7 @@ static int tegra_smmu_attach_dev(struct iommu_domain *domain,
> struct of_phandle_args args;
> unsigned int index = 0;
> int err = 0;
> +   struct tegra_smmu_swgroup_asid *sa = NULL;

This initialization is unneeded. Actually this declaration would
probably be better placed in the while() loop below since its usage is
local to it.

>
> while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
>&args)) {
> @@ -411,6 +423,14 @@ static int tegra_smmu_attach_dev(struct iommu_domain *domain,
> return err;
>
> tegra_smmu_enable(smmu, swgroup, as->id);
> +
> +   sa = kzalloc(sizeof(struct tegra_smmu_swgroup_asid),
> +   GFP_KERNEL);
> +   INIT_LIST_HEAD(&sa->list);

You don't need to call INIT_LIST_HEAD on this, list_add_tail() will
effectively overwrite any initialization done by this macro (see
include/linux/list.h).
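
To illustrate, here is a minimal userspace sketch of the idea. The
demo_* names are hypothetical and only modelled on the helpers in
include/linux/list.h; this is not the kernel code itself. It shows that
adding a node at the tail assigns both of the node's pointers
unconditionally, so whatever they held before does not matter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a circular doubly linked list node. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Make an empty list: the head points at itself in both directions. */
static void demo_init_list_head(struct demo_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new_node just before head, i.e. at the tail of the list. */
static void demo_list_add_tail(struct demo_list_head *new_node,
			       struct demo_list_head *head)
{
	struct demo_list_head *prev = head->prev;

	assert(prev != NULL);	/* the head itself must be initialized */

	/* Both pointers of new_node are overwritten here. */
	new_node->next = head;
	new_node->prev = prev;
	prev->next = new_node;
	head->prev = new_node;
}
```

Only the list head needs initializing; the node can come straight from
kzalloc() and be passed to the add helper.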

> +   sa->swgroup_id = swgroup;
> +   sa->asid = as->id;
> +   list_add_tail(&sa->list, &smmu->swgroup_asid_list);
> +
> index++;
> }
>
> @@ -427,6 +447,7 @@ static void tegra_smmu_detach_dev(struct iommu_domain *domain, struct device *de
> struct tegra_smmu *smmu = as->smmu;
> struct of_phandle_args args;
> unsigned int index = 0;
> +   struct tegra_smmu_swgroup_asid *sa = NULL;

Same here, move the declaration into the while() loop and remove the
initialization.

>
> while (of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
>   &args)) {
> @@ -435,6 +456,13 @@ static void tegra_smmu_detach_dev(struct iommu_domain *domain, struct device *de
> if (args.np != smmu->dev->of_node)
> continue;
>
> +   list_for_each_entry(sa, &smmu->swgroup_asid_list, list) {
> +   if (sa->asid == as->id && sa->swgroup_id == swgroup)
> +   break;
> +   }
> +   list_del(&sa->list);
> +   kfree(sa);
> +
> tegra_smmu_disable(smmu, swgroup, as->id);
> tegra_smmu_as_unprepare(smmu, as);
> index++;
> @@ -651,6 +679,48 @@ static void tegra_smmu_ahb_enable(void)
> }
>  }
>
> +void tegra_smmu_resume(struct tegra_smmu *smmu)
> +{
> +   struct tegra_smmu_as *as = NULL;
> +   unsigned int bit;
> +   u32 value;
> +   struct tegra_smmu_swgroup_asid *sa = NULL;

Again no need to initialize to NULL here.

> +
> +   for_each_set_bit(bit, smmu->asids, smmu->soc->num_asids) {
> +   as = smmu->as[bit];
> +   smmu->soc->ops->flush_dcache(as->pd, 0, SMMU_SIZE_PD);
> +
> +   smmu_writel(smmu, as->id & 0x7f, SMMU_PTB_ASID);
> +   val