Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

2013-04-03 Thread Takao Indoh
(2013/04/03 17:24), David Woodhouse wrote:
> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
>> (2013/04/02 23:05), Joerg Roedel wrote:
>>> On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
 
 enable_IR
 intel_enable_irq_remapping
   iommu_disable_irq_remapping  <== IRES/QIES/TES disabled here
   dmar_disable_qi  <== do nothing
   dmar_enable_qi   <== QIES enabled
   intel_setup_irq_remapping<== IRES enabled
>>>
>>> But what we want to do here in the kdumo case is to disable translation
>>> too, right? Because the former kernel might have translation and
>>> irq-remapping enabled and the kdump kernel might be compiled without
>>> support for dma-remapping. So if we don't disable translation here too
>>> the kdump kernel is unable to do DMA.
>>
>> Yeah, you are right. I forgot such a case.
> 
> If you disable translation and there's some device still doing DMA, it's
> going to scribble over random areas of memory. You really want to have
> translation enabled and all the page tables *cleared*, during kexec. I
> think it's fair to insist that the secondary kernel should use the IOMMU
> if the first one did.
> 
>> To be honest, I also expected the side effect of this patch. As I wrote
>> in the previous mail, I'm working on kdump problem with iommu, that is,
>> ongoing DMA causes DMAR fault in 2nd kernel and sometimes kdump fails
>> due to this fault.
> 
> Here you've lost me. The DMAR fault is caught and reported, and how does
> this lead to a kdump failure? Are you using dodgy hardware that just
> keeps *trying* after an abort, and floods the system with a storm of
> DMAR faults? We've occasionally spoken about working around such a
> problem by setting a bit to make subsequent faults *silent*. Would that
> work?

There are several cases.
- DMAR fault messages floods and second kernel does not boot. Recently I
  saw similar report. https://lkml.org/lkml/2013/3/8/120
- igb driver detectes error on linkup and kdump via network fails.
- On a certain platform, though kdump itself works, PCIe error like
  Unexpected Completion is detected and it gets hardware degraded.

Thanks,
Takao Indoh


> 
>>   What we have to do is stopping DMA transaction
>> before DMA-remapping is disabled in 2nd kernel.
> 
> The IOMMU is there to stop DMA transactions. That is its *job*. :)
> 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-03 Thread Bjorn Helgaas
[+cc David and iommu list, Yinghai, Jiang]

On Mon, Mar 4, 2013 at 12:04 PM, Neil Horman  wrote:
> A few years back intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>
> For the 5520 and 5500 chipsets which contained an errata (specificially errata
> 53), which noted that these chipsets can't properly do interrupt remapping, 
> and
> as a result the recommend that interrupt remapping be disabled in bios.  While
> many vendors have a bios update to do exactly that, not all do, and of course
> not all users update their bios to a level that corrects the problem.  As a
> result, occasionally interrupts can arrive at a cpu even after affinity for 
> that
> interrupt has be moved, leading to lost or spurrious interrupts (usually
> characterized by the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>
> There have been several incidents recently of people seeing this error, and
> investigation has shown that they have system for which their BIOS level is 
> such
> that this feature was not properly turned off.  As such, it would be good to
> give them a reminder that their systems are vulnurable to this problem.
>
> Signed-off-by: Neil Horman 
> CC: Prarit Bhargava 
> CC: Don Zickus 
> CC: Don Dutile 
> CC: Bjorn Helgaas 
> CC: Asit Mallick 
> CC: linux-...@vger.kernel.org
>
> ---
>
> Change notes:
>
> v2)
>
> * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
> chipset series is x86 only.  I decided however to keep the quirk as a regular
> quirk, not an early_quirk.  Early quirks have no way currently to determine if
> BIOS has properly disabled the feature in the iommu, at least not without
> significant hacking, and since its quite possible this will be a short lived
> quirk, should Don Z's workaround code prove successful (and it looks like it 
> may
> well), I don't think that necessecary.
>
> * Removed the WARNING banner from the quirk, and added the HW_ERR token to the
> string, I opted to leave the newlines in place however, as I really couldnt
> find a way to keep the text on a single line is still legible from a code
> perspective.  I think theres enough language in there that using cscope on 
> just
> about any substring however will turn it up, and again, this may be a short
> lived quirk.
> ---
>  arch/x86/kernel/quirks.c | 18 ++
>  include/linux/pci_ids.h  |  2 ++
>  2 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
> index 26ee48a..a718ea2 100644
> --- a/arch/x86/kernel/quirks.c
> +++ b/arch/x86/kernel/quirks.c
> @@ -5,6 +5,7 @@
>  #include 
>
>  #include 
> +#include "../../../drivers/iommu/irq_remapping.h"
>
>  #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)
>
> @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 
> PCI_DEVICE_ID_AMD_15H_NB_F5,
> quirk_amd_nb_node);
>
>  #endif
> +
> +static void intel_remapping_check(struct pci_dev *dev)
> +{
> +   u8 revision;
> +
> +   pci_read_config_byte(dev, PCI_REVISION_ID, &revision);
> +
> +   if ((revision == 0x13) && irq_remapping_enabled) {
> +pr_warn(HW_ERR "This system BIOS has enabled interrupt 
> remapping\n"
> +"on a chipset that contains an errata making that\n"
> +"feature unstable.  Please reboot with nointremap\n"
> +"added to the kernel command line and contact\n"
> +"your BIOS vendor for an update");
> +   }
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5520_IOHUB, 
> intel_remapping_check);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5500_IOHUB, 
> intel_remapping_check);

This started as an IOMMU change, and I'm not an expert in that area,
so I added David and the IOMMU list.  I'd rather have him deal with
this than me.

Is this something we can just *fix* in the kernel, e.g., by turning
off interrupt remapping ourselves, or does it have to be done before
the OS boots?

> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 31717bd..54027a6 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2732,6 +2732,8 @@
>  #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2  0x2db2
>  #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV20x2db3
>  #define PCI_DEVICE_ID_INTEL_82855PM_HB 0x3340
> +#define PCI_DEVICE_ID_INTEL_5500_IOHUB 0x3403
> +#define PCI_DEVICE_ID_INTEL_5520_IOHUB 0x3406

For constants only used in one place, we just use the bare constant
(0x3403) in the quirk rather than editing pci_ids.h (see comment at
the top of that file).

>  #define PCI_DEVICE_ID_INTEL_IOAT_TBG4  0x3429
>  #define PCI_DEVICE_ID_INTEL_IOAT_TBG5  0x342a
>  #define PCI_DEVICE_ID_INTEL_IOAT_TBG6  0x342b
> --
> 1.7.11.7
>
> --
> To unsubscribe from this list: send the line

[PATCH] iommu/amd: Add workaround to propery clearing IOMMU status register

2013-04-03 Thread suravee.suthikulpanit
From: Suravee Suthikulpanit 

The IOMMU interrupt handling in bottom half must clear the PPR log interrupt
and event log interrupt bits to re-enable the interrupt. This is done by
writing 1 to the memory mapped register to clear the bit. Due to hardware bug,
if the driver tries to clear this bit while the IOMMU hardware also setting
this bit, the conflict will result with the bit being set. If the interrupt
handling code does not make sure to clear this bit, subsequent changes in the
event/PPR logs will no longer generating interrupts, and would result if
buffer overflow. After clearing the bits, the driver must read back
the register to verify.

In the current interrupt handling scheme, there are as many threads as
the number of IOMMUs. Each thread is created and assigned to an IOMMU at
the time of registering interrupt handlers (request_threaded_irq).
When an IOMMU HW generates an interrupt, the irq handler (top half) wakes up
the corresponding thread to process event and PPR logs of all IOMMUs
starting from the 1st IOMMU.

In the system with multiple IOMMU,this handling scheme complicates the
synchronization of the IOMMU data structures and status registers as
there could be multiple threads competing for the same IOMMU while
the other IOMMU could be left unhandled.

In order to simplify the implementation of the workaround, this patch is
proposing a different interrupt handling scheme by having each thread only
managing interrupts of the corresponding IOMMU. This can be achieved by passing
the struct amd_iommu when registering the interrupt handlers. This structure is
unique for each IOMMU and can be used by the bottom half thread to identify
the IOMMU to be handled instead of calling for_each_iommu.  Besides this also
eliminate the needs to lock the IOMMU for processing event and PPR logs.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c  |   59 
 drivers/iommu/amd_iommu_init.c |2 +-
 2 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e5db3e5..3548d63 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -803,12 +803,6 @@ retry:
 static void iommu_poll_events(struct amd_iommu *iommu)
 {
u32 head, tail;
-   unsigned long flags;
-
-   /* enable event interrupts again */
-   writel(MMIO_STATUS_EVT_INT_MASK, iommu->mmio_base + MMIO_STATUS_OFFSET);
-
-   spin_lock_irqsave(&iommu->lock, flags);
 
head = readl(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
tail = readl(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET);
@@ -819,8 +813,6 @@ static void iommu_poll_events(struct amd_iommu *iommu)
}
 
writel(head, iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
-
-   spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
 static void iommu_handle_ppr_entry(struct amd_iommu *iommu, u64 *raw)
@@ -845,17 +837,11 @@ static void iommu_handle_ppr_entry(struct amd_iommu 
*iommu, u64 *raw)
 
 static void iommu_poll_ppr_log(struct amd_iommu *iommu)
 {
-   unsigned long flags;
u32 head, tail;
 
if (iommu->ppr_log == NULL)
return;
 
-   /* enable ppr interrupts again */
-   writel(MMIO_STATUS_PPR_INT_MASK, iommu->mmio_base + MMIO_STATUS_OFFSET);
-
-   spin_lock_irqsave(&iommu->lock, flags);
-
head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
 
@@ -891,34 +877,49 @@ static void iommu_poll_ppr_log(struct amd_iommu *iommu)
head = (head + PPR_ENTRY_SIZE) % PPR_LOG_SIZE;
writel(head, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
 
-   /*
-* Release iommu->lock because ppr-handling might need to
-* re-acquire it
-*/
-   spin_unlock_irqrestore(&iommu->lock, flags);
-
/* Handle PPR entry */
iommu_handle_ppr_entry(iommu, entry);
 
-   spin_lock_irqsave(&iommu->lock, flags);
-
/* Refresh ring-buffer information */
head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
}
-
-   spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
 irqreturn_t amd_iommu_int_thread(int irq, void *data)
 {
-   struct amd_iommu *iommu;
+   struct amd_iommu *iommu = (struct amd_iommu *) data;
+   u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
 
-   for_each_iommu(iommu) {
-   iommu_poll_events(iommu);
-   iommu_poll_ppr_log(iommu);
-   }
+   while (status & (MMIO_STATUS_EVT_INT_MASK | MMIO_STATUS_PPR_INT_MASK)) {
+   if (status & MMIO_STATUS_EVT_INT_MASK) {
+   pr_devel("AMD-Vi: Processing IOMMU Event Log\n");
+   iommu_poll_events(iommu);
+   }
 
+   if (s

Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/02/2013 10:12:31 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 17:44 -0500, Scott Wood wrote:
> On 04/02/2013 04:32:04 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
> > > On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
> > > > On x86 the interrupt remapper handles this transparently when  
MSI
> > > > is enabled and userspace never gets direct access to the  
device

> > MSI
> > > > address/data registers.
> > >
> > > x86 has a totally different mechanism here, as far as I  
understand

> > --
> > > even before you get into restrictions on mappings.
> >
> > So what control will userspace have over programming the actually  
MSI

> > vectors on PAMU?
>
> Not sure what you mean -- PAMU doesn't get explicitly involved in
> MSIs.  It's just another 4K page mapping (per relevant MSI bank).   
If

> you want isolation, you need to make sure that an MSI group is only
> used by one VFIO group, and that you're on a chip that has alias  
pages

> with just one MSI bank register each (newer chips do, but the first
> chip to have a PAMU didn't).

How does a user figure this out?


The user's involvement could be limited to setting a policy knob of  
whether that degree of isolation is required (if required and  
unavailable, all devices using an MSI bank would be forced into the  
same group).  We'd need to do something with MSI allocation so that we  
avoid using an MSI bank with more than one IOMMU group where possible.   
I'm not sure about the details yet, or how practical this is.  There  
might need to be some MSI bank assignment done as part of the VFIO  
device binding process, if there are going to be more VFIO groups than  
there are MSI banks (reserving one bank for host use).


-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/02/2013 10:37:20 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 17:50 -0500, Scott Wood wrote:
> On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
> > > On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood
> >  wrote:
> > > >> >C.  Explicit mapping using normal DMA map.  The last  
idea

> > is that
> > > >> >we would introduce a new ioctl to give user-space  
an fd

> > to
> > > >> >the MSI bank, which could be mmapped.  The flow  
would be

> > > >> >something like this:
> > > >> >   -for each group user space calls new ioctl
> > > >> > VFIO_GROUP_GET_MSI_FD
> > > >> >   -user space mmaps the fd, getting a vaddr
> > > >> >   -user space does a normal DMA map for desired  
iova
> > > >> >This approach makes everything explicit, but adds a  
new

> > ioctl
> > > >> >applicable most likely only to the PAMU (type2  
iommu).

> > > >>
> > > >> And the DMA_MAP of that mmap then allows userspace to select  
the

> > window
> > > >> used?  This one seems like a lot of overhead, adding a new
> > ioctl, new
> > > >> fd, mmap, special mapping path, etc.
> > > >
> > > >
> > > > There's going to be special stuff no matter what.  This would
> > keep it
> > > > separated from the IOMMU map code.
> > > >
> > > > I'm not sure what you mean by "overhead" here... the runtime
> > overhead of
> > > > setting things up is not particularly relevant as long as it's
> > reasonable.
> > > > If you mean development and maintenance effort, keeping things
> > well
> > > > separated should help.
> > >
> > > We don't need to change DMA_MAP.  If we can simply add a new  
"type

> > 2"
> > > ioctl that allows user space to set which windows are MSIs, it
> > seems vastly
> > > less complex than an ioctl to supply a new fd, mmap of it, etc.
> > >
> > > So maybe 2 ioctls:
> > > VFIO_IOMMU_GET_MSI_COUNT
>
> Do you mean a count of actual MSIs or a count of MSI banks used by  
the

> whole VFIO group?

I hope the latter, which would clarify how this is distinct from
DEVICE_GET_IRQ_INFO.  Is hotplug even on the table?  Presumably
dynamically adding a device could bring along additional MSI banks?


I'm not sure -- maybe we could say that hotplug can add banks, but not  
remove them or change the order, so userspace would just need to check  
if the number of banks changed, and map the extras.


The current VFIO MSI support has the host handling everything about  
MSI.

The user never programs an MSI vector to the physical device, they set
up everything through ioctl.  On interrupt, we simply trigger an  
eventfd
and leave it to things like KVM irqfd or QEMU to do the right thing  
in a

virtual machine.

Here the MSI vector has to go through a PAMU window to hit the correct
MSI bank.  So that means it has some component of the iova involved,
which we're proposing here is controlled by userspace (whether that
vector uses an offset from 0x1000 or 0x depending on which
window slot is used to make the MSI bank).  I assume we're still  
working

in a model where the physical interrupt fires into the host and a
host-based interrupt handler triggers an eventfd, right?


Yes (subject to possible future optimizations).

So that means the vector also has host components so we trigger the  
correct ISR.  How

is that coordinated?


Everything but the iova component needs to come from the host MSI  
allocator.



Would is be possible for userspace to simply leave room for MSI bank
mapping (how much room could be determined by something like
VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace  
can

DMA_MAP starting at the 0x0 address of the aperture, growing up, and
VFIO will map banks on demand at the top of the aperture, growing  
down?

Wouldn't that avoid a lot of issues with userspace needing to know
anything about MSI banks (other than count) and coordinating irq  
numbers

and enabling handlers?


This would restrict a (possibly unlikely) use case where the user wants  
to map something near the top of the aperture but has another place  
MSIs can go (or is willing to live without MSIs).  Otherwise it could  
be workable, as long as we can require an explicit MSI enabling on a  
device to happen after the aperture and subwindow count are set up.   
I'm not sure it would really buy anything over having userspace iterate  
over the MSI bank count, though -- it would probably be a bit more  
complicated.



> > On x86 MSI count is very
> > device specific, which means it wold be a VFIO_DEVICE_* ioctl
> > (actually
> > VFIO_DEVICE_GET_IRQ_INFO does this for us on x86).  The trouble  
with

> > it
> > being a device ioctl is that you need to get the device FD, but  
the
> > IOMMU protection needs to be established before you can get  
that... so

> > there's an ordering problem if you need it from the device before
> > configuring the IOMMU.  Thanks,
>
> What do you mean by "IOMMU protection needs

Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/03/2013 02:43:06 PM, Stuart Yoder wrote:
On Wed, Apr 3, 2013 at 2:18 PM, Scott Wood   
wrote:

> On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:
>>
>> > Would is be possible for userspace to simply leave room for MSI  
bank

>> > mapping (how much room could be determined by something like
>> > VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that  
userspace can
>> > DMA_MAP starting at the 0x0 address of the aperture, growing up,  
and
>> > VFIO will map banks on demand at the top of the aperture,  
growing down?
>> > Wouldn't that avoid a lot of issues with userspace needing to  
know
>> > anything about MSI banks (other than count) and coordinating irq  
numbers

>> > and enabling handlers?
>>
>> This is basically option #A in the original proposals sent.   I  
like

>> this approach, in that it
>> is simpler and keeps user space mostly out of this...which is
>> consistent with how things are done
>> on x86.  User space just needs to know how many windows to leave at
>> the top of the aperture.
>> The kernel then has the flexibility to use those windows how it  
wants.

>>
>> But one question, is when should the kernel actually map (and  
unmap)

>> the MSI banks.
>
>
> I think userspace should explicitly request it.  Userspace still  
wouldn't

> need to know anything but the count:
>
> count = VFIO_IOMMU_GET_MSI_BANK_COUNT
> VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
> VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
> // do other DMA maps now, or later, or not at all, doesn't matter
> for (i = 0; i < count; i++)
> VFIO_IOMMU_MAP_MSI_BANK(iova, i);
> // The kernel now knows where each bank has been mapped, and can  
update PCI

> config space appropriately.

And the overall aperture enable/disable would occur on the first
dma/msi map() and last dma/msi unmap()?


Yes.  We may want the optional ability to do an overall enable/disable  
for reasons we discussed a while ago, but in the absence of an explicit  
disable the domain would be enabled on first map.


-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:

> Would is be possible for userspace to simply leave room for MSI bank
> mapping (how much room could be determined by something like
> VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace  
can

> DMA_MAP starting at the 0x0 address of the aperture, growing up, and
> VFIO will map banks on demand at the top of the aperture, growing  
down?

> Wouldn't that avoid a lot of issues with userspace needing to know
> anything about MSI banks (other than count) and coordinating irq  
numbers

> and enabling handlers?

This is basically option #A in the original proposals sent.   I like
this approach, in that it
is simpler and keeps user space mostly out of this...which is
consistent with how things are done
on x86.  User space just needs to know how many windows to leave at
the top of the aperture.
The kernel then has the flexibility to use those windows how it wants.

But one question, is when should the kernel actually map (and unmap)
the MSI banks.   One thing we need to do is enable the aperture...and  
current
thinking is that is done on the first DMA_MAP.   Similarly when the  
last mapping

is unmapped we could also umap the MSI banks.

Sequence would be something like:

VFIO_GROUP_SET_CONTAINER // add groups to the container

VFIO_SET_IOMMU(VFIO_FSL_PAMU)// set iommu model

cnt = VFIO_IOMMU_GET_MSI_BANK_COUNT// returns max # of  
MSI banks


VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) // set overall aperture

VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // set # of windows,
including MSI banks

VFIO_IOMMU_MAP_DMA// map the guest's memory
   ---> kernel enables aperture and maps needed MSI banks  
here


Maps them where?

What if there is more than one explicit DMA mapping?  What if DMA  
mappings are changed during operation?


-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
On Wed, Apr 3, 2013 at 2:18 PM, Scott Wood  wrote:
> On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:
>>
>> > Would is be possible for userspace to simply leave room for MSI bank
>> > mapping (how much room could be determined by something like
>> > VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
>> > DMA_MAP starting at the 0x0 address of the aperture, growing up, and
>> > VFIO will map banks on demand at the top of the aperture, growing down?
>> > Wouldn't that avoid a lot of issues with userspace needing to know
>> > anything about MSI banks (other than count) and coordinating irq numbers
>> > and enabling handlers?
>>
>> This is basically option #A in the original proposals sent.   I like
>> this approach, in that it
>> is simpler and keeps user space mostly out of this...which is
>> consistent with how things are done
>> on x86.  User space just needs to know how many windows to leave at
>> the top of the aperture.
>> The kernel then has the flexibility to use those windows how it wants.
>>
>> But one question, is when should the kernel actually map (and unmap)
>> the MSI banks.
>
>
> I think userspace should explicitly request it.  Userspace still wouldn't
> need to know anything but the count:
>
> count = VFIO_IOMMU_GET_MSI_BANK_COUNT
> VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
> VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
> // do other DMA maps now, or later, or not at all, doesn't matter
> for (i = 0; i < count; i++)
> VFIO_IOMMU_MAP_MSI_BANK(iova, i);
> // The kernel now knows where each bank has been mapped, and can update PCI
> config space appropriately.

And the overall aperture enable/disable would occur on the first
dma/msi map() and last dma/msi unmap()?

Stuart
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Alex Williamson
On Wed, 2013-04-03 at 14:09 -0500, Stuart Yoder wrote:
> > Would is be possible for userspace to simply leave room for MSI bank
> > mapping (how much room could be determined by something like
> > VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
> > DMA_MAP starting at the 0x0 address of the aperture, growing up, and
> > VFIO will map banks on demand at the top of the aperture, growing down?
> > Wouldn't that avoid a lot of issues with userspace needing to know
> > anything about MSI banks (other than count) and coordinating irq numbers
> > and enabling handlers?
> 
> This is basically option #A in the original proposals sent.   I like
> this approach, in that it
> is simpler and keeps user space mostly out of this...which is
> consistent with how things are done
> on x86.  User space just needs to know how many windows to leave at
> the top of the aperture.
> The kernel then has the flexibility to use those windows how it wants.
> 
> But one question, is when should the kernel actually map (and unmap)
> the MSI banks.   One thing we need to do is enable the aperture...and current
> thinking is that is done on the first DMA_MAP.   Similarly when the last 
> mapping
> is unmapped we could also umap the MSI banks.
> 
> Sequence would be something like:
> 
> VFIO_GROUP_SET_CONTAINER // add groups to the container
> 
> VFIO_SET_IOMMU(VFIO_FSL_PAMU)// set iommu model
> 
> cnt = VFIO_IOMMU_GET_MSI_BANK_COUNT// returns max # of MSI banks
> 
> VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) // set overall aperture
> 
> VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // set # of windows,
> including MSI banks
> 
> VFIO_IOMMU_MAP_DMA// map the guest's memory
>---> kernel enables aperture and maps needed MSI banks here
> 
> VFIO_DEVICE_SET_IRQS
>---> kernel sets actual MSI addr/data in physical
> device here (I think)

You could also make use of the IOMMU_ENABLE/DISABLE entry points that
Alexey plans to use.  Ideally I'd think that you'd want to enable the
required MSI banks for a device on DEVICE_SET_IRQs.  That's effectively
what happens on x86.  Perhaps some information stored in the domain
structure would let architecture hooks in MSI setup enable those
mappings for you?  Thanks,

Alex


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:

> Would is be possible for userspace to simply leave room for MSI bank
> mapping (how much room could be determined by something like
> VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace  
can

> DMA_MAP starting at the 0x0 address of the aperture, growing up, and
> VFIO will map banks on demand at the top of the aperture, growing  
down?

> Wouldn't that avoid a lot of issues with userspace needing to know
> anything about MSI banks (other than count) and coordinating irq  
numbers

> and enabling handlers?

This is basically option #A in the original proposals sent.   I like
this approach, in that it
is simpler and keeps user space mostly out of this...which is
consistent with how things are done
on x86.  User space just needs to know how many windows to leave at
the top of the aperture.
The kernel then has the flexibility to use those windows how it wants.

But one question, is when should the kernel actually map (and unmap)
the MSI banks.


I think userspace should explicitly request it.  Userspace still  
wouldn't need to know anything but the count:


count = VFIO_IOMMU_GET_MSI_BANK_COUNT
VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
// do other DMA maps now, or later, or not at all, doesn't matter
for (i = 0; i < count; i++)
VFIO_IOMMU_MAP_MSI_BANK(iova, i);
// The kernel now knows where each bank has been mapped, and can update  
PCI config space appropriately.



One thing we need to do is enable the aperture...and current
thinking is that is done on the first DMA_MAP.


What if there are no other mappings required?

-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
> Would is be possible for userspace to simply leave room for MSI bank
> mapping (how much room could be determined by something like
> VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
> DMA_MAP starting at the 0x0 address of the aperture, growing up, and
> VFIO will map banks on demand at the top of the aperture, growing down?
> Wouldn't that avoid a lot of issues with userspace needing to know
> anything about MSI banks (other than count) and coordinating irq numbers
> and enabling handlers?

This is basically option #A in the original proposals sent.   I like
this approach, in that it
is simpler and keeps user space mostly out of this...which is
consistent with how things are done
on x86.  User space just needs to know how many windows to leave at
the top of the aperture.
The kernel then has the flexibility to use those windows how it wants.

But one question, is when should the kernel actually map (and unmap)
the MSI banks.   One thing we need to do is enable the aperture...and current
thinking is that is done on the first DMA_MAP.   Similarly when the last mapping
is unmapped we could also umap the MSI banks.

Sequence would be something like:

VFIO_GROUP_SET_CONTAINER // add groups to the container

VFIO_SET_IOMMU(VFIO_FSL_PAMU)// set iommu model

cnt = VFIO_IOMMU_GET_MSI_BANK_COUNT// returns max # of MSI banks

VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) // set overall aperture

VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // set # of windows,
including MSI banks

VFIO_IOMMU_MAP_DMA// map the guest's memory
   ---> kernel enables aperture and maps needed MSI banks here

VFIO_DEVICE_SET_IRQS
   ---> kernel sets actual MSI addr/data in physical
device here (I think)


Stuart
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/03/2013 01:32:26 PM, Stuart Yoder wrote:
On Tue, Apr 2, 2013 at 5:50 PM, Scott Wood   
wrote:

> On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
>>
>> On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
>> > VFIO_IOMMU_MAP_MSI(iova, size)
>
>
> Not sure how you mean "size" to be used -- for MPIC it would be 4K  
per bank,
> and you can only map one bank at a time (which bank you're mapping  
should be
> a parameter, if only so that the kernel doesn't have to keep  
iteration state

> for you).

The intent was for user space to tell the kernel which windows to use
for MSI.   So I envisioned a total size of window-size *  
msi-bank-count.


Size doesn't tell the kernel *which* banks to use, only how many.  If  
it already knows which banks are used by the group, then it also knows  
how many are used.  And size is misleading because the mapping is not  
generally going to be contiguous.


-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 5:50 PM, Scott Wood  wrote:
> On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
>>
>> On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
>> > On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood 
>> > wrote:
>> > >> >C.  Explicit mapping using normal DMA map.  The last idea is
>> > >> > that
>> > >> >we would introduce a new ioctl to give user-space an fd to
>> > >> >the MSI bank, which could be mmapped.  The flow would be
>> > >> >something like this:
>> > >> >   -for each group user space calls new ioctl
>> > >> > VFIO_GROUP_GET_MSI_FD
>> > >> >   -user space mmaps the fd, getting a vaddr
>> > >> >   -user space does a normal DMA map for desired iova
>> > >> >This approach makes everything explicit, but adds a new
>> > >> > ioctl
>> > >> >applicable most likely only to the PAMU (type2 iommu).
>> > >>
>> > >> And the DMA_MAP of that mmap then allows userspace to select the
>> > >> window
>> > >> used?  This one seems like a lot of overhead, adding a new ioctl, new
>> > >> fd, mmap, special mapping path, etc.
>> > >
>> > >
>> > > There's going to be special stuff no matter what.  This would keep it
>> > > separated from the IOMMU map code.
>> > >
>> > > I'm not sure what you mean by "overhead" here... the runtime overhead
>> > > of
>> > > setting things up is not particularly relevant as long as it's
>> > > reasonable.
>> > > If you mean development and maintenance effort, keeping things well
>> > > separated should help.
>> >
>> > We don't need to change DMA_MAP.  If we can simply add a new "type 2"
>> > ioctl that allows user space to set which windows are MSIs, it seems
>> > vastly
>> > less complex than an ioctl to supply a new fd, mmap of it, etc.
>> >
>> > So maybe 2 ioctls:
>> > VFIO_IOMMU_GET_MSI_COUNT
>
>
> Do you mean a count of actual MSIs or a count of MSI banks used by the whole
> VFIO group?

I meant # of MSI banks, so VFIO_IOMMU_GET_MSI_BANK_COUNT would be better.

>> > VFIO_IOMMU_MAP_MSI(iova, size)
>
>
> Not sure how you mean "size" to be used -- for MPIC it would be 4K per bank,
> and you can only map one bank at a time (which bank you're mapping should be
> a parameter, if only so that the kernel doesn't have to keep iteration state
> for you).

The intent was for user space to tell the kernel which windows to use
for MSI.   So I envisioned a total size of window-size * msi-bank-count.

Stuart
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
>> >  Type1 is arbitrary.  It might as well be named "brown" and this one
>> > can be
>> > "blue".
>>
>> The difference is that "type1" seems to refer to hardware that can do
>> arbitrary 4K page mappings, possibly constrained by an aperture but
>> nothing else.  More than one IOMMU can reasonably fit that.  The odds
>> that another IOMMU would have exactly the same restrictions as PAMU
>> seem smaller in comparison.
>>
>> In any case, if you had to deal with some Intel-only quirk, would it
>> make sense to call it a "type1 attribute"?  I'm not advocating one way
>> or the other on whether an abstraction is viable here (though Stuart
>> seems to think it's "highly unlikely anything but a PAMU will comply"),
>> just that if it is to be abstracted rather than a hardware-specific
>> interface, we need to document what is and is not part of the
>> abstraction.  Otherwise a non-PAMU-specific user won't know what they
>> can rely on, and someone adding support for a new windowed IOMMU won't
>> know if theirs is close enough, or they need to introduce a "type3".
>
> So Alexey named the SPAPR IOMMU something related to spapr...
> surprisingly enough.  I'm fine with that.  If you think it's unique
> enough, name it something appropriately.  I haven't seen the code and
> don't know the architecture sufficiently to have an opinion.

The only reason I suggested "type 2" is that I thought that was the
convention...we would enumerate different iommus.   I think that
calling it "pamu" is better and more clear.

Stuart
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-03 Thread Alex Williamson
On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote:
> Cc'ing Alex Williamson
> 
> Alex, can you please review the iommu-group part of this patch?

Sure, it looks pretty reasonable.  AIUI, all PCI devices are below some
kind of host bridge that is either new and supports partitioning or old
and doesn't.  I don't know if that's a visibility or isolation
requirement, perhaps PCI ACS-ish.  In the new host bridge case, each
device gets a group.  This seems not to have any quirks for
multifunction devices though.  On AMD and Intel IOMMUs we test
multifunction device ACS support to determine whether all the functions
should be in the same group.  Is there any reason to trust multifunction
devices on PAMU?

I also find it curious what happens to the iommu group of the host
bridge.  In the partitionable case the host bridge group is removed, in
the non-partitionable case the host bridge group becomes the group for
the children, removing the host bridge.  It's unique to PAMU so far that
these host bridges are even in an iommu group (x86 only adds pci
devices), but I don't see it as necessarily wrong leaving it in either
scenario.  Does it solve some problem to remove them from the groups?
Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-03 Thread Yoder Stuart-B08248


> -Original Message-
> From: Sethi Varun-B16395
> Sent: Wednesday, April 03, 2013 12:12 AM
> To: Wood Scott-B07421; Timur Tabi
> Cc: Joerg Roedel; lkml; Kumar Gala; Yoder Stuart-B08248; 
> iommu@lists.linux-foundation.org; Benjamin
> Herrenschmidt; linuxppc-...@lists.ozlabs.org
> Subject: RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu 
> implementation.
> 
> 
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Wednesday, April 03, 2013 7:23 AM
> > To: Timur Tabi
> > Cc: Joerg Roedel; Sethi Varun-B16395; lkml; Kumar Gala; Yoder Stuart-
> > B08248; iommu@lists.linux-foundation.org; Benjamin Herrenschmidt;
> > linuxppc-...@lists.ozlabs.org
> > Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu
> > implementation.
> >
> > On 04/02/2013 08:35:54 PM, Timur Tabi wrote:
> > > On Tue, Apr 2, 2013 at 11:18 AM, Joerg Roedel  wrote:
> > >
> > > > > + panic("\n");
> > > >
> > > > A kernel panic seems like an over-reaction to an access violation.
> > >
> > > We have no way to determining what code caused the violation, so we
> > > can't just kill the process.  I agree it seems like overkill, but what
> > > else should we do?  Does the IOMMU layer have a way for the IOMMU
> > > driver to stop the device that caused the problem?
> >
> > At a minimum, log a message and continue.  Probably turn off the LIODN,
> > at least if it continues to be noisy (otherwise we could get stuck in an
> > interrupt storm as you note).  Possibly let the user know somehow,
> > especially if it's a VFIO domain.
> [Sethi Varun-B16395] Can definitely log the message and disable the LIODN (to 
> avoid an interrupt storm),
> but
> we definitely need a mechanism to inform vfio subsystem about the error. 
> Also, disabling LIODN may not
> be a viable
> option with the new LIODN allocation scheme (where LIODN would be associated 
> with a domain).

I think for phase 1 of this, just log the error, shut down DMA as you described.
We can implement more full featured error management, like notifying vfio
or the VM somehow in the future.

Stuart


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

2013-04-03 Thread David Woodhouse
On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
> (2013/04/02 23:05), Joerg Roedel wrote:
> > On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
> >> 
> >> enable_IR
> >>intel_enable_irq_remapping
> >>  iommu_disable_irq_remapping  <== IRES/QIES/TES disabled here
> >>  dmar_disable_qi  <== do nothing
> >>  dmar_enable_qi   <== QIES enabled
> >>  intel_setup_irq_remapping<== IRES enabled
> > 
> > But what we want to do here in the kdumo case is to disable translation
> > too, right? Because the former kernel might have translation and
> > irq-remapping enabled and the kdump kernel might be compiled without
> > support for dma-remapping. So if we don't disable translation here too
> > the kdump kernel is unable to do DMA.
>
> Yeah, you are right. I forgot such a case.

If you disable translation and there's some device still doing DMA, it's
going to scribble over random areas of memory. You really want to have
translation enabled and all the page tables *cleared*, during kexec. I
think it's fair to insist that the secondary kernel should use the IOMMU
if the first one did.

> To be honest, I also expected the side effect of this patch. As I wrote
> in the previous mail, I'm working on kdump problem with iommu, that is,
> ongoing DMA causes DMAR fault in 2nd kernel and sometimes kdump fails
> due to this fault.

Here you've lost me. The DMAR fault is caught and reported, and how does
this lead to a kdump failure? Are you using dodgy hardware that just
keeps *trying* after an abort, and floods the system with a storm of
DMAR faults? We've occasionally spoken about working around such a
problem by setting a bit to make subsequent faults *silent*. Would that
work?

>  What we have to do is stopping DMA transaction
> before DMA-remapping is disabled in 2nd kernel.

The IOMMU is there to stop DMA transactions. That is its *job*. :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 4/5 v11] iommu/fsl: Add additional iommu attributes required by the PAMU driver.

2013-04-03 Thread Joerg Roedel
On Wed, Apr 03, 2013 at 05:21:16AM +, Sethi Varun-B16395 wrote:
> > I would prefer these PAMU specific enum and struct to be in a pamu-
> > specific iommu-header.
> > 
> 
> [Sethi Varun-B16395] But, these would be used by the IOMMU API users
> (e.g. VFIO), they shouldn't depend on PAMU specific headers.

Drivers that use PAMU specifics (like the PAMU specific VFIO parts) can
also include the PAMU specific header.


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

2013-04-03 Thread Takao Indoh
(2013/04/02 23:05), Joerg Roedel wrote:
> On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
>> 
>> enable_IR
>>intel_enable_irq_remapping
>>  iommu_disable_irq_remapping  <== IRES/QIES/TES disabled here
>>  dmar_disable_qi  <== do nothing
>>  dmar_enable_qi   <== QIES enabled
>>  intel_setup_irq_remapping<== IRES enabled
> 
> But what we want to do here in the kdumo case is to disable translation
> too, right? Because the former kernel might have translation and
> irq-remapping enabled and the kdump kernel might be compiled without
> support for dma-remapping. So if we don't disable translation here too
> the kdump kernel is unable to do DMA.

Yeah, you are right. I forgot such a case.

To be honest, I also expected the side effect of this patch. As I wrote
in the previous mail, I'm working on kdump problem with iommu, that is,
ongoing DMA causes DMAR fault in 2nd kernel and sometimes kdump fails
due to this fault. What we have to do is stopping DMA transaction
before DMA-remapping is disabled in 2nd kernel. To do this, we need to
reset devices before intel_enable_irq_remapping() is called, but it is
very difficult because struct pci_dev is not prepared in this stage.
After applying this patch, DMA-remapping is disabled in init_dmars where
struct pci_dev is ready, so it's easier to handle. For example we can
add pci quirk to reset devices.

So, disabling translation in 2nd kernel is necessary, and it's better if
we can do it after struct pci_dev is prepared. (after subsys_initcall?)
Any idea?

Thanks,
Takao Indoh

> 
> I agree that disabling translation should be a bit more explicit instead
> of the current code.
> 
> 
>   Joerg
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-03 Thread Sethi Varun-B16395


> -Original Message-
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Tuesday, April 02, 2013 9:48 PM
> To: Sethi Varun-B16395
> Cc: Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-
> foundation.org; linuxppc-...@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; ga...@kernel.crashing.org;
> b...@kernel.crashing.org; Alex Williamson
> Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu
> implementation.
> 
> Cc'ing Alex Williamson
> 
> Alex, can you please review the iommu-group part of this patch?
> 
> My comments so far are below:
> 
> On Fri, Mar 29, 2013 at 01:24:02AM +0530, Varun Sethi wrote:
> > +config FSL_PAMU
> > +   bool "Freescale IOMMU support"
> > +   depends on PPC_E500MC
> > +   select IOMMU_API
> > +   select GENERIC_ALLOCATOR
> > +   help
> > + Freescale PAMU support.
> 
> A bit lame for a help text. Can you elaborate more what PAMU is and when
> it should be enabled?
> 
[Sethi Varun-B16395] Will update the description.

> > +int pamu_enable_liodn(int liodn)
> > +{
> > +   struct paace *ppaace;
> > +
> > +   ppaace = pamu_get_ppaace(liodn);
> > +   if (!ppaace) {
> > +   pr_err("Invalid primary paace entry\n");
> > +   return -ENOENT;
> > +   }
> > +
> > +   if (!get_bf(ppaace->addr_bitfields, PPAACE_AF_WSE)) {
> > +   pr_err("liodn %d not configured\n", liodn);
> > +   return -EINVAL;
> > +   }
> > +
> > +   /* Ensure that all other stores to the ppaace complete first */
> > +   mb();
> > +
> > +   ppaace->addr_bitfields |= PAACE_V_VALID;
> > +   mb();
> 
> Why is it sufficient to set the bit in a variable when enabling liodn but
> when disabling it set_bf needs to be called? This looks a bit assymetric.
> 
[Sethi Varun-B16395] Will make it symetric :)

> > +/* Derive the window size encoding for a particular PAACE entry */
> > +static unsigned int map_addrspace_size_to_wse(phys_addr_t
> > +addrspace_size) {
> > +   /* Bug if not a power of 2 */
> > +   BUG_ON((addrspace_size & (addrspace_size - 1)));
> 
> Please use is_power_of_2 here.
> 
> > +
> > +   /* window size is 2^(WSE+1) bytes */
> > +   return __ffs(addrspace_size >> PAMU_PAGE_SHIFT) + PAMU_PAGE_SHIFT -
> > +1;
> 
> The PAMU_PAGE_SHIFT shifting and adding looks redundant.
> 
> > +   if ((win_size & (win_size - 1)) || win_size < PAMU_PAGE_SIZE) {
> > +   pr_err("window size too small or not a power of two %llx\n",
> win_size);
> > +   return -EINVAL;
> > +   }
> > +
> > +   if (win_addr & (win_size - 1)) {
> > +   pr_err("window address is not aligned with window size\n");
> > +   return -EINVAL;
> > +   }
> 
> Again, use is_power_of_2 instead of hand-coding.
>
[Sethi Varun-B16395] ok
 
> > +   if (~stashid != 0)
> > +   set_bf(paace->impl_attr, PAACE_IA_CID, stashid);
> > +
> > +   smp_wmb();
> > +
> > +   if (enable)
> > +   paace->addr_bitfields |= PAACE_V_VALID;
> 
> Havn't you written a helper funtion to set this bit?
> 
[Sethi Varun-B16395] We already have a PAACE entry with us here so we can 
directly manipulate it here.

> > +irqreturn_t pamu_av_isr(int irq, void *arg) {
> > +   struct pamu_isr_data *data = arg;
> > +   phys_addr_t phys;
> > +   unsigned int i, j;
> > +
> > +   pr_emerg("fsl-pamu: access violation interrupt\n");
> > +
> > +   for (i = 0; i < data->count; i++) {
> > +   void __iomem *p = data->pamu_reg_base + i * PAMU_OFFSET;
> > +   u32 pics = in_be32(p + PAMU_PICS);
> > +
> > +   if (pics & PAMU_ACCESS_VIOLATION_STAT) {
> > +   pr_emerg("POES1=%08x\n", in_be32(p + PAMU_POES1));
> > +   pr_emerg("POES2=%08x\n", in_be32(p + PAMU_POES2));
> > +   pr_emerg("AVS1=%08x\n", in_be32(p + PAMU_AVS1));
> > +   pr_emerg("AVS2=%08x\n", in_be32(p + PAMU_AVS2));
> > +   pr_emerg("AVA=%016llx\n", make64(in_be32(p +
> PAMU_AVAH),
> > +   in_be32(p + PAMU_AVAL)));
> > +   pr_emerg("UDAD=%08x\n", in_be32(p + PAMU_UDAD));
> > +   pr_emerg("POEA=%016llx\n", make64(in_be32(p +
> PAMU_POEAH),
> > +   in_be32(p + PAMU_POEAL)));
> > +
> > +   phys = make64(in_be32(p + PAMU_POEAH),
> > +   in_be32(p + PAMU_POEAL));
> > +
> > +   /* Assume that POEA points to a PAACE */
> > +   if (phys) {
> > +   u32 *paace = phys_to_virt(phys);
> > +
> > +   /* Only the first four words are relevant */
> > +   for (j = 0; j < 4; j++)
> > +   pr_emerg("PAACE[%u]=%08x\n", j,
> in_be32(paace + j));
> > +   }
> > +   }
> > +   }
> > +
> > +   panic("\n");
> 
> A kernel panic seems like an over-reaction to an access violation.
> Besides the device that caused the violation the system should still
> work, no?
> 
[Sethi Varun-B16395] Well, if devic