Re: [RESEND PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue

2020-12-02 Thread Dmitry Safonov
On Tue, 1 Dec 2020 at 21:50, Will Deacon  wrote:
>
> On Tue, 17 Nov 2020 18:25:30 +0800, John Garry wrote:
> > This series contains a patch to solve the longterm IOVA issue which
> > leizhen originally tried to address at [0].
> >
> > A sieved kernel log is at the following, showing periodic dumps of IOVA
> > sizes, per CPU and per depot bin, per IOVA size granule:
> > https://raw.githubusercontent.com/hisilicon/kernel-dev/topic-iommu-5.10-iova-debug-v3/aging_test
> >
> > [...]
>
> Applied the final patch to arm64 (for-next/iommu/iova), thanks!
>
> [4/4] iommu: avoid taking iova_rbtree_lock twice
>   https://git.kernel.org/arm64/c/3a651b3a27a1

Glad it made it into -next; two years ago I couldn't convince the iommu
maintainer it was worth it (albeit with a different justification):
https://lore.kernel.org/linux-iommu/20180621180823.805-3-d...@arista.com/

Thanks,
 Dmitry


Re: [PATCH] Ensure pci transactions coming from PLX NTB are handled when IOMMU is turned on

2019-11-20 Thread Dmitry Safonov
+Cc: linux-...@vger.kernel.org
+Cc: Bjorn Helgaas 
+Cc: Logan Gunthorpe 

On 11/5/19 12:17 PM, James Sewart wrote:
> Any comments on this?
> 
> Cheers,
> James.
> 
>> On 24 Oct 2019, at 13:52, James Sewart  wrote:
>>
>> The PLX PEX NTB forwards DMA transactions using Requester IDs that don't
>> exist as PCI devices. The devfn for a transaction is used as an index into
>> a lookup table storing the origin of a transaction on the other side of
>> the bridge.
>>
>> This patch aliases all possible devfns to the NTB device so that any
>> transaction coming in is governed by the mappings for the NTB.
>>
>> Signed-Off-By: James Sewart 
>> ---
>> drivers/pci/quirks.c | 22 ++
>> 1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 320255e5e8f8..647f546e427f 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -5315,6 +5315,28 @@ SWITCHTEC_QUIRK(0x8574);  /* PFXI 64XG3 */
>> SWITCHTEC_QUIRK(0x8575);  /* PFXI 80XG3 */
>> SWITCHTEC_QUIRK(0x8576);  /* PFXI 96XG3 */
>>
>> +/*
>> + * PLX NTB uses devfn proxy IDs to move TLPs between NT endpoints. These IDs
>> + * are used to forward responses to the originator on the other side of the
>> + * NTB. Alias all possible IDs to the NTB to permit access when the IOMMU is
>> + * turned on.
>> + */
>> +static void quirk_PLX_NTB_dma_alias(struct pci_dev *pdev)
>> +{
>> +	if (!pdev->dma_alias_mask)
>> +		pdev->dma_alias_mask = kcalloc(BITS_TO_LONGS(U8_MAX),
>> +					       sizeof(long), GFP_KERNEL);
>> +	if (!pdev->dma_alias_mask) {
>> +		dev_warn(&pdev->dev, "Unable to allocate DMA alias mask\n");
>> +		return;
>> +	}
>> +
>> +	// PLX NTB may use all 256 devfns
>> +	memset(pdev->dma_alias_mask, U8_MAX, (U8_MAX+1)/BITS_PER_BYTE);
>> +}
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b0, quirk_PLX_NTB_dma_alias);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b1, quirk_PLX_NTB_dma_alias);
>> +
>> /*
>>  * On Lenovo Thinkpad P50 SKUs with a Nvidia Quadro M1000M, the BIOS does
>>  * not always reset the secondary Nvidia GPU between reboots if the system
>> -- 
>> 2.19.1
>>
>>
> 
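
As context for reviewers: once the quirk above has populated dma_alias_mask,
the PCI core walks it via pci_for_each_dma_alias(), which is how the IOMMU
drivers learn about every requester ID they must accept for the NTB. A rough,
untested sketch of such a walk (dump_alias()/dump_ntb_aliases() are made-up
names, only for illustration):

#include <linux/pci.h>

/* Illustration only: print every requester ID that now aliases to the NTB. */
static int dump_alias(struct pci_dev *pdev, u16 alias, void *data)
{
	dev_info(&pdev->dev, "DMA alias %02x:%02x.%d\n",
		 PCI_BUS_NUM(alias), PCI_SLOT(alias & 0xff),
		 PCI_FUNC(alias & 0xff));
	return 0;	/* keep iterating */
}

static void dump_ntb_aliases(struct pci_dev *ntb)
{
	pci_for_each_dma_alias(ntb, dump_alias, NULL);
}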

Thanks,
  Dmitry


Re: [PATCH 4.19 17/32] iommu/vt-d: Dont queue_iova() if there is no flush queue

2019-08-06 Thread Dmitry Safonov
On 8/6/19 11:47 PM, Dmitry Safonov wrote:
> Hi Pavel,
> 
> On 8/3/19 10:34 PM, Pavel Machek wrote:
>> Hi!
>>
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -3721,7 +3721,7 @@ static void intel_unmap(struct device *d
>>>  
>>> freelist = domain_unmap(domain, start_pfn, last_pfn);
>>>  
>>> -   if (intel_iommu_strict) {
>>> +   if (intel_iommu_strict || !has_iova_flush_queue(&domain->iovad)) {
>>> iommu_flush_iotlb_psi(iommu, domain, start_pfn,
>>>   nrpages, !freelist, 0);
>>> /* free iova */
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -65,9 +65,14 @@ init_iova_domain(struct iova_domain *iov
>>>  }
>>>  EXPORT_SYMBOL_GPL(init_iova_domain);
>>>  
>>> +bool has_iova_flush_queue(struct iova_domain *iovad)
>>> +{
>>> +   return !!iovad->fq;
>>
>> Should this be READ_ONCE()?
> 
> Why? The compiler can't assume it's always true or false anyway, and there
> is a clear data dependency between this and:
> : queue_iova(&domain->iovad, iova_pfn, nrpages,
> :(unsigned long)freelist);
> 
>>
>>> @@ -100,13 +106,17 @@ int init_iova_flush_queue(struct iova_do
>>> for_each_possible_cpu(cpu) {
>>> struct iova_fq *fq;
>>>  
>>> -   fq = per_cpu_ptr(iovad->fq, cpu);
>>> +   fq = per_cpu_ptr(queue, cpu);
>>> fq->head = 0;
>>> fq->tail = 0;
>>>  
>>> spin_lock_init(&fq->lock);
>>> }
>>>  
>>> +   smp_wmb();
>>> +
>>> +   iovad->fq = queue;
>>> +
>>
>> Could we have a comment why the barrier is needed,
> 
> I'm up for adding the comment if you feel like it - in my POV it's quite
> obvious that we want to finish initializing the queue's internals before
> publishing the queue pointer. I didn't add the comment precisely because I
> felt it would state something evident [in my POV].
> 
>> and perhaps there
>> should be oposing smp_rmb() somewhere? Does this need to be
>> WRITE_ONCE() as it is racing against reader?
> 
> I feel confused. I might have forgotten everything about barriers, but
> again if I'm not mistaken, one doesn't need a barrier in:
> : if (A->a != NULL)
> : use(A); /* dereferences A->a */
> : else
> : /* don't use `a' */

And in this simplified example I mean that use() will either see A->a
initialized (IOW, the CPU can't load A->a->field1 before checking the
condition) or use() will not be called.
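
To put the same argument in a minimal, self-contained form (a sketch with
invented names, not the driver code): the writer orders its initialising
stores with smp_wmb() before publishing the pointer, and the reader's later
loads depend on the pointer value it has already loaded - which is the data
dependency referred to above.

/* Minimal sketch of the publish/consume pattern under discussion. */
struct fq  { int head, tail; };
struct dom { struct fq *fq; };

static void publish_fq(struct dom *d, struct fq *q)
{
	q->head = 0;
	q->tail = 0;
	smp_wmb();	/* init stores above complete before the pointer store */
	d->fq = q;	/* publish */
}

static void consume_fq(struct dom *d)
{
	struct fq *q = d->fq;	/* single pointer load */

	if (!q)
		return;		/* queue not published yet */

	/* these loads depend on the value of 'q' loaded just above */
	do_something(q->head, q->tail);	/* do_something() is a placeholder */
}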

Thanks,
  Dmitry


Re: [PATCH 4.19 17/32] iommu/vt-d: Dont queue_iova() if there is no flush queue

2019-08-06 Thread Dmitry Safonov via iommu
Hi Pavel,

On 8/3/19 10:34 PM, Pavel Machek wrote:
> Hi!
> 
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -3721,7 +3721,7 @@ static void intel_unmap(struct device *d
>>  
>>  freelist = domain_unmap(domain, start_pfn, last_pfn);
>>  
>> -if (intel_iommu_strict) {
>> +if (intel_iommu_strict || !has_iova_flush_queue(&domain->iovad)) {
>>  iommu_flush_iotlb_psi(iommu, domain, start_pfn,
>>nrpages, !freelist, 0);
>>  /* free iova */
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -65,9 +65,14 @@ init_iova_domain(struct iova_domain *iov
>>  }
>>  EXPORT_SYMBOL_GPL(init_iova_domain);
>>  
>> +bool has_iova_flush_queue(struct iova_domain *iovad)
>> +{
>> +return !!iovad->fq;
> 
> Should this be READ_ONCE()?

Why? The compiler can't assume it's always true or false anyway, and there
is a clear data dependency between this and:
:   queue_iova(&domain->iovad, iova_pfn, nrpages,
:  (unsigned long)freelist);

> 
>> @@ -100,13 +106,17 @@ int init_iova_flush_queue(struct iova_do
>>  for_each_possible_cpu(cpu) {
>>  struct iova_fq *fq;
>>  
>> -fq = per_cpu_ptr(iovad->fq, cpu);
>> +fq = per_cpu_ptr(queue, cpu);
>>  fq->head = 0;
>>  fq->tail = 0;
>>  
>> spin_lock_init(&fq->lock);
>>  }
>>  
>> +smp_wmb();
>> +
>> +iovad->fq = queue;
>> +
> 
> Could we have a comment why the barrier is needed,

I'm up for adding the comment if you feel like it - in my POV it's quite
obvious that we want to finish initializing the queue's internals before
publishing the queue pointer. I didn't add the comment precisely because I
felt it would state something evident [in my POV].

> and perhaps there
> should be oposing smp_rmb() somewhere? Does this need to be
> WRITE_ONCE() as it is racing against reader?

I feel confused. I might have forgotten everything about barriers, but
again if I'm not mistaken, one doesn't need a barrier in:
: if (A->a != NULL)
: use(A); /* dereferences A->a */
: else
: /* don't use `a' */

Thanks,
  Dmitry


[PATCH-4.14-stable 0/2] iommu/vt-d: queue_iova() boot crash backport

2019-07-31 Thread Dmitry Safonov via iommu
Backport commits from master that fix boot failure on some intel
machines.

I have only boot-tested this in a VM. Functional testing for v4.14 is
out of scope for me, as the patches differ from v4.19 (where I discovered
and debugged the issue) only by a trivial conflict, while testing v4.14
stable on the affected nodes would require porting some other [local]
patches, which is beyond my time limit for this backport.

Hopefully, that's fine.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Joerg Roedel 
Cc: Lu Baolu 

Dmitry Safonov (1):
  iommu/vt-d: Don't queue_iova() if there is no flush queue

Joerg Roedel (1):
  iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA

 drivers/iommu/intel-iommu.c |  2 +-
 drivers/iommu/iova.c| 18 ++
 include/linux/iova.h|  6 ++
 3 files changed, 21 insertions(+), 5 deletions(-)

-- 
2.22.0



[PATCH-4.14-stable 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue

2019-07-31 Thread Dmitry Safonov via iommu
[ Upstream commit effa467870c7612012885df4e246bdb8ffd8e44c ]

Intel VT-d driver was reworked to use common deferred flushing
implementation. Previously there was one global per-cpu flush queue,
afterwards - one per domain.

Before deferring a flush, the queue should be allocated and initialized.

Currently only domains with the IOMMU_DOMAIN_DMA type initialize their flush
queue. It's probably worth initializing it for static or unmanaged domains
too, but that may be arguable - I'm leaving it to the iommu folks.

Prevent queuing an iova flush if the domain doesn't have a queue.
The defensive check seems worth keeping even if the queue were initialized
for all kinds of domains, and it is easily backportable.

On 4.19.43 stable kernel it has a user-visible effect: previously for
devices in si domain there were crashes, on sata devices:

 BUG: spinlock bad magic on CPU#6, swapper/0/1
  lock: 0x88844f582008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x45/0x115
  intel_unmap+0x107/0x113
  intel_unmap_sg+0x6b/0x76
  __ata_qc_complete+0x7f/0x103
  ata_qc_complete+0x9b/0x26a
  ata_qc_complete_multiple+0xd0/0xe3
  ahci_handle_port_interrupt+0x3ee/0x48a
  ahci_handle_port_intr+0x73/0xa9
  ahci_single_level_irq_intr+0x40/0x60
  __handle_irq_event_percpu+0x7f/0x19a
  handle_irq_event_percpu+0x32/0x72
  handle_irq_event+0x38/0x56
  handle_edge_irq+0x102/0x121
  handle_irq+0x147/0x15c
  do_IRQ+0x66/0xf2
  common_interrupt+0xf/0xf
 RIP: 0010:__do_softirq+0x8c/0x2df

The same for usb devices that use ehci-pci:
 BUG: spinlock bad magic on CPU#0, swapper/0/1
  lock: 0x88844f402008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x77/0x145
  intel_unmap+0x107/0x113
  intel_unmap_page+0xe/0x10
  usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
  usb_hcd_unmap_urb_for_dma+0x17/0x100
  unmap_urb_for_dma+0x22/0x24
  __usb_hcd_giveback_urb+0x51/0xc3
  usb_giveback_urb_bh+0x97/0xde
  tasklet_action_common.isra.4+0x5f/0xa1
  tasklet_action+0x2d/0x30
  __do_softirq+0x138/0x2df
  irq_exit+0x7d/0x8b
  smp_apic_timer_interrupt+0x10f/0x151
  apic_timer_interrupt+0xf/0x20
  
 RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc:  # 4.14+
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov 
Reviewed-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
[v4.14-port notes:
o minor conflict with untrusted IOMMU devices check under if-condition
o setup_timer() near one chunk is timer_setup() in v5.3]
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c |  2 +-
 drivers/iommu/iova.c| 18 ++
 include/linux/iova.h|  6 ++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index baa4c58e2736..523d0889c2a4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3702,7 +3702,7 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 
freelist = domain_unmap(domain, start_pfn, last_pfn);
 
-   if (intel_iommu_strict) {
+   if (intel_iommu_strict || !has_iova_flush_queue(&domain->iovad)) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
  nrpages, !freelist, 0);
/* free iova */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 33edfa794ae9..9f35b9a0d6d8 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -58,9 +58,14 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
+bool has_iova_flush_queue(struct iova_domain *iovad)
+{
+   return !!iovad->fq;
+}
+
 static void free_iova_flush_queue(struct iova_domain *iovad)
 {
-   if (!iovad->fq)
+   if (!has_iova_flush_queue(iovad))
return;
 
if (timer_pending(&iovad->fq_timer))
@@ -78,13 +83,14 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
 int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor)
 {
+   struct iova_fq __percpu *queue;
int cpu;
 
atomic64_set(&iovad->fq_flush_start_cnt,  0);
atomic64_set(&iovad->fq_flush_finish_cnt, 0);
 
-   iovad->fq = alloc_percpu(struct iova_fq);
-   if (!iovad->fq)
+   queue = alloc_percpu(struct iova_fq);
+   if (!queue)
return -ENOMEM;
 
iovad->flush_cb   = flush_cb;
@@ -93,13 +99,17 @@ int init_iova_flush_queue(struct iova_domain *iov

[PATCH-4.19-stable 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue

2019-07-31 Thread Dmitry Safonov via iommu
[ Upstream commit effa467870c7612012885df4e246bdb8ffd8e44c ]

Intel VT-d driver was reworked to use common deferred flushing
implementation. Previously there was one global per-cpu flush queue,
afterwards - one per domain.

Before deferring a flush, the queue should be allocated and initialized.

Currently only domains with the IOMMU_DOMAIN_DMA type initialize their flush
queue. It's probably worth initializing it for static or unmanaged domains
too, but that may be arguable - I'm leaving it to the iommu folks.

Prevent queuing an iova flush if the domain doesn't have a queue.
The defensive check seems worth keeping even if the queue were initialized
for all kinds of domains, and it is easily backportable.

On 4.19.43 stable kernel it has a user-visible effect: previously for
devices in si domain there were crashes, on sata devices:

 BUG: spinlock bad magic on CPU#6, swapper/0/1
  lock: 0x88844f582008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x45/0x115
  intel_unmap+0x107/0x113
  intel_unmap_sg+0x6b/0x76
  __ata_qc_complete+0x7f/0x103
  ata_qc_complete+0x9b/0x26a
  ata_qc_complete_multiple+0xd0/0xe3
  ahci_handle_port_interrupt+0x3ee/0x48a
  ahci_handle_port_intr+0x73/0xa9
  ahci_single_level_irq_intr+0x40/0x60
  __handle_irq_event_percpu+0x7f/0x19a
  handle_irq_event_percpu+0x32/0x72
  handle_irq_event+0x38/0x56
  handle_edge_irq+0x102/0x121
  handle_irq+0x147/0x15c
  do_IRQ+0x66/0xf2
  common_interrupt+0xf/0xf
 RIP: 0010:__do_softirq+0x8c/0x2df

The same for usb devices that use ehci-pci:
 BUG: spinlock bad magic on CPU#0, swapper/0/1
  lock: 0x88844f402008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x77/0x145
  intel_unmap+0x107/0x113
  intel_unmap_page+0xe/0x10
  usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
  usb_hcd_unmap_urb_for_dma+0x17/0x100
  unmap_urb_for_dma+0x22/0x24
  __usb_hcd_giveback_urb+0x51/0xc3
  usb_giveback_urb_bh+0x97/0xde
  tasklet_action_common.isra.4+0x5f/0xa1
  tasklet_action+0x2d/0x30
  __do_softirq+0x138/0x2df
  irq_exit+0x7d/0x8b
  smp_apic_timer_interrupt+0x10f/0x151
  apic_timer_interrupt+0xf/0x20
  
 RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc:  # 4.14+
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov 
Reviewed-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
[v4.14-port notes:
o minor conflict with untrusted IOMMU devices check under if-condition]
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c |  2 +-
 drivers/iommu/iova.c| 18 ++
 include/linux/iova.h|  6 ++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c1439019dd12..b9af2419006f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3721,7 +3721,7 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 
freelist = domain_unmap(domain, start_pfn, last_pfn);
 
-   if (intel_iommu_strict) {
+   if (intel_iommu_strict || !has_iova_flush_queue(&domain->iovad)) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
  nrpages, !freelist, 0);
/* free iova */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 83fe2621effe..60348d707b99 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -65,9 +65,14 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
+bool has_iova_flush_queue(struct iova_domain *iovad)
+{
+   return !!iovad->fq;
+}
+
 static void free_iova_flush_queue(struct iova_domain *iovad)
 {
-   if (!iovad->fq)
+   if (!has_iova_flush_queue(iovad))
return;
 
if (timer_pending(&iovad->fq_timer))
@@ -85,13 +90,14 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
 int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor)
 {
+   struct iova_fq __percpu *queue;
int cpu;
 
atomic64_set(&iovad->fq_flush_start_cnt,  0);
atomic64_set(&iovad->fq_flush_finish_cnt, 0);
 
-   iovad->fq = alloc_percpu(struct iova_fq);
-   if (!iovad->fq)
+   queue = alloc_percpu(struct iova_fq);
+   if (!queue)
return -ENOMEM;
 
iovad->flush_cb   = flush_cb;
@@ -100,13 +106,17 @@ int init_iova_flush_queue(struct iova_domain *iovad,
for_each_possible_cpu(cpu) {
str

[PATCH-4.19-stable 0/2] iommu/vt-d: queue_iova() boot crash backport

2019-07-31 Thread Dmitry Safonov via iommu
Backport commits from master that fix boot failure on some intel
machines.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Joerg Roedel 
Cc: Lu Baolu 

Dmitry Safonov (1):
  iommu/vt-d: Don't queue_iova() if there is no flush queue

Joerg Roedel (1):
  iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA

 drivers/iommu/intel-iommu.c |  2 +-
 drivers/iommu/iova.c| 18 ++
 include/linux/iova.h|  6 ++
 3 files changed, 21 insertions(+), 5 deletions(-)

-- 
2.22.0



Re: Patch "iommu/vt-d: Don't queue_iova() if there is no flush queue" has been added to the 5.2-stable tree

2019-07-29 Thread Dmitry Safonov via iommu
Hi Greg,

On 7/29/19 5:30 PM, gre...@linuxfoundation.org wrote:
> 
> This is a note to let you know that I've just added the patch titled
> 
> iommu/vt-d: Don't queue_iova() if there is no flush queue
> 
> to the 5.2-stable tree which can be found at:
> 
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>  iommu-vt-d-don-t-queue_iova-if-there-is-no-flush-queue.patch
> and it can be found in the queue-5.2 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let  know about it.

Please, make sure to apply the following patch too (kudos to Joerg).
[Sorry for any inconvenience caused by my copy-n-paste mistake]

commit 201c1db90cd6
Author: Joerg Roedel 
Date:   Tue Jul 23 09:51:00 2019 +0200

iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA

The stub function for !CONFIG_IOMMU_IOVA needs to be
'static inline'.

Fixes: effa467870c76 ('iommu/vt-d: Don't queue_iova() if there is no
flush queue')
Signed-off-by: Joerg Roedel 


Thanks,
  Dmitry


Re: [PATCH 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue

2019-07-23 Thread Dmitry Safonov via iommu
On Tue 23 Jul 2019, 9:17 a.m. Joerg Roedel,  wrote:

> On Tue, Jul 16, 2019 at 10:38:05PM +0100, Dmitry Safonov wrote:
>
> > @@ -235,6 +236,11 @@ static inline void init_iova_domain(struct
> iova_domain *iovad,
> >  {
> >  }
> >
> > +bool has_iova_flush_queue(struct iova_domain *iovad)
> > +{
> > + return false;
> > +}
> > +
>
> This needs to be 'static inline', I queued a patch on-top of my fixes
> branch.


Ugh, copy-n-paste declaration.
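
For reference, with that follow-up fix the !CONFIG_IOMMU_IOVA stub ends up
as a proper static inline (sketch of the end result):

static inline bool has_iova_flush_queue(struct iova_domain *iovad)
{
	return false;
}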

Thanks much,
Dmitry

[PATCH 2/2] iommu/vt-d: Check if domain->pgd was allocated

2019-07-16 Thread Dmitry Safonov
There are a couple of places where domain_exit() is called on domain_init()
failure, while currently domain_init() can fail only if
alloc_pgtable_page() has failed.

Make domain_exit() check whether domain->pgd is present before calling
domain_unmap(), as it could theoretically crash on clearing pte entries
in dma_pte_clear_level().

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6d1510284d21..698cc40355ef 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1835,7 +1835,6 @@ static inline int guestwidth_to_adjustwidth(int gaw)
 
 static void domain_exit(struct dmar_domain *domain)
 {
-   struct page *freelist;
 
/* Remove associated devices and clear attached or cached domains */
domain_remove_dev_info(domain);
@@ -1843,9 +1842,12 @@ static void domain_exit(struct dmar_domain *domain)
/* destroy iovas */
put_iova_domain(&domain->iovad);
 
-   freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+   if (domain->pgd) {
+   struct page *freelist;
 
-   dma_free_pagelist(freelist);
+   freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+   dma_free_pagelist(freelist);
+   }
 
free_domain_mem(domain);
 }
-- 
2.22.0



[PATCH 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue

2019-07-16 Thread Dmitry Safonov
Intel VT-d driver was reworked to use common deferred flushing
implementation. Previously there was one global per-cpu flush queue,
afterwards - one per domain.

Before deferring a flush, the queue should be allocated and initialized.

Currently only domains with the IOMMU_DOMAIN_DMA type initialize their flush
queue. It's probably worth initializing it for static or unmanaged domains
too, but that may be arguable - I'm leaving it to the iommu folks.

Prevent queuing an iova flush if the domain doesn't have a queue.
The defensive check seems worth keeping even if the queue were initialized
for all kinds of domains, and it is easily backportable.

On 4.19.43 stable kernel it has a user-visible effect: previously for
devices in si domain there were crashes, on sata devices:

 BUG: spinlock bad magic on CPU#6, swapper/0/1
  lock: 0x88844f582008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x45/0x115
  intel_unmap+0x107/0x113
  intel_unmap_sg+0x6b/0x76
  __ata_qc_complete+0x7f/0x103
  ata_qc_complete+0x9b/0x26a
  ata_qc_complete_multiple+0xd0/0xe3
  ahci_handle_port_interrupt+0x3ee/0x48a
  ahci_handle_port_intr+0x73/0xa9
  ahci_single_level_irq_intr+0x40/0x60
  __handle_irq_event_percpu+0x7f/0x19a
  handle_irq_event_percpu+0x32/0x72
  handle_irq_event+0x38/0x56
  handle_edge_irq+0x102/0x121
  handle_irq+0x147/0x15c
  do_IRQ+0x66/0xf2
  common_interrupt+0xf/0xf
 RIP: 0010:__do_softirq+0x8c/0x2df

The same for usb devices that use ehci-pci:
 BUG: spinlock bad magic on CPU#0, swapper/0/1
  lock: 0x88844f402008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x77/0x145
  intel_unmap+0x107/0x113
  intel_unmap_page+0xe/0x10
  usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
  usb_hcd_unmap_urb_for_dma+0x17/0x100
  unmap_urb_for_dma+0x22/0x24
  __usb_hcd_giveback_urb+0x51/0xc3
  usb_giveback_urb_bh+0x97/0xde
  tasklet_action_common.isra.4+0x5f/0xa1
  tasklet_action+0x2d/0x30
  __do_softirq+0x138/0x2df
  irq_exit+0x7d/0x8b
  smp_apic_timer_interrupt+0x10f/0x151
  apic_timer_interrupt+0xf/0x20
  
 RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc:  # 4.14+
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c |  3 ++-
 drivers/iommu/iova.c| 18 ++
 include/linux/iova.h|  6 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ac4172c02244..6d1510284d21 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3564,7 +3564,8 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 
freelist = domain_unmap(domain, start_pfn, last_pfn);
 
-   if (intel_iommu_strict || (pdev && pdev->untrusted)) {
+   if (intel_iommu_strict || (pdev && pdev->untrusted) ||
+   !has_iova_flush_queue(&domain->iovad)) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
  nrpages, !freelist, 0);
/* free iova */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index d499b2621239..8413ae54904a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -54,9 +54,14 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
+bool has_iova_flush_queue(struct iova_domain *iovad)
+{
+   return !!iovad->fq;
+}
+
 static void free_iova_flush_queue(struct iova_domain *iovad)
 {
-   if (!iovad->fq)
+   if (!has_iova_flush_queue(iovad))
return;
 
if (timer_pending(&iovad->fq_timer))
@@ -74,13 +79,14 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
 int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor)
 {
+   struct iova_fq __percpu *queue;
int cpu;
 
atomic64_set(&iovad->fq_flush_start_cnt,  0);
atomic64_set(&iovad->fq_flush_finish_cnt, 0);
 
-   iovad->fq = alloc_percpu(struct iova_fq);
-   if (!iovad->fq)
+   queue = alloc_percpu(struct iova_fq);
+   if (!queue)
return -ENOMEM;
 
iovad->flush_cb   = flush_cb;
@@ -89,13 +95,17 @@ int init_iova_flush_queue(struct iova_domain *iovad,
for_each_possible_cpu(cpu) {
struct iova_fq *fq;
 
-   fq = per_cpu_ptr(iovad->fq, cpu);
+   fq = per_cpu_ptr(queue, cp

Re: [PATCH 3/4] iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated

2019-03-07 Thread Dmitry Safonov via iommu
On 3/4/19 3:46 PM, James Sewart wrote:
> +static inline int domain_is_initialised(struct dmar_domain *domain)
> +{
> + return domain->flags & DOMAIN_FLAG_INITIALISED;
> +}

Maybe check it in intel_iommu_domain_free(), eh?
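
Something along these lines is what I mean (a rough, untested sketch; how an
uninitialised domain should actually be torn down is up to you):

static void intel_iommu_domain_free(struct iommu_domain *domain)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);

	/* don't run the full teardown for a domain that never
	 * finished domain_init() */
	if (WARN_ON(!domain_is_initialised(dmar_domain)))
		return;

	domain_exit(dmar_domain);
}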

Thanks,
  Dmitry


Re: [RFC 0/3] iommu/iova: Unsafe locking in find_iova()

2018-07-09 Thread Dmitry Safonov via iommu
On Fri, 2018-07-06 at 17:13 +0200, Joerg Roedel wrote:
> On Fri, Jul 06, 2018 at 03:10:47PM +0100, Dmitry Safonov wrote:
> > Yes, as far as I can see, there are code-paths which may try to
> > handle
> > it at the same time:
> > o memory notifiers for hot-unplug (intel-iommu.c)
> > o drivers unloading calls free_iova(), which in the result calls
> >   find_iova()
> > o I see at least one driver that frees iova during it's normal work
> >   too: scif_rma.c:scif_free_window_offset()
> 
> Yeah, but the IOVAs freed in the memory notifiers are just the ones
> for
> the direct-mapped RMRR regions requested by firmware, not the IOVAs
> allocated by any driver, so I think this shouldn't be a problem.

Ok, then drop it, please.
I thought that a safer API + a nice diffstat could justify the change, but
whatever %)

-- 
Thanks,
 Dmitry


Re: [RFC 0/3] iommu/iova: Unsafe locking in find_iova()

2018-07-06 Thread Dmitry Safonov via iommu
On Fri, 2018-07-06 at 15:16 +0200, Joerg Roedel wrote:
> On Thu, Jun 21, 2018 at 07:08:20PM +0100, Dmitry Safonov wrote:
> > find_iova() looks to be using a bad locking practice: it locks the
> > returned iova only for the search time.  And looking in code, the
> > element can be removed from the tree and freed under rbtree lock.
> > That
> > happens during memory hot-unplug and cleanup on module
> > removal.  Here
> > I cleanup users of the function and delete it.
> 
> But this is only a problem if more than one code-path uses tries to
> handle a given iova at the same time, no?

Yes, as far as I can see, there are code-paths which may try to handle
it at the same time:
o memory notifiers for hot-unplug (intel-iommu.c)
o driver unloading calls free_iova(), which in turn calls
  find_iova()
o I see at least one driver that frees iovas during its normal work
  too: scif_rma.c:scif_free_window_offset()

So, I decided to fix the interface while it's not widely used, instead
of fixing all callers.
It looks worthwhile to me even though these are all corner cases like
unplugging memory.

Anyway, I only found this because a colleague wrote a debug sysfs
interface for iovas and used find_iova().
So, if you think it's not worth changing - that's fine by me, but I
thought I'd nip this in the bud and prevent other people from misusing it.
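
To illustrate the window: the unsafe pattern is essentially what free_iova()
does today, and patch 2/3 turns it into the lock-held variant below
(private_find_iova()/private_free_iova() are the existing internal helpers
in iova.c):

/* Today: the returned iova is unprotected once the rbtree lock is dropped,
 * so a concurrent hot-unplug / module-unload free can invalidate it. */
static void unsafe_free(struct iova_domain *iovad, unsigned long pfn)
{
	struct iova *iova = find_iova(iovad, pfn); /* lock taken and dropped inside */

	if (iova)
		__free_iova(iovad, iova);	/* 'iova' may already be freed here */
}

/* After patch 2/3: search and free under a single lock hold. */
static void atomic_free(struct iova_domain *iovad, unsigned long pfn)
{
	unsigned long flags;
	struct iova *iova;

	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
	iova = private_find_iova(iovad, pfn);
	if (iova)
		private_free_iova(iovad, iova);
	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
}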

-- 
Thanks,
 Dmitry


Re: [RFC 0/3] iommu/iova: Unsafe locking in find_iova()

2018-07-04 Thread Dmitry Safonov
Hi,

Any opinion on that?
It removes a source of possible issues and
has a nice diffstat.

2018-06-21 19:08 GMT+01:00 Dmitry Safonov :
> find_iova() looks to be using a bad locking practice: it locks the
> returned iova only for the search time.
> And looking in code, the element can be removed from the tree and freed
> under rbtree lock. That happens during memory hot-unplug and cleanup on
> module removal.
> Here I cleanup users of the function and delete it.
>
> Dmitry Safonov (3):
>   iommu/iova: Find and split iova under rbtree's lock
>   iommu/iova: Make free_iova() atomic
>   iommu/iova: Remove find_iova()
>
>  drivers/iommu/intel-iommu.c | 14 +++--
>  drivers/iommu/iova.c| 48 
> +
>  include/linux/iova.h| 17 
>  3 files changed, 25 insertions(+), 54 deletions(-)
>

Thanks,
 Dmitry


[RFC 3/3] iommu/iova: Remove find_iova()

2018-06-21 Thread Dmitry Safonov via iommu
This function is potentially dangerous: nothing protects returned iova.
As there is no user in tree anymore, delete it.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/iova.c | 20 
 include/linux/iova.h |  7 ---
 2 files changed, 27 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4c63d92afaf7..4a568e28a633 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -336,26 +336,6 @@ static void private_free_iova(struct iova_domain *iovad, struct iova *iova)
 }
 
 /**
- * find_iova - finds an iova for a given pfn
- * @iovad: - iova domain in question.
- * @pfn: - page frame number
- * This function finds and returns an iova belonging to the
- * given doamin which matches the given pfn.
- */
-struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
-{
-   unsigned long flags;
-   struct iova *iova;
-
-   /* Take the lock so that no other thread is manipulating the rbtree */
-   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-   iova = private_find_iova(iovad, pfn);
-   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-   return iova;
-}
-EXPORT_SYMBOL_GPL(find_iova);
-
-/**
  * __free_iova - frees the given iova
  * @iovad: iova domain in question.
  * @iova: iova in question.
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 803472b77919..006911306a84 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -158,7 +158,6 @@ void init_iova_domain(struct iova_domain *iovad, unsigned 
long granule,
unsigned long start_pfn);
 int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor);
-struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
 void put_iova_domain(struct iova_domain *iovad);
 struct iova *iova_split_and_pop(struct iova_domain *iovad,
unsigned long pfn_lo, unsigned long pfn_hi);
@@ -243,12 +242,6 @@ static inline int init_iova_flush_queue(struct iova_domain 
*iovad,
return -ENODEV;
 }
 
-static inline struct iova *find_iova(struct iova_domain *iovad,
-unsigned long pfn)
-{
-   return NULL;
-}
-
 static inline void put_iova_domain(struct iova_domain *iovad)
 {
 }
-- 
2.13.6



[RFC 2/3] iommu/iova: Make free_iova() atomic

2018-06-21 Thread Dmitry Safonov via iommu
find_iova() grabs the rbtree's spinlock only for the duration of the search.
Nothing guarantees that the returned iova still exists by the time
__free_iova() runs.
Prevent a potential use-after-free and double-free by holding the spinlock
for the whole time the iova is being searched and freed.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/iova.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4b38eb507670..4c63d92afaf7 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -382,11 +382,14 @@ EXPORT_SYMBOL_GPL(__free_iova);
 void
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
-   struct iova *iova = find_iova(iovad, pfn);
+   unsigned long flags;
+   struct iova *iova;
 
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   iova = private_find_iova(iovad, pfn);
if (iova)
-   __free_iova(iovad, iova);
-
+   private_free_iova(iovad, iova);
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
-- 
2.13.6



[RFC 1/3] iommu/iova: Find and split iova under rbtree's lock

2018-06-21 Thread Dmitry Safonov via iommu
find_iova() holds iova_rbtree_lock only while traversing the rbtree.
After the lock is released, the returned iova may be freed (e.g., during
a module's release).
Hold the spinlock during the search and the removal of the iova from the
rbtree, eliminating a possible use-after-free and/or double-free of the iova.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c | 14 +++---
 drivers/iommu/iova.c| 19 ---
 include/linux/iova.h| 10 --
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 14e4b3722428..494394ef0f92 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4530,19 +4530,11 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
struct intel_iommu *iommu;
struct page *freelist;
 
-   iova = find_iova(&si_domain->iovad, start_vpfn);
+   iova = iova_split_and_pop(&si_domain->iovad,
+   start_vpfn, last_vpfn);
if (iova == NULL) {
-   pr_debug("Failed get IOVA for PFN %lx\n",
-start_vpfn);
-   break;
-   }
-
-   iova = split_and_remove_iova(&si_domain->iovad, iova,
-start_vpfn, last_vpfn);
-   if (iova == NULL) {
-   pr_warn("Failed to split IOVA PFN [%lx-%lx]\n",
+   pr_warn("Failed to split & pop IOVA PFN 
[%lx-%lx]\n",
start_vpfn, last_vpfn);
-   return NOTIFY_BAD;
+   break;
}
 
freelist = domain_unmap(si_domain, iova->pfn_lo,
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 83fe2621effe..4b38eb507670 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -715,23 +715,27 @@ copy_reserved_iova(struct iova_domain *from, struct iova_domain *to)
 }
 EXPORT_SYMBOL_GPL(copy_reserved_iova);
 
-struct iova *
-split_and_remove_iova(struct iova_domain *iovad, struct iova *iova,
+struct iova *iova_split_and_pop(struct iova_domain *iovad,
  unsigned long pfn_lo, unsigned long pfn_hi)
 {
-   unsigned long flags;
struct iova *prev = NULL, *next = NULL;
+   unsigned long flags;
+   struct iova *iova;
 
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   iova = private_find_iova(iovad, pfn_lo);
+   if (iova == NULL)
+   goto err_unlock;
+
if (iova->pfn_lo < pfn_lo) {
prev = alloc_and_init_iova(iova->pfn_lo, pfn_lo - 1);
if (prev == NULL)
-   goto error;
+   goto err_unlock;
}
if (iova->pfn_hi > pfn_hi) {
next = alloc_and_init_iova(pfn_hi + 1, iova->pfn_hi);
if (next == NULL)
-   goto error;
+   goto err_free;
}
 
__cached_rbnode_delete_update(iovad, iova);
@@ -749,10 +753,11 @@ split_and_remove_iova(struct iova_domain *iovad, struct iova *iova,
 
return iova;
 
-error:
-   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+err_free:
if (prev)
free_iova_mem(prev);
+err_unlock:
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
return NULL;
 }
 
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 928442dda565..803472b77919 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -160,8 +160,8 @@ int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor);
 struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
 void put_iova_domain(struct iova_domain *iovad);
-struct iova *split_and_remove_iova(struct iova_domain *iovad,
-   struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
+struct iova *iova_split_and_pop(struct iova_domain *iovad,
+   unsigned long pfn_lo, unsigned long pfn_hi);
 void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 #else
 static inline int iova_cache_get(void)
@@ -253,10 +253,8 @@ static inline void put_iova_domain(struct iova_domain *iovad)
 {
 }
 
-static inline struct iova *split_and_remove_iova(struct iova_domain *iovad,
-struct iova *iova,
-unsigned long pfn_lo,
-unsigned long pfn_hi)
+static inline struct iova *iova_split_and_pop(struct

[RFC 0/3] iommu/iova: Unsafe locking in find_iova()

2018-06-21 Thread Dmitry Safonov via iommu
find_iova() looks to be using a bad locking practice: it protects the
returned iova only for the duration of the search.
And looking at the code, the element can be removed from the tree and freed
under the rbtree lock. That happens during memory hot-unplug and cleanup on
module removal.
Here I clean up the users of the function and delete it.

Dmitry Safonov (3):
  iommu/iova: Find and split iova under rbtree's lock
  iommu/iova: Make free_iova() atomic
  iommu/iova: Remove find_iova()

 drivers/iommu/intel-iommu.c | 14 +++--
 drivers/iommu/iova.c| 48 +
 include/linux/iova.h| 17 
 3 files changed, 25 insertions(+), 54 deletions(-)

-- 
2.13.6



Re: [PATCHv4 1/2] iommu/vt-d: Ratelimit each dmar fault printing

2018-05-03 Thread Dmitry Safonov via iommu
On Thu, 2018-05-03 at 14:40 +0200, Joerg Roedel wrote:
> On Wed, May 02, 2018 at 03:22:24AM +0100, Dmitry Safonov wrote:
> > Hi Joerg,
> > 
> > is there anything I may do about those two patches?
> > In 2/2 I've limited loop cnt as discussed in v3.
> > This one solves softlockup for us, might be useful.
> 
> Applied the first patch, thanks. Please re-work the second one
> according
> to the comments.

Will do.

-- 
Thanks,
 Dmitry


Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler

2018-05-02 Thread Dmitry Safonov via iommu
On Thu, 2018-05-03 at 10:16 +0800, Lu Baolu wrote:
> Hi,
> 
> On 05/03/2018 09:59 AM, Dmitry Safonov wrote:
> > On Thu, 2018-05-03 at 09:32 +0800, Lu Baolu wrote:
> > > Hi,
> > > 
> > > On 05/03/2018 08:52 AM, Dmitry Safonov wrote:
> > > > AFAICS, we're doing fault-clearing in a loop inside irq
> > > > handler.
> > > > That means that while we're clearing if a fault raises, it'll
> > > > make
> > > > an irq level triggered (or on edge) on lapic. So, whenever we
> > > > return
> > > > from the irq handler, irq will raise again.
> > > > 
> > > 
> > > Uhm, double checked with the spec. Interrupts should be generated
> > > since we always clear the fault overflow bit.
> > > 
> > > Anyway, we can't clear faults in a limited loop, as the spec says
> > > in
> > > 7.3.1:
> > 
> > Mind to elaborate?
> > ITOW, I do not see a contradiction. We're still clearing faults in
> > FIFO
> > fashion. There is no limitation to do some spare work in between
> > clearings (return from interrupt, then fault again and continue).
> 
> Hardware maintains an internal index to reference the fault recording
> register in which the next fault can be recorded. When a fault comes,
> hardware will check the Fault bit (bit 31 of the 4th 32-bit register
> recording
> register) referenced by the internal index. If this bit is set,
> hardware will
> not record the fault.
> 
> Since we now don't clear the F bit until a register entry which has
> the F bit
> cleared, we might exit the fault handling with some register entries
> still
> have the F bit set.
> 
>     F
>   | 0 | x |
>   | 0 | x |
>   | 0 | x |  <--- Fault record index in fault status register
>   | 0 | x |
>   | 1 | x |  <--- hardware maintained index
>   | 1 | x |
>   | 1 | x |
>   | 0 | x |
>   | 0 | x |
>   | 0 | x |
>   | 0 | x |
> 
> Take an example as above, hardware could only record 2 more faults
> with
> others all dropped.

Ugh, yeah, I got what you're saying.. Thanks for the explanation.
So, we shouldn't mark faults as cleared until we've actually processed
them here:
:writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO,
:   iommu->reg + DMAR_FSTS_REG);

As Joerg mentioned, we do care about latency here, so this fault work
can't be moved entirely into a workqueue.. but we might limit the loop
and, once we've hit the limit, proceed servicing the remaining faults in
a wq, since at that point we should care more about spending too long in
an irq-disabled section than about latencies.
Does that make any sense? What do you think?

I can possibly re-write 2/2 with the idea above..
And it would be a bit of joy to have 1/1 applied, as it's an independent fix
for an issue that happens for real on our devices, heh.
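
Roughly what I have in mind for the re-worked 2/2 (an untested sketch with
made-up helpers - fault_pending(), clear_one_fault() and the fault_work item
stand in for the real register accesses and deferred-work plumbing):

#define MAX_FAULTS_PER_IRQ	16	/* arbitrary bound, for the sketch only */

static irqreturn_t fault_irq_sketch(int irq, void *dev_id)
{
	struct intel_iommu *iommu = dev_id;
	int handled = 0;

	while (fault_pending(iommu)) {
		if (++handled > MAX_FAULTS_PER_IRQ) {
			/* spent long enough with irqs off: defer the rest */
			schedule_work(&iommu->fault_work);
			break;
		}
		clear_one_fault(iommu);
	}

	return IRQ_HANDLED;
}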

-- 
Thanks,
 Dmitry


Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler

2018-05-02 Thread Dmitry Safonov via iommu
On Thu, 2018-05-03 at 09:32 +0800, Lu Baolu wrote:
> Hi,
> 
> On 05/03/2018 08:52 AM, Dmitry Safonov wrote:
> > AFAICS, we're doing fault-clearing in a loop inside irq handler.
> > That means that while we're clearing if a fault raises, it'll make
> > an irq level triggered (or on edge) on lapic. So, whenever we
> > return
> > from the irq handler, irq will raise again.
> > 
> 
> Uhm, double checked with the spec. Interrupts should be generated
> since we always clear the fault overflow bit.
> 
> Anyway, we can't clear faults in a limited loop, as the spec says in
> 7.3.1:

Mind elaborating?
ITOW, I do not see a contradiction. We're still clearing faults in FIFO
fashion. There is no limitation on doing some other work in between
clearings (return from the interrupt, then fault again and continue).

> Software is expected to process the non-recoverable faults reported
> through the Fault Recording
> Registers in a circular FIFO fashion starting from the Fault
> Recording Register referenced by the Fault
> Recording Index (FRI) field, until it finds a Fault Recording
> Register with no faults (F field Clear).
> 
> Best regards,
> Lu Baolu

-- 
Thanks,
 Dmitry


Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler

2018-05-02 Thread Dmitry Safonov via iommu
On Thu, 2018-05-03 at 07:49 +0800, Lu Baolu wrote:
> Hi,
> 
> On 05/02/2018 08:38 PM, Dmitry Safonov wrote:
> > Hi Lu,
> > 
> > On Wed, 2018-05-02 at 14:34 +0800, Lu Baolu wrote:
> > > Hi,
> > > 
> > > On 03/31/2018 08:33 AM, Dmitry Safonov wrote:
> > > > Theoretically, on some machines faults might be generated
> > > > faster
> > > > than
> > > > they're cleared by CPU.
> > > 
> > > Is this a real case?
> > 
> > No. 1/2 is a real case and this one was discussed on v3:
> > lkml.kernel.org/r/<20180215191729.15777-1-d...@arista.com>
> > 
> > It's not possible on my hw as far as I tried, but the discussion
> > result
> > was to fix this theoretical issue too.
> 
> If faults are generated faster than CPU can clear them, the PCIe
> device should be in a very very bad state. How about disabling
> the PCIe device and ask the administrator to replace it? Anyway,
> I don't think that's goal of this patch series. :-)

Uhm, yeah, my point is not about the number of faults, but about the
physical ability of the iommu to generate faults faster than the cpu
processes them. I might be wrong and it may not be possible at all
(low cpu freq, perhaps?)

But the number of interrupts might be high. Say you have many mappings
on the iommu and the PCIe device goes off; it could be just a link
flap. I think it makes sense not to lock up on such occasions.

> > > >  Let's limit the cleaning-loop by number of hw
> > > > fault registers.
> > > 
> > > Will this cause the fault recording registers full of faults,
> > > hence
> > > new faults will be dropped without logging?
> > 
> > If faults come faster then they're being cleared - some of them
> > will be
> > dropped without logging. Not sure if it's worth to report all
> > faults in
> > such theoretical(!) situation.
> > If amount of reported faults for such situation is not enough and
> > it's
> > worth to keep all the faults, then probably we should introduce a
> > workqueue here (which I did in v1, but it was rejected by the
> > reason
> > that it will introduce some latency in fault reporting).
> > 
> > > And even worse, new faults will not generate interrupts?
> > 
> > They will, we clear page fault overflow outside of the loop, so any
> > new
> > fault will raise interrupt, iiuc.
> > 
> 
> I am afraid that they might not generate interrupts any more.
> 
> Say, the fault registers are full of events that are not cleared,
> then a new fault comes. There is no room for this event and
> hence the hardware might drop it silently.

AFAICS, we're doing fault-clearing in a loop inside the irq handler.
That means that if a fault is raised while we're clearing, it'll assert
the irq (level- or edge-triggered) on the lapic. So, whenever we return
from the irq handler, the irq will be raised again.

-- 
Thanks,
 Dmitry


Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler

2018-05-02 Thread Dmitry Safonov via iommu
Hi Lu,

On Wed, 2018-05-02 at 14:34 +0800, Lu Baolu wrote:
> Hi,
> 
> On 03/31/2018 08:33 AM, Dmitry Safonov wrote:
> > Theoretically, on some machines faults might be generated faster
> > than
> > they're cleared by CPU.
> 
> Is this a real case?

No. 1/2 is a real case and this one was discussed on v3:
lkml.kernel.org/r/<20180215191729.15777-1-d...@arista.com>

It's not possible on my hw as far as I tried, but the discussion result
was to fix this theoretical issue too.

> 
> >  Let's limit the cleaning-loop by number of hw
> > fault registers.
> 
> Will this cause the fault recording registers full of faults, hence
> new faults will be dropped without logging?

If faults come faster than they're being cleared - some of them will be
dropped without logging. I'm not sure it's worth reporting all faults in
such a theoretical(!) situation.
If the number of reported faults in such a situation is not enough and it's
worth keeping all the faults, then we should probably introduce a
workqueue here (which I did in v1, but it was rejected on the grounds
that it would introduce some latency in fault reporting).

> And even worse, new faults will not generate interrupts?

They will; we clear the page fault overflow outside of the loop, so any new
fault will raise an interrupt, iiuc.

-- 
Thanks,
 Dmitry


Re: [PATCHv4 1/2] iommu/vt-d: Ratelimit each dmar fault printing

2018-05-01 Thread Dmitry Safonov via iommu
Hi Joerg,

is there anything I may do about those two patches?
In 2/2 I've limited loop cnt as discussed in v3.
This one solves softlockup for us, might be useful.

On Sat, 2018-03-31 at 01:33 +0100, Dmitry Safonov wrote:
> There is a ratelimit for printing, but it's incremented each time the
> cpu recives dmar fault interrupt. While one interrupt may signal
> about
> *many* faults.
> So, measuring the impact it turns out that reading/clearing one fault
> takes < 1 usec, and printing info about the fault takes ~170 msec.
> 
> Having in mind that maximum number of fault recording registers per
> remapping hardware unit is 256.. IRQ handler may run for (170*256)
> msec.
> And as fault-serving loop runs without a time limit, during servicing
> new faults may occur..
> 
> Ratelimit each fault printing rather than each irq printing.
> 
> Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")
> 
> BUG: spinlock lockup suspected on CPU#0, CliShell/9903
>  lock: 0x81a47440, .magic: dead4ead, .owner:
> kworker/u16:2/8915, .owner_cpu: 6
> CPU: 0 PID: 9903 Comm: CliShell
> Call Trace:$\n'
> [..] dump_stack+0x65/0x83$\n'
> [..] spin_dump+0x8f/0x94$\n'
> [..] do_raw_spin_lock+0x123/0x170$\n'
> [..] _raw_spin_lock_irqsave+0x32/0x3a$\n'
> [..] uart_chars_in_buffer+0x20/0x4d$\n'
> [..] tty_chars_in_buffer+0x18/0x1d$\n'
> [..] n_tty_poll+0x1cb/0x1f2$\n'
> [..] tty_poll+0x5e/0x76$\n'
> [..] do_select+0x363/0x629$\n'
> [..] compat_core_sys_select+0x19e/0x239$\n'
> [..] compat_SyS_select+0x98/0xc0$\n'
> [..] sysenter_dispatch+0x7/0x25$\n'
> [..]
> NMI backtrace for cpu 6
> CPU: 6 PID: 8915 Comm: kworker/u16:2
> Workqueue: dmar_fault dmar_fault_work
> Call Trace:$\n'
> [..] wait_for_xmitr+0x26/0x8f$\n'
> [..] serial8250_console_putchar+0x1c/0x2c$\n'
> [..] uart_console_write+0x40/0x4b$\n'
> [..] serial8250_console_write+0xe6/0x13f$\n'
> [..] call_console_drivers.constprop.13+0xce/0x103$\n'
> [..] console_unlock+0x1f8/0x39b$\n'
> [..] vprintk_emit+0x39e/0x3e6$\n'
> [..] printk+0x4d/0x4f$\n'
> [..] dmar_fault+0x1a8/0x1fc$\n'
> [..] dmar_fault_work+0x15/0x17$\n'
> [..] process_one_work+0x1e8/0x3a9$\n'
> [..] worker_thread+0x25d/0x345$\n'
> [..] kthread+0xea/0xf2$\n'
> [..] ret_from_fork+0x58/0x90$\n'
> 
> Cc: Alex Williamson <alex.william...@redhat.com>
> Cc: David Woodhouse <dw...@infradead.org>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Joerg Roedel <j...@8bytes.org>
> Cc: Lu Baolu <baolu...@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Signed-off-by: Dmitry Safonov <d...@arista.com>
> ---
>  drivers/iommu/dmar.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index accf58388bdb..6c4ea32ee6a9 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>   int reg, fault_index;
>   u32 fault_status;
>   unsigned long flag;
> - bool ratelimited;
>   static DEFINE_RATELIMIT_STATE(rs,
> DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
>  
> - /* Disable printing, simply clear the fault when ratelimited */
> - ratelimited = !__ratelimit(&rs);
> -
>   raw_spin_lock_irqsave(&iommu->register_lock, flag);
>   fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> - if (fault_status && !ratelimited)
> + if (fault_status && __ratelimit(&rs))
>   pr_err("DRHD: handling fault status reg %x\n", fault_status);
>  
>   /* TBD: ignore advanced fault log currently */
> @@ -1638,6 +1634,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>   fault_index = dma_fsts_fault_record_index(fault_status);
>   reg = cap_fault_reg_offset(iommu->cap);
>   while (1) {
> + /* Disable printing, simply clear the fault when ratelimited */
> + bool ratelimited = !__ratelimit(&rs);
>   u8 fault_reason;
>   u16 source_id;
>   u64 guest_addr;


[PATCHv4 1/2] iommu/vt-d: Ratelimit each dmar fault printing

2018-03-30 Thread Dmitry Safonov via iommu
There is a ratelimit for printing, but it's incremented each time the
cpu recives dmar fault interrupt. While one interrupt may signal about
*many* faults.
So, measuring the impact it turns out that reading/clearing one fault
takes < 1 usec, and printing info about the fault takes ~170 msec.

Having in mind that maximum number of fault recording registers per
remapping hardware unit is 256.. IRQ handler may run for (170*256) msec.
And as fault-serving loop runs without a time limit, during servicing
new faults may occur..

Ratelimit each fault printing rather than each irq printing.

Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")

BUG: spinlock lockup suspected on CPU#0, CliShell/9903
 lock: 0x81a47440, .magic: dead4ead, .owner: kworker/u16:2/8915, 
.owner_cpu: 6
CPU: 0 PID: 9903 Comm: CliShell
Call Trace:$\n'
[..] dump_stack+0x65/0x83$\n'
[..] spin_dump+0x8f/0x94$\n'
[..] do_raw_spin_lock+0x123/0x170$\n'
[..] _raw_spin_lock_irqsave+0x32/0x3a$\n'
[..] uart_chars_in_buffer+0x20/0x4d$\n'
[..] tty_chars_in_buffer+0x18/0x1d$\n'
[..] n_tty_poll+0x1cb/0x1f2$\n'
[..] tty_poll+0x5e/0x76$\n'
[..] do_select+0x363/0x629$\n'
[..] compat_core_sys_select+0x19e/0x239$\n'
[..] compat_SyS_select+0x98/0xc0$\n'
[..] sysenter_dispatch+0x7/0x25$\n'
[..]
NMI backtrace for cpu 6
CPU: 6 PID: 8915 Comm: kworker/u16:2
Workqueue: dmar_fault dmar_fault_work
Call Trace:$\n'
[..] wait_for_xmitr+0x26/0x8f$\n'
[..] serial8250_console_putchar+0x1c/0x2c$\n'
[..] uart_console_write+0x40/0x4b$\n'
[..] serial8250_console_write+0xe6/0x13f$\n'
[..] call_console_drivers.constprop.13+0xce/0x103$\n'
[..] console_unlock+0x1f8/0x39b$\n'
[..] vprintk_emit+0x39e/0x3e6$\n'
[..] printk+0x4d/0x4f$\n'
[..] dmar_fault+0x1a8/0x1fc$\n'
[..] dmar_fault_work+0x15/0x17$\n'
[..] process_one_work+0x1e8/0x3a9$\n'
[..] worker_thread+0x25d/0x345$\n'
[..] kthread+0xea/0xf2$\n'
[..] ret_from_fork+0x58/0x90$\n'

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: David Woodhouse <dw...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: Lu Baolu <baolu...@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index accf58388bdb..6c4ea32ee6a9 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
int reg, fault_index;
u32 fault_status;
unsigned long flag;
-   bool ratelimited;
static DEFINE_RATELIMIT_STATE(rs,
  DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /* Disable printing, simply clear the fault when ratelimited */
-   ratelimited = !__ratelimit(&rs);
-
raw_spin_lock_irqsave(&iommu->register_lock, flag);
fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-   if (fault_status && !ratelimited)
+   if (fault_status && __ratelimit(&rs))
pr_err("DRHD: handling fault status reg %x\n", fault_status);
 
/* TBD: ignore advanced fault log currently */
@@ -1638,6 +1634,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
fault_index = dma_fsts_fault_record_index(fault_status);
reg = cap_fault_reg_offset(iommu->cap);
while (1) {
+   /* Disable printing, simply clear the fault when ratelimited */
+   bool ratelimited = !__ratelimit(&rs);
u8 fault_reason;
u16 source_id;
u64 guest_addr;
-- 
2.13.6



[PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler

2018-03-30 Thread Dmitry Safonov via iommu
Theoretically, on some machines faults might be generated faster than
they're cleared by the CPU. Let's limit the cleaning loop by the number
of hw fault registers.

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: David Woodhouse <dw...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: Lu Baolu <baolu...@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6c4ea32ee6a9..cf1105111209 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1615,7 +1615,7 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, 
int type,
 irqreturn_t dmar_fault(int irq, void *dev_id)
 {
struct intel_iommu *iommu = dev_id;
-   int reg, fault_index;
+   int reg, fault_index, i;
u32 fault_status;
unsigned long flag;
static DEFINE_RATELIMIT_STATE(rs,
@@ -1633,7 +1633,7 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 
fault_index = dma_fsts_fault_record_index(fault_status);
reg = cap_fault_reg_offset(iommu->cap);
-   while (1) {
+   for (i = 0; i < cap_num_fault_regs(iommu->cap); i++) {
/* Disable printing, simply clear the fault when ratelimited */
bool ratelimited = !__ratelimit(&rs);
u8 fault_reason;
-- 
2.13.6



Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-29 Thread Dmitry Safonov
2018-03-29 9:50 GMT+01:00 Joerg Roedel <j...@8bytes.org>:
> On Tue, Mar 20, 2018 at 08:50:13PM +0000, Dmitry Safonov wrote:
>> Hmm, but this fixes my softlockup issue, because it's about time spent
>> in printk() inside irq-disabled section, rather about exiting the dmar-
>> clearing loop.
>> And on my hw doesn't make any difference to limit loop or not because
>> clearing a fault is much faster than hw could generate a new fault.
>> ITOW, it fixes the softlockup for me and the loop-related lockup can't
>> happen on hw I have (so it's the other issue, [possible?] on other hw).
>
> It might solve your issue, but someone else might still run into it with
> a different setup. An upstream fix needs to solve it for everyone.

Ok, I'll resend v4 with an additional patch to limit the loop.

Thanks,
 Dmitry


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-20 Thread Dmitry Safonov via iommu
On Thu, 2018-03-15 at 16:28 +0100, Joerg Roedel wrote:
> On Thu, Mar 15, 2018 at 02:42:00PM +0000, Dmitry Safonov wrote:
> > But even with loop-limit we will need ratelimit each printk()
> > *also*.
> > Otherwise loop-limit will be based on time spent printing, not on
> > anything else..
> > The patch makes sense even with loop-limit in my opinion.
> 
> Looks like I mis-read your patch, somehow it looked to me as if you
> replace all 'ratelimited' usages with a call to __ratelimit(), but
> you
> just move 'ratelimited' into the loop, which actually makes sense.

So, is it worth applying the patch?

> But still, this alone is no proper fix for the soft-lockups you are
> seeing.

Hmm, but this fixes my softlockup issue, because it's about the time
spent in printk() inside the irq-disabled section, rather than about
exiting the dmar-clearing loop.
And on my hw it doesn't make any difference whether the loop is limited
or not, because clearing a fault is much faster than the hw can
generate a new one.
IOW, it fixes the softlockup for me, and the loop-related lockup can't
happen on the hw I have (so it's a different issue, [possibly?] on
other hw).
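
(Back-of-the-envelope with the numbers from the patch description: 256
fault records * ~170 msec per printed fault is ~43.5 seconds of
printing with interrupts disabled, while 256 * <1 usec for just
reading/clearing the records stays well under a millisecond.)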

-- 
Thanks,
 Dmitry


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Dmitry Safonov via iommu
On Thu, 2018-03-15 at 14:34 +, Dmitry Safonov wrote:
> On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote:
> > On Thu, Mar 15, 2018 at 02:13:03PM +0000, Dmitry Safonov wrote:
> > > So, you suggest to remove ratelimit at all?
> > > Do we really need printk flood for each happened fault?
> > > Imagine, you've hundreds of mappings and then PCI link flapped..
> > > Wouldn't it be better to keep ratelimiting?
> > > I don't mind, just it looks a bit strange to me.
> > 
> > I never said you should remove the ratelimiting, after all you are
> > trying to fix a soft-lockup, no?
> > 
> > And that should not be fixed by changes to the ratelimiting, but
> > with
> > proper irq handling.
> 
> Uh, I'm a bit confused then.
> - Isn't it better to ratelimit each printk() instead of bunch of
> printks inside irq handler?
> - I can limit the number of loops, but the most of the time is spent
> in
> the loop on printk() (on my machine ~170msec per loop), while
> everything else takes much lesser time (on my machine < 1 usec per
> loop). So, if I will limit number of loops per-irq, that cycle-limit
> will be based on limiting time spent on printk (e.g., how many
> printks
> to do in atomic context so that node will not lockup). It smells like
> ratelimiting, no?
> 
> I must be misunderstanding something, but why introducing another
> limit
> for number of printk() called when there is ratelimit which may be
> tuned..
> 

So I agree that it may be better to have another limit on the cycle
*also*, because if we clear faults at the same speed as they're
generated by hw, we may get stuck in the loop..
But by my measurements clearing a fault is so fast (< 1 usec) that I'm
not sure it can happen with hw. For that reason I didn't introduce a
loop-limit.

But even with a loop-limit we will need to ratelimit each printk()
*also*. Otherwise the loop-limit will be based on time spent printing,
not on anything else..
The patch makes sense even with a loop-limit in my opinion.
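
To be concrete, the loop-limit I have in mind would be roughly this, on
top of the per-fault ratelimiting (untested sketch; cap_num_fault_regs()
is the existing capability helper):

	fault_index = dma_fsts_fault_record_index(fault_status);
	reg = cap_fault_reg_offset(iommu->cap);
	for (i = 0; i < cap_num_fault_regs(iommu->cap); i++) {
		bool ratelimited = !__ratelimit(&rs);

		/* read/clear one fault record as dmar_fault() does today,
		 * calling dmar_fault_do_one() only when !ratelimited */

		fault_index++;
		if (fault_index >= cap_num_fault_regs(iommu->cap))
			fault_index = 0;
	}

so a single interrupt makes at most one pass over the fault recording
registers, and anything that arrives meanwhile is handled by the next
interrupt.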

-- 
Thanks,
 Dmitry


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Dmitry Safonov via iommu
On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote:
> On Thu, Mar 15, 2018 at 02:13:03PM +0000, Dmitry Safonov wrote:
> > So, you suggest to remove ratelimit at all?
> > Do we really need printk flood for each happened fault?
> > Imagine, you've hundreds of mappings and then PCI link flapped..
> > Wouldn't it be better to keep ratelimiting?
> > I don't mind, just it looks a bit strange to me.
> 
> I never said you should remove the ratelimiting, after all you are
> trying to fix a soft-lockup, no?
> 
> And that should not be fixed by changes to the ratelimiting, but with
> proper irq handling.

Uh, I'm a bit confused then.
- Isn't it better to ratelimit each printk() instead of the whole bunch
of printks inside the irq handler?
- I can limit the number of loops, but most of the time is spent in the
loop on printk() (on my machine ~170 msec per iteration), while
everything else takes much less time (on my machine < 1 usec per
iteration). So, if I limit the number of loops per irq, that
cycle-limit will really be limiting the time spent on printk (e.g., how
many printks to do in atomic context so that the node doesn't lock up).
It smells like ratelimiting, no?

I must be misunderstanding something, but why introduce another limit
on the number of printk() calls when there is a ratelimit which can be
tuned..
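
To illustrate what I mean by "tuned": the printk budget is just the two
parameters of the existing ratelimit state, e.g. (numbers purely for
illustration):

	/* allow at most 5 fault printouts per 10 seconds */
	static DEFINE_RATELIMIT_STATE(rs, 10 * HZ, 5);

which already caps how much time per interval can be burnt in printk(),
without introducing a separate counter of printk() calls.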

-- 
Thanks,
 Dmitry


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Dmitry Safonov via iommu
On Thu, 2018-03-15 at 14:46 +0100, Joerg Roedel wrote:
> On Thu, Feb 15, 2018 at 07:17:29PM +0000, Dmitry Safonov wrote:
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index accf58388bdb..6c4ea32ee6a9 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void
> > *dev_id)
> > int reg, fault_index;
> > u32 fault_status;
> > unsigned long flag;
> > -   bool ratelimited;
> > static DEFINE_RATELIMIT_STATE(rs,
> >   DEFAULT_RATELIMIT_INTERVAL,
> >   DEFAULT_RATELIMIT_BURST);
> >  
> > -   /* Disable printing, simply clear the fault when ratelimited */
> > -   ratelimited = !__ratelimit(&rs);
> > -
> > raw_spin_lock_irqsave(&iommu->register_lock, flag);
> > fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> > -   if (fault_status && !ratelimited)
> > +   if (fault_status && __ratelimit(&rs))
> > pr_err("DRHD: handling fault status reg %x\n", fault_status);
> 
> This looks aweful. Have you tried to limit the number of loops in
> this
> function and returning? You can handle the next faults by the next
> interrupt. This ensures that the cpu visits a scheduling point from
> time
> to time so that you don't see soft-lockups.

So, you suggest removing the ratelimit entirely?
Do we really need a printk flood for every fault that happens?
Imagine you have hundreds of mappings and then the PCI link flaps..
Wouldn't it be better to keep the ratelimiting?
I don't mind, it just looks a bit strange to me.

-- 
Thanks,
 Dmitry


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-13 Thread Dmitry Safonov via iommu
Gentle ping?

On Mon, 2018-03-05 at 15:00 +, Dmitry Safonov wrote:
> Hi Joerg,
> 
> What do you think about v3?
> It looks like, I can solve my softlookups with just a bit more proper
> ratelimiting..
> 
> On Thu, 2018-02-15 at 19:17 +, Dmitry Safonov wrote:
> > There is a ratelimit for printing, but it's incremented each time
> > the
> > cpu recives dmar fault interrupt. While one interrupt may signal
> > about
> > *many* faults.
> > So, measuring the impact it turns out that reading/clearing one
> > fault
> > takes < 1 usec, and printing info about the fault takes ~170 msec.
> > 
> > Having in mind that maximum number of fault recording registers per
> > remapping hardware unit is 256.. IRQ handler may run for (170*256)
> > msec.
> > And as fault-serving loop runs without a time limit, during
> > servicing
> > new faults may occur..
> > 
> > Ratelimit each fault printing rather than each irq printing.
> > 
> > Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")
> > 
> > BUG: spinlock lockup suspected on CPU#0, CliShell/9903
> >  lock: 0x81a47440, .magic: dead4ead, .owner:
> > kworker/u16:2/8915, .owner_cpu: 6
> > CPU: 0 PID: 9903 Comm: CliShell
> > Call Trace:$\n'
> > [..] dump_stack+0x65/0x83$\n'
> > [..] spin_dump+0x8f/0x94$\n'
> > [..] do_raw_spin_lock+0x123/0x170$\n'
> > [..] _raw_spin_lock_irqsave+0x32/0x3a$\n'
> > [..] uart_chars_in_buffer+0x20/0x4d$\n'
> > [..] tty_chars_in_buffer+0x18/0x1d$\n'
> > [..] n_tty_poll+0x1cb/0x1f2$\n'
> > [..] tty_poll+0x5e/0x76$\n'
> > [..] do_select+0x363/0x629$\n'
> > [..] compat_core_sys_select+0x19e/0x239$\n'
> > [..] compat_SyS_select+0x98/0xc0$\n'
> > [..] sysenter_dispatch+0x7/0x25$\n'
> > [..]
> > NMI backtrace for cpu 6
> > CPU: 6 PID: 8915 Comm: kworker/u16:2
> > Workqueue: dmar_fault dmar_fault_work
> > Call Trace:$\n'
> > [..] wait_for_xmitr+0x26/0x8f$\n'
> > [..] serial8250_console_putchar+0x1c/0x2c$\n'
> > [..] uart_console_write+0x40/0x4b$\n'
> > [..] serial8250_console_write+0xe6/0x13f$\n'
> > [..] call_console_drivers.constprop.13+0xce/0x103$\n'
> > [..] console_unlock+0x1f8/0x39b$\n'
> > [..] vprintk_emit+0x39e/0x3e6$\n'
> > [..] printk+0x4d/0x4f$\n'
> > [..] dmar_fault+0x1a8/0x1fc$\n'
> > [..] dmar_fault_work+0x15/0x17$\n'
> > [..] process_one_work+0x1e8/0x3a9$\n'
> > [..] worker_thread+0x25d/0x345$\n'
> > [..] kthread+0xea/0xf2$\n'
> > [..] ret_from_fork+0x58/0x90$\n'
> > 
> > Cc: Alex Williamson <alex.william...@redhat.com>
> > Cc: David Woodhouse <dw...@infradead.org>
> > Cc: Ingo Molnar <mi...@kernel.org>
> > Cc: Joerg Roedel <j...@8bytes.org>
> > Cc: Lu Baolu <baolu...@linux.intel.com>
> > Cc: iommu@lists.linux-foundation.org
> > Signed-off-by: Dmitry Safonov <d...@arista.com>
> > ---
> > Maybe it's worth to limit while(1) cycle.
> > If IOMMU generates faults with equal speed as irq handler cleans
> > them, it may turn into long-irq-disabled region again.
> > Not sure if it can happen anyway.
> > 
> >  drivers/iommu/dmar.c | 8 +++-
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index accf58388bdb..6c4ea32ee6a9 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void
> > *dev_id)
> > int reg, fault_index;
> > u32 fault_status;
> > unsigned long flag;
> > -   bool ratelimited;
> > static DEFINE_RATELIMIT_STATE(rs,
> >   DEFAULT_RATELIMIT_INTERVAL,
> >   DEFAULT_RATELIMIT_BURST);
> >  
> > -   /* Disable printing, simply clear the fault when ratelimited */
> > -   ratelimited = !__ratelimit(&rs);
> > -
> > raw_spin_lock_irqsave(&iommu->register_lock, flag);
> > fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> > -   if (fault_status && !ratelimited)
> > +   if (fault_status && __ratelimit(&rs))
> > pr_err("DRHD: handling fault status reg %x\n", fault_status);
> >  
> > /* TBD: ignore advanced fault log currently */
> > @@ -1638,6 +1634,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
> > fault_index = dma_fsts_fault_record_index(fault_status);
> > reg = cap_fault_reg_offset(iommu->cap);
> > while (1) {
> > +   /* Disable printing, simply clear the fault when ratelimited */
> > +   bool ratelimited = !__ratelimit(&rs);
> > u8 fault_reason;
> > u16 source_id;
> > u64 guest_addr;


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-05 Thread Dmitry Safonov via iommu
Hi Joerg,

What do you think about v3?
It looks like I can solve my soft lockups with just a bit more proper
ratelimiting..

On Thu, 2018-02-15 at 19:17 +, Dmitry Safonov wrote:
> There is a ratelimit for printing, but it's incremented each time the
> cpu recives dmar fault interrupt. While one interrupt may signal
> about
> *many* faults.
> So, measuring the impact it turns out that reading/clearing one fault
> takes < 1 usec, and printing info about the fault takes ~170 msec.
> 
> Having in mind that maximum number of fault recording registers per
> remapping hardware unit is 256.. IRQ handler may run for (170*256)
> msec.
> And as fault-serving loop runs without a time limit, during servicing
> new faults may occur..
> 
> Ratelimit each fault printing rather than each irq printing.
> 
> Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")
> 
> BUG: spinlock lockup suspected on CPU#0, CliShell/9903
>  lock: 0x81a47440, .magic: dead4ead, .owner:
> kworker/u16:2/8915, .owner_cpu: 6
> CPU: 0 PID: 9903 Comm: CliShell
> Call Trace:$\n'
> [..] dump_stack+0x65/0x83$\n'
> [..] spin_dump+0x8f/0x94$\n'
> [..] do_raw_spin_lock+0x123/0x170$\n'
> [..] _raw_spin_lock_irqsave+0x32/0x3a$\n'
> [..] uart_chars_in_buffer+0x20/0x4d$\n'
> [..] tty_chars_in_buffer+0x18/0x1d$\n'
> [..] n_tty_poll+0x1cb/0x1f2$\n'
> [..] tty_poll+0x5e/0x76$\n'
> [..] do_select+0x363/0x629$\n'
> [..] compat_core_sys_select+0x19e/0x239$\n'
> [..] compat_SyS_select+0x98/0xc0$\n'
> [..] sysenter_dispatch+0x7/0x25$\n'
> [..]
> NMI backtrace for cpu 6
> CPU: 6 PID: 8915 Comm: kworker/u16:2
> Workqueue: dmar_fault dmar_fault_work
> Call Trace:$\n'
> [..] wait_for_xmitr+0x26/0x8f$\n'
> [..] serial8250_console_putchar+0x1c/0x2c$\n'
> [..] uart_console_write+0x40/0x4b$\n'
> [..] serial8250_console_write+0xe6/0x13f$\n'
> [..] call_console_drivers.constprop.13+0xce/0x103$\n'
> [..] console_unlock+0x1f8/0x39b$\n'
> [..] vprintk_emit+0x39e/0x3e6$\n'
> [..] printk+0x4d/0x4f$\n'
> [..] dmar_fault+0x1a8/0x1fc$\n'
> [..] dmar_fault_work+0x15/0x17$\n'
> [..] process_one_work+0x1e8/0x3a9$\n'
> [..] worker_thread+0x25d/0x345$\n'
> [..] kthread+0xea/0xf2$\n'
> [..] ret_from_fork+0x58/0x90$\n'
> 
> Cc: Alex Williamson <alex.william...@redhat.com>
> Cc: David Woodhouse <dw...@infradead.org>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Joerg Roedel <j...@8bytes.org>
> Cc: Lu Baolu <baolu...@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Signed-off-by: Dmitry Safonov <d...@arista.com>
> ---
> Maybe it's worth to limit while(1) cycle.
> If IOMMU generates faults with equal speed as irq handler cleans
> them, it may turn into long-irq-disabled region again.
> Not sure if it can happen anyway.
> 
>  drivers/iommu/dmar.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index accf58388bdb..6c4ea32ee6a9 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>   int reg, fault_index;
>   u32 fault_status;
>   unsigned long flag;
> - bool ratelimited;
>   static DEFINE_RATELIMIT_STATE(rs,
> DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
>  
> - /* Disable printing, simply clear the fault when ratelimited */
> - ratelimited = !__ratelimit(&rs);
> -
>   raw_spin_lock_irqsave(&iommu->register_lock, flag);
>   fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> - if (fault_status && !ratelimited)
> + if (fault_status && __ratelimit(&rs))
>   pr_err("DRHD: handling fault status reg %x\n", fault_status);
>  
>   /* TBD: ignore advanced fault log currently */
> @@ -1638,6 +1634,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>   fault_index = dma_fsts_fault_record_index(fault_status);
>   reg = cap_fault_reg_offset(iommu->cap);
>   while (1) {
> + /* Disable printing, simply clear the fault when ratelimited */
> + bool ratelimited = !__ratelimit(&rs);
>   u8 fault_reason;
>   u16 source_id;
>   u64 guest_addr;


[PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-02-15 Thread Dmitry Safonov via iommu
There is a ratelimit for printing, but it's incremented each time the
cpu receives a dmar fault interrupt, while one interrupt may signal
about *many* faults.
So, measuring the impact, it turns out that reading/clearing one fault
takes < 1 usec, while printing info about the fault takes ~170 msec.

Having in mind that the maximum number of fault recording registers per
remapping hardware unit is 256.. the IRQ handler may run for (170*256)
msec. And as the fault-serving loop runs without a time limit, new
faults may occur during servicing..

Ratelimit each fault printing rather than each irq printing.

Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")

BUG: spinlock lockup suspected on CPU#0, CliShell/9903
 lock: 0x81a47440, .magic: dead4ead, .owner: kworker/u16:2/8915, 
.owner_cpu: 6
CPU: 0 PID: 9903 Comm: CliShell
Call Trace:$\n'
[..] dump_stack+0x65/0x83$\n'
[..] spin_dump+0x8f/0x94$\n'
[..] do_raw_spin_lock+0x123/0x170$\n'
[..] _raw_spin_lock_irqsave+0x32/0x3a$\n'
[..] uart_chars_in_buffer+0x20/0x4d$\n'
[..] tty_chars_in_buffer+0x18/0x1d$\n'
[..] n_tty_poll+0x1cb/0x1f2$\n'
[..] tty_poll+0x5e/0x76$\n'
[..] do_select+0x363/0x629$\n'
[..] compat_core_sys_select+0x19e/0x239$\n'
[..] compat_SyS_select+0x98/0xc0$\n'
[..] sysenter_dispatch+0x7/0x25$\n'
[..]
NMI backtrace for cpu 6
CPU: 6 PID: 8915 Comm: kworker/u16:2
Workqueue: dmar_fault dmar_fault_work
Call Trace:$\n'
[..] wait_for_xmitr+0x26/0x8f$\n'
[..] serial8250_console_putchar+0x1c/0x2c$\n'
[..] uart_console_write+0x40/0x4b$\n'
[..] serial8250_console_write+0xe6/0x13f$\n'
[..] call_console_drivers.constprop.13+0xce/0x103$\n'
[..] console_unlock+0x1f8/0x39b$\n'
[..] vprintk_emit+0x39e/0x3e6$\n'
[..] printk+0x4d/0x4f$\n'
[..] dmar_fault+0x1a8/0x1fc$\n'
[..] dmar_fault_work+0x15/0x17$\n'
[..] process_one_work+0x1e8/0x3a9$\n'
[..] worker_thread+0x25d/0x345$\n'
[..] kthread+0xea/0xf2$\n'
[..] ret_from_fork+0x58/0x90$\n'

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: David Woodhouse <dw...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: Lu Baolu <baolu...@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov <d...@arista.com>
---
Maybe it's worth limiting the while(1) cycle.
If the IOMMU generates faults at the same speed as the irq handler
clears them, it may turn into a long irq-disabled region again.
Not sure if that can happen anyway.

 drivers/iommu/dmar.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index accf58388bdb..6c4ea32ee6a9 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
int reg, fault_index;
u32 fault_status;
unsigned long flag;
-   bool ratelimited;
static DEFINE_RATELIMIT_STATE(rs,
  DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /* Disable printing, simply clear the fault when ratelimited */
-   ratelimited = !__ratelimit(&rs);
-
raw_spin_lock_irqsave(&iommu->register_lock, flag);
fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-   if (fault_status && !ratelimited)
+   if (fault_status && __ratelimit(&rs))
pr_err("DRHD: handling fault status reg %x\n", fault_status);
 
/* TBD: ignore advanced fault log currently */
@@ -1638,6 +1634,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
fault_index = dma_fsts_fault_record_index(fault_status);
reg = cap_fault_reg_offset(iommu->cap);
while (1) {
+   /* Disable printing, simply clear the fault when ratelimited */
+   bool ratelimited = !__ratelimit(&rs);
u8 fault_reason;
u16 source_id;
u64 guest_addr;
-- 
2.13.6



Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue

2018-02-15 Thread Dmitry Safonov via iommu
On Tue, 2018-02-13 at 17:38 +, Dmitry Safonov wrote:
> On Tue, 2018-02-13 at 17:35 +0100, Joerg Roedel wrote:
> > On Mon, Feb 12, 2018 at 04:48:23PM +0000, Dmitry Safonov wrote:
> > > dmar_fault() reports/handles/cleans DMAR faults in a cycle one-
> > > by-
> > > one.
> > > The nuisance is that it's set as a irq handler and runs with
> > > disabled
> > > interrupts - which works OK if you have only a couple of DMAR
> > > faults,
> > > but becomes a problem if your intel iommu has a plenty of
> > > mappings.
> > 
> > I don't think that a work-queue is the right solution here, it adds
> > a
> > long delay until the log is processed. During that delay, and with
> > high
> > fault rates the error log will overflow during that delay.
> > 
> > Here is what I think you should do instead to fix the soft-lockups:
> > 
> > First, unmask the fault reporting irq so that you will get
> > subsequent
> > irqs. Then:
> > 
> > * For Primary Fault Reporting just cycle once through all
> >   supported fault recording registers.
> > 
> > * For Advanced Fault Reporting, read start and end pointer of
> >   the log and process all entries.
> > 
> > After that return from the fault handler and let the next irq
> > handle
> > additional faults that might have been recorded while the previous
> > handler was running.
> 
> Ok, will re-do this way, thanks.
> 
> > And of course, ratelimiting the fault printouts is always a good
> > idea.

I've looked at this more.
It turns out that the fault handler is used only to printk() errors
about the faults that happened. So, we shouldn't care about the added
delay with a wq, as it's just another ratelimit for the printing of
faults.
I also measured how much time it takes to read/clear a fault and how
much time it takes to print info about the fault.
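
(The numbers can be reproduced with something along these lines -
untested sketch, not the exact instrumentation:

	u64 t0, t1, t2;

	t0 = ktime_get_ns();
	/* readl()/writel() of one fault recording register, as in dmar_fault() */
	t1 = ktime_get_ns();
	dmar_fault_do_one(iommu, type, fault_reason, source_id, guest_addr);
	t2 = ktime_get_ns();
	trace_printk("clear %llu ns, print %llu ns\n", t1 - t0, t2 - t1);

reading/clearing stays well below a microsecond, while the printk path
takes ~170 msec with a slow serial console.)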

Heh, anyway I'll resend v3 of the 6/6 patch, which ratelimits printks
by each fault rather than by each irq - it'll work for me with only
that patch applied.

-- 
Dima


Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue

2018-02-13 Thread Dmitry Safonov via iommu
On Tue, 2018-02-13 at 17:35 +0100, Joerg Roedel wrote:
> On Mon, Feb 12, 2018 at 04:48:23PM +0000, Dmitry Safonov wrote:
> > dmar_fault() reports/handles/cleans DMAR faults in a cycle one-by-
> > one.
> > The nuisance is that it's set as a irq handler and runs with
> > disabled
> > interrupts - which works OK if you have only a couple of DMAR
> > faults,
> > but becomes a problem if your intel iommu has a plenty of mappings.
> 
> I don't think that a work-queue is the right solution here, it adds a
> long delay until the log is processed. During that delay, and with
> high
> fault rates the error log will overflow during that delay.
> 
> Here is what I think you should do instead to fix the soft-lockups:
> 
> First, unmask the fault reporting irq so that you will get subsequent
> irqs. Then:
> 
>   * For Primary Fault Reporting just cycle once through all
> supported fault recording registers.
> 
>   * For Advanced Fault Reporting, read start and end pointer of
> the log and process all entries.
> 
> After that return from the fault handler and let the next irq handle
> additional faults that might have been recorded while the previous
> handler was running.

Ok, will re-do this way, thanks.

> And of course, ratelimiting the fault printouts is always a good
> idea.

-- 
Dima


[PATCHv2 5/6] iommu/intel: Rename dmar_fault() => dmar_serve_faults()

2018-02-12 Thread Dmitry Safonov via iommu
Fix the return value and parameters, and give the function a slightly
better name.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c| 8 +++-
 drivers/iommu/intel-iommu.c | 2 +-
 drivers/iommu/intel_irq_remapping.c | 2 +-
 include/linux/dmar.h| 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 34a53aeede38..37a1147a7592 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1635,9 +1635,8 @@ static void clear_primary_faults(struct intel_iommu 
*iommu)
 }
 
 #define PRIMARY_FAULT_REG_LEN (16)
-irqreturn_t dmar_fault(int irq, void *dev_id)
+void dmar_serve_faults(struct intel_iommu *iommu)
 {
-   struct intel_iommu *iommu = dev_id;
int reg, fault_index;
u32 fault_status;
unsigned long flag;
@@ -1706,14 +1705,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 
 unlock_exit:
raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-   return IRQ_HANDLED;
 }
 
 static void dmar_fault_work(struct work_struct *work)
 {
struct fault *fault = container_of(work, struct fault, work);
 
-   dmar_fault(-1, (void*)fault->iommu);
+   dmar_serve_faults(fault->iommu);
 }
 
 static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
@@ -1786,7 +1784,7 @@ int __init enable_drhd_fault_handling(void)
/*
 * Clear any previous faults.
 */
-   dmar_fault(iommu->irq, iommu);
+   dmar_serve_faults(iommu);
fault_status = readl(iommu->reg + DMAR_FSTS_REG);
writel(fault_status, iommu->reg + DMAR_FSTS_REG);
}
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 64ad9786f5b9..3b55376b3b46 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3015,7 +3015,7 @@ static void intel_iommu_init_qi(struct intel_iommu *iommu)
/*
 * Clear any previous faults.
 */
-   dmar_fault(-1, iommu);
+   dmar_serve_faults(iommu);
/*
 * Disable queued invalidation if supported and already enabled
 * before OS handover.
diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 66f69af2c219..0958fdc76564 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -558,7 +558,7 @@ static int intel_setup_irq_remapping(struct intel_iommu 
*iommu)
/*
 * Clear previous faults.
 */
-   dmar_fault(-1, iommu);
+   dmar_serve_faults(iommu);
dmar_disable_qi(iommu);
 
if (dmar_enable_qi(iommu)) {
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 5de4932a6ad8..6e9a5e4ac73a 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -280,7 +280,7 @@ extern void dmar_msi_mask(struct irq_data *data);
 extern void dmar_msi_read(int irq, struct msi_msg *msg);
 extern void dmar_msi_write(int irq, struct msi_msg *msg);
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
-extern irqreturn_t dmar_fault(int irq, void *dev_id);
+extern void dmar_serve_faults(struct intel_iommu *iommu);
 extern int dmar_alloc_hwirq(int id, int node, void *arg);
 extern void dmar_free_hwirq(int irq);
 
-- 
2.13.6



[PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue

2018-02-12 Thread Dmitry Safonov via iommu
dmar_fault() reports/handles/cleans DMAR faults in a cycle, one by one.
The nuisance is that it's set as an irq handler and runs with disabled
interrupts - which works OK if you have only a couple of DMAR faults,
but becomes a problem if your intel iommu has plenty of mappings.

We have a test that checks flapping of the PCI link. Once or twice it
got stuck on a soft lockup (which is a panic for the typical switch):

 dmar: DMAR:[DMA Write] Request device [4a:00.0] fault addr 5e1a3000
 DMAR:[fault reason 02] Present bit in context entry is clear
 NMI watchdog: BUG: soft lockup - CPU#8 stuck for 21s!
 CPU: 8 PID: 2774 Comm: SuperServer
 Call Trace:
  generic_exec_single+0x11b/0x12d
  ? flush_tlb_func+0x0/0x18a
  smp_call_function_single+0x89/0xb5
  smp_call_function_many+0xef/0x207
  native_flush_tlb_others+0x2e/0x30
  flush_tlb_mm_range+0x139/0x18a
  tlb_flush_mmu_tlbonly+0x35/0x85
  tlb_flush_mmu+0x13/0x1f
  tlb_finish_mmu+0x14/0x39
  unmap_region+0xda/0xec
  do_munmap+0x273/0x2f5
  vm_munmap+0x45/0x5e
  SyS_munmap+0x26/0x2f
  sysenter_dispatch+0x7/0x25

 Kernel panic - not syncing: softlockup: hung tasks
 sending NMI to all CPUs:
 NMI backtrace for cpu 0
 CPU: 0 PID: 0 Comm: swapper/0
 Call Trace:
  
  wait_for_xmitr+0x26/0x8f
  serial8250_console_putchar+0x1c/0x2c
  uart_console_write+0x40/0x4b
  serial8250_console_write+0xe6/0x13f
  call_console_drivers.constprop.13+0xce/0x103
  console_unlock+0x1f8/0x39b
  ? local_clock+0x21/0x23
  vprintk_emit+0x39e/0x3e6
  printk+0x4d/0x4f
  dmar_fault+0x1ab/0x1fd
  handle_irq_event_percpu+0x8a/0x1f5
  handle_irq_event+0x41/0x61
  handle_edge_irq+0xe1/0xfa
  handle_irq+0x155/0x166
  do_IRQ+0x5a/0xea
  ret_from_intr+0x0/0x15
  
  arch_cpu_idle+0xf/0x11
  cpu_startup_entry+0x22f/0x3cb
  rest_init+0x80/0x84
  start_kernel+0x470/0x47d
  x86_64_start_reservations+0x2a/0x2c
  x86_64_start_kernel+0x14b/0x15a

Move the clearing of DMAR faults out of the irq-disabled critical
section by doing it in a workqueue thread.
The next patch will correct the definition of dmar_fault().

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c| 48 -
 drivers/iommu/intel-iommu.c |  6 ++
 include/linux/dmar.h|  1 +
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 33fb4244e438..34a53aeede38 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -71,6 +71,12 @@ struct acpi_table_header * __initdata dmar_tbl;
 static int dmar_dev_scope_status = 1;
 static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 
+static struct workqueue_struct *fault_wq;
+struct fault {
+   struct work_struct work;
+   struct intel_iommu *iommu;
+};
+
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
 
@@ -811,6 +817,14 @@ void __init dmar_register_bus_notifier(void)
bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
 
+int __init dmar_init_workqueue(void)
+{
+   fault_wq = alloc_workqueue("dmar_fault", WQ_MEM_RECLAIM | WQ_UNBOUND, 
0);
+   if (fault_wq == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
 
 int __init dmar_table_init(void)
 {
@@ -1695,6 +1709,38 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static void dmar_fault_work(struct work_struct *work)
+{
+   struct fault *fault = container_of(work, struct fault, work);
+
+   dmar_fault(-1, (void*)fault->iommu);
+}
+
+static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
+{
+   struct intel_iommu *iommu = dev_id;
+   struct fault *fault;
+
+   fault = kzalloc(sizeof(*fault), GFP_ATOMIC);
+   if (fault == NULL) {
+   unsigned long flag;
+
+   /* During OOM - clear faults and let device re-fault */
+   raw_spin_lock_irqsave(&iommu->register_lock, flag);
+   clear_primary_faults(iommu);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
+
+   return IRQ_HANDLED;
+   }
+
+   fault->iommu = iommu;
+   INIT_WORK(&fault->work, dmar_fault_work);
+
+   queue_work(fault_wq, &fault->work);
+
+   return IRQ_HANDLED;
+}
+
 int dmar_set_interrupt(struct intel_iommu *iommu)
 {
int irq, ret;
@@ -1713,7 +1759,7 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
return -EINVAL;
}
 
-   ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
+   ret = request_irq(irq, dmar_fault_handler, IRQF_NO_THREAD, iommu->name, 
iommu);
if (ret)
pr_err("Can't request irq\n");
return ret;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 582fd01cb7d1..64ad9786f5b9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4783,6 +4783,12 @@ int __init intel_iommu_init(void)
goto out_free

[PATCHv2 2/6] iommu/intel: Clean/document fault status flags

2018-02-12 Thread Dmitry Safonov via iommu
So one could decode them without opening the specification.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 include/linux/intel-iommu.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8dad3dd26eae..ef169d67df92 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -209,12 +209,12 @@
 #define DMA_FECTL_IM (((u32)1) << 31)
 
 /* FSTS_REG */
-#define DMA_FSTS_PPF ((u32)2)
-#define DMA_FSTS_PFO ((u32)1)
-#define DMA_FSTS_IQE (1 << 4)
-#define DMA_FSTS_ICE (1 << 5)
-#define DMA_FSTS_ITE (1 << 6)
-#define DMA_FSTS_PRO (1 << 7)
+#define DMA_FSTS_PFO (1 << 0) /* Primary Fault Overflow */
+#define DMA_FSTS_PPF (1 << 1) /* Primary Pending Fault */
+#define DMA_FSTS_IQE (1 << 4) /* Invalidation Queue Error */
+#define DMA_FSTS_ICE (1 << 5) /* Invalidation Completion Error */
+#define DMA_FSTS_ITE (1 << 6) /* Invalidation Time-out Error */
+#define DMA_FSTS_PRO (1 << 7) /* Page Request Overflow */
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
-- 
2.13.6



[PATCHv2 3/6] iommu/intel: Introduce clear_primary_faults() helper

2018-02-12 Thread Dmitry Safonov via iommu
To my mind it's a bit more readable - and I will re-use it in the next
patch.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index accf58388bdb..33fb4244e438 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1611,6 +1611,15 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, 
int type,
return 0;
 }
 
+static void clear_primary_faults(struct intel_iommu *iommu)
+{
+   volatile void __iomem *fsts = iommu->reg + DMAR_FSTS_REG;
+
+   assert_raw_spin_locked(&iommu->register_lock);
+
+   writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO, fsts);
+}
+
 #define PRIMARY_FAULT_REG_LEN (16)
 irqreturn_t dmar_fault(int irq, void *dev_id)
 {
@@ -1679,8 +1688,7 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
raw_spin_lock_irqsave(&iommu->register_lock, flag);
}
 
-   writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO,
-  iommu->reg + DMAR_FSTS_REG);
+   clear_primary_faults(iommu);
 
 unlock_exit:
raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-- 
2.13.6



[PATCHv2 1/6] iommu/intel: Add __init for dmar_register_bus_notifier()

2018-02-12 Thread Dmitry Safonov via iommu
It's called only from intel_iommu_init(), which is an init function.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9a7ffd13c7f0..accf58388bdb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -806,7 +806,7 @@ int __init dmar_dev_scope_init(void)
return dmar_dev_scope_status;
 }
 
-void dmar_register_bus_notifier(void)
+void __init dmar_register_bus_notifier(void)
 {
bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
-- 
2.13.6



[PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq

2018-02-12 Thread Dmitry Safonov via iommu
Changes in v2:
- Ratelimit printks for dmar faults (patch 6)

First version:
https://lkml.org/lkml/2018/1/24/364

A softlockup-panic fix for an issue I've met in the kernel test suite.
While at it, fix a couple of minor issues.

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: David Woodhouse <dw...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: Lu Baolu <baolu...@linux.intel.com>
Cc: iommu@lists.linux-foundation.org

Dmitry Safonov (6):
  iommu/intel: Add __init for dmar_register_bus_notifier()
  iommu/intel: Clean/document fault status flags
  iommu/intel: Introduce clear_primary_faults() helper
  iommu/intel: Handle DMAR faults on workqueue
  iommu/intel: Rename dmar_fault() => dmar_serve_faults()
  iommu/intel: Ratelimit each dmar fault printing

 drivers/iommu/dmar.c| 73 +++--
 drivers/iommu/intel-iommu.c |  8 +++-
 drivers/iommu/intel_irq_remapping.c |  2 +-
 include/linux/dmar.h|  3 +-
 include/linux/intel-iommu.h | 12 +++---
 5 files changed, 78 insertions(+), 20 deletions(-)

-- 
2.13.6



[PATCH 2/5] iommu/intel: Clean/document fault status flags

2018-01-24 Thread Dmitry Safonov via iommu
So one could decode them without opening the specification.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 include/linux/intel-iommu.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index f3274d9f46a2..a4dc9c2875cc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -207,12 +207,12 @@
 #define DMA_FECTL_IM (((u32)1) << 31)
 
 /* FSTS_REG */
-#define DMA_FSTS_PPF ((u32)2)
-#define DMA_FSTS_PFO ((u32)1)
-#define DMA_FSTS_IQE (1 << 4)
-#define DMA_FSTS_ICE (1 << 5)
-#define DMA_FSTS_ITE (1 << 6)
-#define DMA_FSTS_PRO (1 << 7)
+#define DMA_FSTS_PFO (1 << 0) /* Primary Fault Overflow */
+#define DMA_FSTS_PPF (1 << 1) /* Primary Pending Fault */
+#define DMA_FSTS_IQE (1 << 4) /* Invalidation Queue Error */
+#define DMA_FSTS_ICE (1 << 5) /* Invalidation Completion Error */
+#define DMA_FSTS_ITE (1 << 6) /* Invalidation Time-out Error */
+#define DMA_FSTS_PRO (1 << 7) /* Page Request Overflow */
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
-- 
2.13.6



[PATCH 4/5] iommu/intel: Handle DMAR faults on workqueue

2018-01-24 Thread Dmitry Safonov via iommu
dmar_fault() reports/handles/cleans DMAR faults in a cycle, one by one.
The nuisance is that it's set as an irq handler and runs with disabled
interrupts - which works OK if you have only a couple of DMAR faults,
but becomes a problem if your intel iommu has plenty of mappings.

We have a test that checks flapping of the PCI link. Once or twice it
got stuck on a soft lockup (which is a panic for the typical switch):

 dmar: DMAR:[DMA Write] Request device [4a:00.0] fault addr 5e1a3000
 DMAR:[fault reason 02] Present bit in context entry is clear
 NMI watchdog: BUG: soft lockup - CPU#8 stuck for 21s!
 CPU: 8 PID: 2774 Comm: SuperServer
 Call Trace:
  generic_exec_single+0x11b/0x12d
  ? flush_tlb_func+0x0/0x18a
  smp_call_function_single+0x89/0xb5
  smp_call_function_many+0xef/0x207
  native_flush_tlb_others+0x2e/0x30
  flush_tlb_mm_range+0x139/0x18a
  tlb_flush_mmu_tlbonly+0x35/0x85
  tlb_flush_mmu+0x13/0x1f
  tlb_finish_mmu+0x14/0x39
  unmap_region+0xda/0xec
  do_munmap+0x273/0x2f5
  vm_munmap+0x45/0x5e
  SyS_munmap+0x26/0x2f
  sysenter_dispatch+0x7/0x25

 Kernel panic - not syncing: softlockup: hung tasks
 sending NMI to all CPUs:
 NMI backtrace for cpu 0
 CPU: 0 PID: 0 Comm: swapper/0
 Call Trace:
  
  wait_for_xmitr+0x26/0x8f
  serial8250_console_putchar+0x1c/0x2c
  uart_console_write+0x40/0x4b
  serial8250_console_write+0xe6/0x13f
  call_console_drivers.constprop.13+0xce/0x103
  console_unlock+0x1f8/0x39b
  ? local_clock+0x21/0x23
  vprintk_emit+0x39e/0x3e6
  printk+0x4d/0x4f
  dmar_fault+0x1ab/0x1fd
  handle_irq_event_percpu+0x8a/0x1f5
  handle_irq_event+0x41/0x61
  handle_edge_irq+0xe1/0xfa
  handle_irq+0x155/0x166
  do_IRQ+0x5a/0xea
  ret_from_intr+0x0/0x15
  
  arch_cpu_idle+0xf/0x11
  cpu_startup_entry+0x22f/0x3cb
  rest_init+0x80/0x84
  start_kernel+0x470/0x47d
  x86_64_start_reservations+0x2a/0x2c
  x86_64_start_kernel+0x14b/0x15a

Move the clearing of DMAR faults out of the irq-disabled critical
section by doing it in a workqueue thread.
The next patch will correct the definition of dmar_fault().

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c| 48 -
 drivers/iommu/intel-iommu.c |  6 ++
 include/linux/dmar.h|  1 +
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 33fb4244e438..34a53aeede38 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -71,6 +71,12 @@ struct acpi_table_header * __initdata dmar_tbl;
 static int dmar_dev_scope_status = 1;
 static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 
+static struct workqueue_struct *fault_wq;
+struct fault {
+   struct work_struct work;
+   struct intel_iommu *iommu;
+};
+
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
 
@@ -811,6 +817,14 @@ void __init dmar_register_bus_notifier(void)
bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
 
+int __init dmar_init_workqueue(void)
+{
+   fault_wq = alloc_workqueue("dmar_fault", WQ_MEM_RECLAIM | WQ_UNBOUND, 
0);
+   if (fault_wq == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
 
 int __init dmar_table_init(void)
 {
@@ -1695,6 +1709,38 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static void dmar_fault_work(struct work_struct *work)
+{
+   struct fault *fault = container_of(work, struct fault, work);
+
+   dmar_fault(-1, (void*)fault->iommu);
+}
+
+static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
+{
+   struct intel_iommu *iommu = dev_id;
+   struct fault *fault;
+
+   fault = kzalloc(sizeof(*fault), GFP_ATOMIC);
+   if (fault == NULL) {
+   unsigned long flag;
+
+   /* During OOM - clear faults and let device re-fault */
+   raw_spin_lock_irqsave(&iommu->register_lock, flag);
+   clear_primary_faults(iommu);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
+
+   return IRQ_HANDLED;
+   }
+
+   fault->iommu = iommu;
+   INIT_WORK(&fault->work, dmar_fault_work);
+
+   queue_work(fault_wq, &fault->work);
+
+   return IRQ_HANDLED;
+}
+
 int dmar_set_interrupt(struct intel_iommu *iommu)
 {
int irq, ret;
@@ -1713,7 +1759,7 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
return -EINVAL;
}
 
-   ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
+   ret = request_irq(irq, dmar_fault_handler, IRQF_NO_THREAD, iommu->name, 
iommu);
if (ret)
pr_err("Can't request irq\n");
return ret;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4a2de34895ec..4869c02652cc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4784,6 +4784,12 @@ int __init intel_iommu_init(void)
goto out_free

[PATCH 0/5] iommu/intel: Handle DMAR faults in a wq

2018-01-24 Thread Dmitry Safonov via iommu
A softlockup-panic fix for an issue I've met in the kernel test suite.
While at it, fix a couple of minor issues.

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: David Woodhouse <dw...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: Lu Baolu <baolu...@linux.intel.com>
Cc: iommu@lists.linux-foundation.org

Dmitry Safonov (5):
  iommu/intel: Add __init for dmar_register_bus_notifier()
  iommu/intel: Clean/document fault status flags
  iommu/intel: Introduce clear_primary_faults() helper
  iommu/intel: Handle DMAR faults on workqueue
  iommu/intel: Rename dmar_fault() => dmar_serve_faults()

 drivers/iommu/dmar.c| 66 +
 drivers/iommu/intel-iommu.c |  8 -
 drivers/iommu/intel_irq_remapping.c |  2 +-
 include/linux/dmar.h|  3 +-
 include/linux/intel-iommu.h | 12 +++
 5 files changed, 75 insertions(+), 16 deletions(-)

-- 
2.13.6



[PATCH 3/5] iommu/intel: Introduce clear_primary_faults() helper

2018-01-24 Thread Dmitry Safonov via iommu
To my mind it's a bit more readable - and I will re-use it in the next
patch.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index accf58388bdb..33fb4244e438 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1611,6 +1611,15 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, 
int type,
return 0;
 }
 
+static void clear_primary_faults(struct intel_iommu *iommu)
+{
+   volatile void __iomem *fsts = iommu->reg + DMAR_FSTS_REG;
+
+   assert_raw_spin_locked(&iommu->register_lock);
+
+   writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO, fsts);
+}
+
 #define PRIMARY_FAULT_REG_LEN (16)
 irqreturn_t dmar_fault(int irq, void *dev_id)
 {
@@ -1679,8 +1688,7 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
raw_spin_lock_irqsave(&iommu->register_lock, flag);
}
 
-   writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO,
-  iommu->reg + DMAR_FSTS_REG);
+   clear_primary_faults(iommu);
 
 unlock_exit:
raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-- 
2.13.6



[PATCH 1/5] iommu/intel: Add __init for dmar_register_bus_notifier()

2018-01-24 Thread Dmitry Safonov via iommu
It's called only from intel_iommu_init(), which is an init function.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9a7ffd13c7f0..accf58388bdb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -806,7 +806,7 @@ int __init dmar_dev_scope_init(void)
return dmar_dev_scope_status;
 }
 
-void dmar_register_bus_notifier(void)
+void __init dmar_register_bus_notifier(void)
 {
bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
-- 
2.13.6



[PATCH 5/5] iommu/intel: Rename dmar_fault() => dmar_serve_faults()

2018-01-24 Thread Dmitry Safonov via iommu
Fix the return value and parameters, and give the function a slightly
better name.

Signed-off-by: Dmitry Safonov <d...@arista.com>
---
 drivers/iommu/dmar.c| 8 +++-
 drivers/iommu/intel-iommu.c | 2 +-
 drivers/iommu/intel_irq_remapping.c | 2 +-
 include/linux/dmar.h| 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 34a53aeede38..37a1147a7592 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1635,9 +1635,8 @@ static void clear_primary_faults(struct intel_iommu 
*iommu)
 }
 
 #define PRIMARY_FAULT_REG_LEN (16)
-irqreturn_t dmar_fault(int irq, void *dev_id)
+void dmar_serve_faults(struct intel_iommu *iommu)
 {
-   struct intel_iommu *iommu = dev_id;
int reg, fault_index;
u32 fault_status;
unsigned long flag;
@@ -1706,14 +1705,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 
 unlock_exit:
raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-   return IRQ_HANDLED;
 }
 
 static void dmar_fault_work(struct work_struct *work)
 {
struct fault *fault = container_of(work, struct fault, work);
 
-   dmar_fault(-1, (void*)fault->iommu);
+   dmar_serve_faults(fault->iommu);
 }
 
 static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
@@ -1786,7 +1784,7 @@ int __init enable_drhd_fault_handling(void)
/*
 * Clear any previous faults.
 */
-   dmar_fault(iommu->irq, iommu);
+   dmar_serve_faults(iommu);
fault_status = readl(iommu->reg + DMAR_FSTS_REG);
writel(fault_status, iommu->reg + DMAR_FSTS_REG);
}
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4869c02652cc..ee98cbce54bd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3016,7 +3016,7 @@ static void intel_iommu_init_qi(struct intel_iommu *iommu)
/*
 * Clear any previous faults.
 */
-   dmar_fault(-1, iommu);
+   dmar_serve_faults(iommu);
/*
 * Disable queued invalidation if supported and already enabled
 * before OS handover.
diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 66f69af2c219..0958fdc76564 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -558,7 +558,7 @@ static int intel_setup_irq_remapping(struct intel_iommu 
*iommu)
/*
 * Clear previous faults.
 */
-   dmar_fault(-1, iommu);
+   dmar_serve_faults(iommu);
dmar_disable_qi(iommu);
 
if (dmar_enable_qi(iommu)) {
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 5de4932a6ad8..6e9a5e4ac73a 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -280,7 +280,7 @@ extern void dmar_msi_mask(struct irq_data *data);
 extern void dmar_msi_read(int irq, struct msi_msg *msg);
 extern void dmar_msi_write(int irq, struct msi_msg *msg);
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
-extern irqreturn_t dmar_fault(int irq, void *dev_id);
+extern void dmar_serve_faults(struct intel_iommu *iommu);
 extern int dmar_alloc_hwirq(int id, int node, void *arg);
 extern void dmar_free_hwirq(int irq);
 
-- 
2.13.6
