On Sat, Aug 21, 2021 at 06:44:39PM +0300, Lennert Buytenhek wrote:

> > > - EVENT_FLAG_I unset, but the request was a translation request
> > >   (EVENT_FLAG_TR set) or the target page was not present
> > >   (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW
> > >   bit will be invalid, so don't try to map it to a
> > >   IOMMU_FAULT_{READ,WRITE} code
> > 
> > So, why do we need to call report_iommu_fault() for this case?
> > My understanding is that we only have IOMMU_FAULT_[READ|WRITE].
> > So, if we can't identify whether the DMA was a read or a write,
> > we shouldn't need to call report_iommu_fault(), right?
> 
> I don't think that we should just altogether avoid logging the subset
> of page faults for which we can't determine the read/write direction
> on AMD platforms.
> 
> E.g. "access to an unmapped address" (which will have PR=0, and thus we
> won't know if it was a read or a write access) is just as much of a page
> fault as "write to a read-only page" (which will have PR=1, and thus the
> RW bit will be accurate) is, and for RAS purposes, both events are
> equally interesting, and important to know about.
> 
> It's true that we currently don't have a way of signaling to
> report_iommu_fault() (and by extension, to the io_page_fault
> tracepoint) that we're not sure whether the offending access was a read
> or a write, but I think we can just add a bit to include/linux/iommu.h
> to indicate that, something along the lines of:
> 
>        /* iommu fault flags */
>        #define IOMMU_FAULT_READ        0x0
>        #define IOMMU_FAULT_WRITE       0x1
>       +#define IOMMU_FAULT_RW_UNKNOWN  0x2
> 
> (Cc'ing Ohad Ben-Cohen, who originally added this API.)
> 
> I don't think that it would be a good idea to just not signal the page
> faults for which we don't know the read/write direction.
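
Concretely, with such a flag, the AMD event-log path could do something
along these lines (a rough sketch, not actual driver code; EVENT_FLAG_RW
here is a stand-in name for the logged RW bit, in the style of the
EVENT_FLAG_* bits above):

	static int fault_flags(u16 flags)
	{
		/* Per the spec, RW is only meaningful when PR=1, TR=0, I=0. */
		if ((flags & EVENT_FLAG_TR) || !(flags & EVENT_FLAG_PR))
			return IOMMU_FAULT_RW_UNKNOWN;

		return (flags & EVENT_FLAG_RW) ? IOMMU_FAULT_WRITE :
						 IOMMU_FAULT_READ;
	}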

I had another look at this, and from some testing, it seems that,
contrary to what the datasheet suggests, the RW (read/write direction)
bit in logged I/O page faults is actually accurate for both faulting
accesses to present pages and faulting accesses to non-present pages.

I made a few hacks to the ixgbe driver to intentionally cause several
different types of I/O page faults, thereby turning an ixgbe NIC into
a poor man's I/O page fault generator, and these are the resulting
logged I/O page faults, on a Ryzen 3700X system:

Read from non-present page:
ixgbe 0000:25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0x1bbc8f040 flags=0x0000]
=> flags indicate PR(esent)=0, RW=0

Write to non-present page:
ixgbe 0000:25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0x1bdcb70c0 flags=0x0020]
=> flags indicate PR(esent)=0, RW=1

Read from write-only page:
ixgbe 0000:25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xbbcc1c40 flags=0x0050]
=> flags indicate PR(esent)=1, RW=0  (and PE(rmission violation)=1)

Write to read-only page:
ixgbe 0000:25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xbdcb70c0 flags=0x0070]
=> flags indicate PR(esent)=1, RW=1  (and PE(rmission violation)=1)

In other words, it seems that the RW bit is reliable even for PR=0 type
faults.
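
(Decoding the flags values above, for reference; the bit positions
below are inferred purely from these four samples:)

	#define EVENT_FLAG_PR	0x010	/* target page was present */
	#define EVENT_FLAG_RW	0x020	/* faulting access was a write */
	#define EVENT_FLAG_PE	0x040	/* permission violation */

	/* e.g. flags=0x0070 = PR|RW|PE: a write to a present, read-only page */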

I assume that the restriction mentioned in the docs regarding RW and
PR ("RW is only meaningful when PR=1, TR=0, and I=0", from AMD I/O
Virtualization Technology (IOMMU) Specification, revision 3.00, Table
55: IO_PAGE_FAULT Event Log Buffer Entry Fields) is either inaccurate
or refers to a restriction in older hardware that has since been lifted.

I'll resubmit the patch to unconditionally pass through the RW bit.
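
The resulting mapping will then be something like this (a sketch of the
shape the resubmitted handler will take; the function and parameter
names here are illustrative, not the actual driver code):

	/*
	 * Pass the RW bit through unconditionally, since testing shows
	 * it is accurate even for PR=0 faults.
	 */
	static void amd_iommu_page_fault(struct iommu_domain *domain,
					 struct device *dev,
					 unsigned long iova, u16 flags)
	{
		int fault_flags;

		fault_flags = (flags & EVENT_FLAG_RW) ? IOMMU_FAULT_WRITE :
							IOMMU_FAULT_READ;

		report_iommu_fault(domain, dev, iova, fault_flags);
	}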

ixgbe hacks:

to cause reads from non-present pages (in the TX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8232,7 +8232,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
                dma_unmap_len_set(tx_buffer, len, size);
                dma_unmap_addr_set(tx_buffer, dma, dma);
 
-               tx_desc->read.buffer_addr = cpu_to_le64(dma);
+               tx_desc->read.buffer_addr = cpu_to_le64(dma + BIT(32));
 
                while (unlikely(size > IXGBE_MAX_DATA_PER_TXD)) {
                        tx_desc->read.cmd_type_len =


to cause writes to non-present pages (in the RX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1604,7 +1604,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
                 * Refresh the desc even if buffer_addrs didn't change
                 * because each write-back erases this info.
                 */
-               rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
+               rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset + BIT(32));
 
                rx_desc++;
                bi++;


to cause reads from write-only pages (in the TX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8220,7 +8220,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
        }
 
 #endif
-       dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_TO_DEVICE);
+       dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_FROM_DEVICE);
 
        tx_buffer = first;
 

to cause writes to read-only pages (in the RX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1545,7 +1545,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
        /* map page for use */
        dma = dma_map_page_attrs(rx_ring->dev, page, 0,
                                 ixgbe_rx_pg_size(rx_ring),
-                                DMA_FROM_DEVICE,
+                                DMA_TO_DEVICE,
                                 IXGBE_RX_DMA_ATTR);
 
        /*