Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-07 Thread Sunil Kovvuri
On Mon, Mar 6, 2017 at 10:02 PM, Robin Murphy  wrote:
> On 06/03/17 12:57, Sunil Kovvuri wrote:

 We are seeing a 0.75Mpps drop in the IP forwarding rate due to that.
 Hence I have restricted the DMA interface calls to only when an IOMMU
 is enabled.
>>>
>>> What's 0.75Mpps as a percentage of baseline? On a correctly configured
>>> coherent arm64 system, in the absence of an IOMMU, dma_map_*() is
>>> essentially just virt_to_phys() behind a function call or two, so I'd be
>>> interested to know where any non-trivial overhead might be coming from.
>>
>> It's a 5% drop, and yes, the device is configured as coherent.
>> The drop is due to the additional function calls.
>
> OK, interesting - sounds like there's potential for some optimisation
> there as well. AFAICS the callchain goes:
>
> dma_map_single_attrs (inline)
> - ops->map_page (__swiotlb_map_page)
>   - swiotlb_map_page
>     - phys_to_dma (inline)
>     - dma_capable (inline)
>
> Do you happen to have a breakdown of where the time goes? If it's mostly
> just in the indirect branch our options are limited (I'm guessing
> ThunderX doesn't have a particularly fancy branch predictor, if it's not
> even got a data prefetcher), but if it's in the SWIOTLB code then
> there's certainly room for improvement (which will hopefully tie in with
> some DMA ops work I'm planning to do soon anyway).

It's the branching which is costing the performance; as you said, nothing
much can be done in the common code for this. Anyway, I have submitted a
new patch without conditional calling of the DMA APIs, and will look into
reducing the performance impact (implementing recycling, if possible) a
bit later.
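
To sketch the sort of recycling I have in mind (hypothetical helpers, not
actual driver code): keep a small per-queue cache of receive pages and
reuse a page once the stack has dropped its reference, so the allocation
and mapping cost is paid per page instead of per packet.

#define NICVF_CACHE_LEN	64		/* hypothetical */

struct nicvf_page_cache {
	struct page *page[NICVF_CACHE_LEN];
	int top;
};

/* On RX completion, keep the page around for later reuse. */
static void nicvf_cache_put(struct nicvf_page_cache *c, struct page *page)
{
	if (c->top < NICVF_CACHE_LEN) {
		get_page(page);		/* the cache's own reference */
		c->page[c->top++] = page;
	}
}

/* When refilling the ring, prefer a cached page over alloc_page(). */
static struct page *nicvf_cache_get(struct nicvf_page_cache *c, gfp_t gfp)
{
	while (c->top > 0) {
		struct page *page = c->page[--c->top];

		if (page_ref_count(page) == 1)
			return page;	/* stack is done; reuse our ref */
		put_page(page);		/* still in flight; drop our ref */
	}
	return alloc_page(gfp);
}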

Thanks,
Sunil.


Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-06 Thread Robin Murphy
On 06/03/17 12:57, Sunil Kovvuri wrote:
>>>
>>> We are seeing a 0.75Mpps drop in the IP forwarding rate due to that.
>>> Hence I have restricted the DMA interface calls to only when an IOMMU
>>> is enabled.
>>
>> What's 0.75Mpps as a percentage of baseline? On a correctly configured
>> coherent arm64 system, in the absence of an IOMMU, dma_map_*() is
>> essentially just virt_to_phys() behind a function call or two, so I'd be
>> interested to know where any non-trivial overhead might be coming from.
> 
> It's a 5% drop, and yes, the device is configured as coherent.
> The drop is due to the additional function calls.

OK, interesting - sounds like there's potential for some optimisation
there as well. AFAICS the callchain goes:

dma_map_single_attrs (inline)
- ops->map_page (__swiotlb_map_page)
  - swiotlb_map_page
    - phys_to_dma (inline)
    - dma_capable (inline)
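
For reference, a trimmed paraphrase of that tail (not the verbatim 4.10
source): on a coherent device whose DMA mask covers the buffer, the whole
map operation reduces to an address conversion reached through the one
indirect ops->map_page call.

dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
			    unsigned long offset, size_t size,
			    enum dma_data_direction dir,
			    unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dev_addr = phys_to_dma(dev, phys);

	/* Fast path: the device can address the buffer directly,
	 * so no bounce buffering (modulo swiotlb_force overrides). */
	if (dma_capable(dev, dev_addr, size))
		return dev_addr;

	/* ...slow path: map through a bounce buffer... */
}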

Do you happen to have a breakdown of where the time goes? If it's mostly
just in the indirect branch our options are limited (I'm guessing
ThunderX doesn't have a particularly fancy branch predictor, if it's not
even got a data prefetcher), but if it's in the SWIOTLB code then
there's certainly room for improvement (which will hopefully tie in with
some DMA ops work I'm planning to do soon anyway).

Thanks,
Robin.

> 
> Thanks,
> Sunil.
> 



Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-06 Thread Sunil Kovvuri
>>
>> We are seeing a 0.75Mpps drop in the IP forwarding rate due to that.
>> Hence I have restricted the DMA interface calls to only when an IOMMU is enabled.
>
> What's 0.75Mpps as a percentage of baseline? On a correctly configured
> coherent arm64 system, in the absence of an IOMMU, dma_map_*() is
> essentially just virt_to_phys() behind a function call or two, so I'd be
> interested to know where any non-trivial overhead might be coming from.

It's a 5% drop, and yes, the device is configured as coherent.
The drop is due to the additional function calls.
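
(Taken together, those two figures put the baseline at roughly
0.75 Mpps / 0.05 = 15 Mpps.)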

Thanks,
Sunil.


Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-06 Thread Robin Murphy
On 04/03/17 05:54, Sunil Kovvuri wrote:
> On Fri, Mar 3, 2017 at 11:26 PM, David Miller  wrote:
>> From: sunil.kovv...@gmail.com
>> Date: Fri,  3 Mar 2017 16:17:47 +0530
>>
>>> @@ -1643,6 +1650,9 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>>   if (!pass1_silicon(nic->pdev))
>>>   nic->hw_tso = true;
>>>
>>> + /* Check if we are attached to IOMMU */
>>> + nic->iommu_domain = iommu_get_domain_for_dev(dev);
>>
>> This function is not universally available.
> 
> Even if CONFIG_IOMMU_API is not enabled, it will return NULL, which is okay.
> http://lxr.free-electrons.com/source/include/linux/iommu.h#L400
> 
>>
>> This looks very hackish to me anyways; how all of this stuff is supposed
>> to work is that you simply use the DMA interfaces unconditionally, and
>> whatever is behind the operations takes care of everything.
>>
>> Doing it conditionally in the driver, with all of this special IOMMU
>> domain et al. knowledge, makes no sense to me at all.
>>
>> I don't see other drivers doing stuff like this at all, so if you're
>> going to handle this in such a unique way you'd better write
>> several paragraphs in your commit message explaining why this weird
>> crap is necessary.
> 
> I already tried to explain in the commit message that HW anyway takes care
> of data coherency, so calling DMA interfaces when there is no IOMMU will
> only result in a performance drop.
> 
> We are seeing a 0.75Mpps drop in the IP forwarding rate due to that.
> Hence I have restricted the DMA interface calls to only when an IOMMU is enabled.

What's 0.75Mpps as a percentage of baseline? On a correctly configured
coherent arm64 system, in the absence of an IOMMU, dma_map_*() is
essentially just virt_to_phys() behind a function call or two, so I'd be
interested to know where any non-trivial overhead might be coming from.

Robin.




Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-03 Thread Sunil Kovvuri
On Fri, Mar 3, 2017 at 11:26 PM, David Miller  wrote:
> From: sunil.kovv...@gmail.com
> Date: Fri,  3 Mar 2017 16:17:47 +0530
>
>> @@ -1643,6 +1650,9 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>   if (!pass1_silicon(nic->pdev))
>>   nic->hw_tso = true;
>>
>> + /* Check if we are attached to IOMMU */
>> + nic->iommu_domain = iommu_get_domain_for_dev(dev);
>
> This function is not universally available.

Even if CONFIG_IOMMU_API is not enabled, it will return NULL, which is okay.
http://lxr.free-electrons.com/source/include/linux/iommu.h#L400
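
For reference, the !CONFIG_IOMMU_API stub at that link is simply:

static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
{
	return NULL;
}

so nic->iommu_domain just ends up NULL and the driver takes the
non-IOMMU path.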

>
> This looks very hackish to me anyways; how all of this stuff is supposed
> to work is that you simply use the DMA interfaces unconditionally, and
> whatever is behind the operations takes care of everything.
>
> Doing it conditionally in the driver, with all of this special IOMMU
> domain et al. knowledge, makes no sense to me at all.
>
> I don't see other drivers doing stuff like this at all, so if you're
> going to handle this in such a unique way you'd better write
> several paragraphs in your commit message explaining why this weird
> crap is necessary.

I already tried to explain in the commit message that HW anyway takes care
of data coherency, so calling DMA interfaces when there is no IOMMU will
only result in a performance drop.

We are seeing a 0.75Mpps drop in the IP forwarding rate due to that.
Hence I have restricted the DMA interface calls to only when an IOMMU is enabled.


Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-03 Thread David Miller
From: sunil.kovv...@gmail.com
Date: Fri,  3 Mar 2017 16:17:47 +0530

> @@ -1643,6 +1650,9 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   if (!pass1_silicon(nic->pdev))
>   nic->hw_tso = true;
>  
> + /* Check if we are attached to IOMMU */
> + nic->iommu_domain = iommu_get_domain_for_dev(dev);

This function is not universally available.

This looks very hackish to me anyways; how all of this stuff is supposed
to work is that you simply use the DMA interfaces unconditionally, and
whatever is behind the operations takes care of everything.
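
In other words, the conventional shape is simply this (a sketch of the
pattern, not code from any particular driver):

	dma_addr_t addr;

	addr = dma_map_page(&pdev->dev, page, offset, len, DMA_TO_DEVICE);
	if (dma_mapping_error(&pdev->dev, addr))
		return -ENOMEM;	/* or drop the packet */

	/* ... hand addr to the hardware descriptor ring ... */

	/* and on completion: */
	dma_unmap_page(&pdev->dev, addr, len, DMA_TO_DEVICE);

The dma_map_ops behind the struct device decide whether that is an IOMMU
mapping, a bounce buffer, or a plain physical address.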

Doing it conditionally in the driver, with all of this special IOMMU
domain et al. knowledge, makes no sense to me at all.

I don't see other drivers doing stuff like this at all, so if you're
going to handle this in such a unique way you'd better write
several paragraphs in your commit message explaining why this weird
crap is necessary.

There is no way I can apply this series as it is currently written.

Thanks.


[PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-03 Thread sunil.kovvuri
From: Sunil Goutham 

ACPI support has been added to the ARM IOMMU driver in the
4.10 kernel, and that has resulted in VNIC interfaces
throwing translation faults when the kernel is booted with
ACPI, as the driver was not using the DMA API.

On T88, HW takes care of data coherency when performing
DMA operations, hence in the non-IOMMU case using the DMA
API simply wastes CPU cycles. This patch fixes the
translation fault issue by doing a buffer dma_map/dma_unmap
when the corresponding PCI device is attached to an IOMMU,
i.e. when iommu_domain is set.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/nic.h  |   1 +
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |  12 +-
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 187 ++---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |   2 +
 4 files changed, 174 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index e739c71..2269ff5 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -269,6 +269,7 @@ struct nicvf {
 #define MAX_QUEUES_PER_QSET 8
struct queue_set*qs;
struct nicvf_cq_poll*napi[8];
+   void*iommu_domain;
u8  vf_id;
u8  sqs_id;
boolsqs_mode;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 6feaa24..8d60c3b 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include <linux/iommu.h>
 
 #include "nic_reg.h"
 #include "nic.h"
@@ -525,7 +526,12 @@ static void nicvf_snd_pkt_handler(struct net_device *netdev,
/* Get actual TSO descriptors and free them */
tso_sqe =
 (struct sq_hdr_subdesc *)GET_SQ_DESC(sq, hdr->rsvd2);
+   nicvf_unmap_sndq_buffers(nic, sq, hdr->rsvd2,
+tso_sqe->subdesc_cnt);
nicvf_put_sq_desc(sq, tso_sqe->subdesc_cnt + 1);
+   } else {
+   nicvf_unmap_sndq_buffers(nic, sq, cqe_tx->sqe_ptr,
+hdr->subdesc_cnt);
}
nicvf_put_sq_desc(sq, hdr->subdesc_cnt + 1);
prefetch(skb);
@@ -576,6 +582,7 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev,
 {
struct sk_buff *skb;
struct nicvf *nic = netdev_priv(netdev);
+   struct nicvf *snic = nic;
int err = 0;
int rq_idx;
 
@@ -592,7 +599,7 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev,
if (err && !cqe_rx->rb_cnt)
return;
 
-   skb = nicvf_get_rcv_skb(nic, cqe_rx);
+   skb = nicvf_get_rcv_skb(snic, cqe_rx);
if (!skb) {
netdev_dbg(nic->netdev, "Packet not received\n");
return;
@@ -1643,6 +1650,9 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (!pass1_silicon(nic->pdev))
nic->hw_tso = true;
 
+   /* Check if we are attached to IOMMU */
+   nic->iommu_domain = iommu_get_domain_for_dev(dev);
+
pci_read_config_word(nic->pdev, PCI_SUBSYSTEM_ID, &sdevid);
if (sdevid == 0xA134)
nic->t88 = true;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index ac0390b..ee3aa16 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/iommu.h>
 #include 
 #include 
 
@@ -18,6 +19,49 @@
 #include "q_struct.h"
 #include "nicvf_queues.h"
 
+#define NICVF_PAGE_ORDER ((PAGE_SIZE <= 4096) ?  PAGE_ALLOC_COSTLY_ORDER : 0)
+
+static inline u64 nicvf_iova_to_phys(struct nicvf *nic, dma_addr_t dma_addr)
+{
+   /* Translation is installed only when IOMMU is present */
+   if (nic->iommu_domain)
+   return iommu_iova_to_phys(nic->iommu_domain, dma_addr);
+   return dma_addr;
+}
+
+static inline u64 nicvf_dma_map(struct nicvf *nic, struct page *page,
+   int offset, int len, int dir)
+{
+   /* Since HW ensures data coherency, calling DMA APIs when there
+* is no IOMMU would only result in wasting CPU cycles.
+*/
+   if (!nic->iommu_domain)
+   return virt_to_phys(page_address(page) + offset);
+
+   /* CPU sync not required */
+   return (u64)dma_map_page_attrs(&nic->pdev->dev, page,
+  offset, len, dir,
+  DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static inline void nicvf_dma_unmap(struct nicvf *nic, dma_addr_t
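
The patch is truncated here in the archive. Judging from nicvf_dma_map()
above, the unmap counterpart presumably looks something like the following
reconstruction (a guess, not the posted patch text):

static inline void nicvf_dma_unmap(struct nicvf *nic, dma_addr_t dma_addr,
				   int len, int dir)
{
	/* Mirror of nicvf_dma_map(): only unmap if an IOMMU mapped it. */
	if (nic->iommu_domain)
		dma_unmap_page_attrs(&nic->pdev->dev, dma_addr, len,
				     dir, DMA_ATTR_SKIP_CPU_SYNC);
}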
