From: Sven Van Asbroeck <thesve...@gmail.com>

The buffers in the lan743x driver's receive ring are always 9K,
even when the largest packet that can be received (the mtu) is
much smaller. This performs particularly badly on cpu archs
without dma cache snooping (such as ARM): each received packet
results in a 9K dma_{map|unmap} operation, which is very expensive
because cpu caches need to be invalidated.

Careful measurement of the driver rx path on armv7 reveals that
the cpu spends the majority of its time waiting for cache
invalidation.

Optimize as follows:

1. Set the rx ring buffer size to match the mtu. This limits the
   amount of cache that needs to be invalidated per dma_map().

2. When dma_unmap()ping, skip the cpu sync. Sync only the packet data
   actually received, the size of which the chip indicates in
   its rx ring descriptors. This limits the amount of cache that
   needs to be invalidated per dma_unmap().
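
To make the effect of the two steps concrete, here is a rough userspace
sketch of the buffer-size arithmetic. It is a simplified model, not driver
code. The helper names are hypothetical, the head padding is passed in as
a parameter because RX_HEAD_PADDING's value is not shown in this patch,
and the old frame size is assumed to be 9 * 1024 bytes ("9K").

```c
#include <stddef.h>

/* Standard Ethernet header size (ETH_HLEN) and the 4-byte FCS. */
#define ETH_HLEN_BYTES 14u
#define ETH_FCS_BYTES   4u

/* Assumption: the old "9K" maximum frame size is 9 * 1024 bytes. */
#define OLD_MAX_FRAME_BYTES (9u * 1024u)

/* Buffer length using the same formula as the patch:
 * payload + ETH_HLEN + 4 + head padding.
 * head_padding stands in for RX_HEAD_PADDING (value assumed by caller).
 */
size_t rx_buffer_len(size_t payload, size_t head_padding)
{
	return payload + ETH_HLEN_BYTES + ETH_FCS_BYTES + head_padding;
}

/* Hypothetical model of the old path: the whole 9K buffer is
 * invalidated once on dma_map() and once on dma_unmap().
 */
size_t sync_bytes_before(size_t head_padding)
{
	return 2 * rx_buffer_len(OLD_MAX_FRAME_BYTES, head_padding);
}

/* Hypothetical model of the new path: an mtu-sized buffer is
 * invalidated on dma_map(), and only the received packet bytes are
 * synced before the unmap, which itself skips the cpu sync.
 */
size_t sync_bytes_after(size_t mtu, size_t packet_len, size_t head_padding)
{
	return rx_buffer_len(mtu, head_padding) + packet_len;
}
```

With an mtu of 1500, a full-size 1518-byte frame and an assumed head
padding of 2, this model gives 18472 bytes of cache maintenance per packet
before the change and 3038 bytes after.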

These optimizations more than double the rx performance on armv7
(231 to 570 Mbits/sec). Third parties report a nearly 3x rx speedup
on armv8 (102 to 279 Mbits/sec).

Performance on dma cache snooping architectures (such as x86)
is expected to stay the same.

Tested with iperf3 on a Freescale i.MX6QP + lan7430, both sides
set to an mtu of 1500 bytes, measuring rx performance:

Before:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec   550 MBytes   231 Mbits/sec    0
After:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec  1.33 GBytes   570 Mbits/sec    0

Tested by Anders Roenningen (and...@ronningen.priv.no) on armv8,
rx iperf3:
Before: 102 Mbits/sec
After:  279 Mbits/sec
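
As a quick check of the speedup figures, the ratio of the two iperf3
bandwidths can be computed directly. The helper name below is
hypothetical and only illustrates the arithmetic.

```c
/* Ratio of rx throughput after the patch to throughput before it.
 * Both arguments are in the same unit (e.g. Mbits/sec).
 */
double rx_speedup(double before_mbits, double after_mbits)
{
	return after_mbits / before_mbits;
}
```

With the measurements above, rx_speedup(231, 570) is roughly 2.5 on the
armv7 setup and rx_speedup(102, 279) is roughly 2.7 on armv8.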

Signed-off-by: Sven Van Asbroeck <thesve...@gmail.com>
---

Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 46eb3c108fe1

To: Bryan Whitehead <bryan.whiteh...@microchip.com>
To: unglinuxdri...@microchip.com
To: "David S. Miller" <da...@davemloft.net>
To: Jakub Kicinski <k...@kernel.org>
Cc: Andrew Lunn <and...@lunn.ch>
Cc: Alexey Denisov <rtg...@gmail.com>
Cc: Sergej Bauer <sba...@blackbox.su>
Cc: Tim Harvey <thar...@gateworks.com>
Cc: Anders Rønningen <and...@ronningen.priv.no>
Cc: net...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)

 drivers/net/ethernet/microchip/lan743x_main.c | 35 ++++++++++++-------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index f1f6eba4ace4..f485320e5784 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1957,11 +1957,11 @@ static int lan743x_rx_next_index(struct lan743x_rx *rx, int index)
 
 static struct sk_buff *lan743x_rx_allocate_skb(struct lan743x_rx *rx)
 {
-       int length = 0;
+       struct net_device *netdev = rx->adapter->netdev;
 
-       length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
-       return __netdev_alloc_skb(rx->adapter->netdev,
-                                 length, GFP_ATOMIC | GFP_DMA);
+       return __netdev_alloc_skb(netdev,
+                                 netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING,
+                                 GFP_ATOMIC | GFP_DMA);
 }
 
 static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
@@ -1977,9 +1977,10 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
 {
        struct lan743x_rx_buffer_info *buffer_info;
        struct lan743x_rx_descriptor *descriptor;
-       int length = 0;
+       struct net_device *netdev = rx->adapter->netdev;
+       int length;
 
-       length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
+       length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
        descriptor = &rx->ring_cpu_ptr[index];
        buffer_info = &rx->buffer_info[index];
        buffer_info->skb = skb;
@@ -2148,11 +2149,18 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
                        descriptor = &rx->ring_cpu_ptr[first_index];
 
                        /* unmap from dma */
+                       packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
+                                       (descriptor->data0);
                        if (buffer_info->dma_ptr) {
-                               dma_unmap_single(&rx->adapter->pdev->dev,
-                                                buffer_info->dma_ptr,
-                                                buffer_info->buffer_length,
-                                                DMA_FROM_DEVICE);
+                               dma_sync_single_for_cpu(&rx->adapter->pdev->dev,
+                                                       buffer_info->dma_ptr,
+                                                       packet_length,
+                                                       DMA_FROM_DEVICE);
+                               dma_unmap_single_attrs(&rx->adapter->pdev->dev,
+                                                      buffer_info->dma_ptr,
+                                                      buffer_info->buffer_length,
+                                                      DMA_FROM_DEVICE,
+                                                      DMA_ATTR_SKIP_CPU_SYNC);
                                buffer_info->dma_ptr = 0;
                                buffer_info->buffer_length = 0;
                        }
@@ -2167,8 +2175,8 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
                        int index = first_index;
 
                        /* multi buffer packet not supported */
-                       /* this should not happen since
-                        * buffers are allocated to be at least jumbo size
+                       /* this should not happen since buffers are allocated
+                        * to be at least the mtu size configured in the mac.
                         */
 
                        /* clean up buffers */
@@ -2628,6 +2636,9 @@ static int lan743x_netdev_change_mtu(struct net_device *netdev, int new_mtu)
        struct lan743x_adapter *adapter = netdev_priv(netdev);
        int ret = 0;
 
+       if (netif_running(netdev))
+               return -EBUSY;
+
        ret = lan743x_mac_set_mtu(adapter, new_mtu);
        if (!ret)
                netdev->mtu = new_mtu;
-- 
2.17.1
