Public bug reported:

[Impact]
With SMMU page translation mode enabled, several inefficiencies in the ARM SMMUv3 driver have been identified. As a result, I/O bandwidth is severely impacted.

[Test]
Running iperf between two Mellanox ConnectX-4 cards, each capable of 40 Gbps and connected back to back, we get only about 15 Gbps.

iperf -c 192.168.5.5 --bind 192.168.5.10 -P 5 -w 3.1M
------------------------------------------------------------
Client connecting to 192.168.5.5, TCP port 5001
Binding to local address 192.168.5.10
TCP window size: 6.20 MByte (WARNING: requested 3.10 MByte)
------------------------------------------------------------
[  7] local 192.168.5.10 port 42588 connected with 192.168.5.5 port 5001
[  3] local 192.168.5.10 port 42582 connected with 192.168.5.5 port 5001
[  5] local 192.168.5.10 port 42584 connected with 192.168.5.5 port 5001
[  4] local 192.168.5.10 port 42586 connected with 192.168.5.5 port 5001
[  6] local 192.168.5.10 port 42590 connected with 192.168.5.5 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.50 GBytes  5.58 Gbits/sec
[  4]  0.0-10.0 sec  1.81 GBytes  1.56 Gbits/sec
[  6]  0.0-10.0 sec  6.08 GBytes  5.22 Gbits/sec
[  7]  0.0-10.0 sec  1.99 GBytes  1.71 Gbits/sec
[  5]  0.0-10.0 sec  2.00 GBytes  1.72 Gbits/sec
[SUM]  0.0-10.0 sec  18.4 GBytes  15.8 Gbits/sec

[Fix]
After applying the patches listed below, we were able to get throughput of 35+ Gbps. These patches are currently in linux-next and in line for 4.13-rc.

iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
iommu/arm-smmu-v3: Remove io-pgtable spinlock
iommu/arm-smmu: Remove io-pgtable spinlock
iommu/io-pgtable-arm-v7s: Support lockless operation
iommu/io-pgtable-arm: Support lockless operation
iommu/io-pgtable: Introduce explicit coherency
iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
iommu/io-pgtable-arm: Improve split_blk_unmap
iommu/io-pgtable-arm-v7s: Check table PTEs more precisely
iommu/io-pgtable-arm-v7s: constify dummy_tlb_ops.
iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it
iommu/io-pgtable-arm-v7s: Add support for the IOMMU_PRIV flag
iommu/io-pgtable-arm: Avoid shift overflow in block size
iommu/io-pgtable-arm: Check for leaf entry before dereferencing it
iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag
iommu: add IOMMU_PRIV attribute

[Regression Potential]
These patches cherry-pick cleanly on top of Zesty (4.10) and are limited to ARM SMMU and page-table fixes. They were tested on a QDF2400 system with Mellanox ConnectX-4 40G cards.

** Affects: linux (Ubuntu)
     Importance: Critical
     Assignee: Manoj Iyer (manjo)
         Status: Incomplete

** Tags: qdf2400 zesty

** Summary changed:

- [Zesty] Fixes to smmuv3 on ARM64
+ [Zesty] Fixes to smmuv3 and io-pgtable on ARM64

** Summary changed:

- [Zesty] Fixes to smmuv3 and io-pgtable on ARM64
+ [Zesty] Fixes to iommu on arm64 to improve IO performance

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1705123

Title:
  [Zesty] Fixes to iommu on arm64 to improve IO performance

Status in linux package in Ubuntu:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705123/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
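
As a sanity check on the [Test] figures in this report, the per-stream iperf numbers can be summed and compared against the 40 Gbps rating of the cards. What follows is a minimal Python sketch and not part of the original report: the helper names (parse_streams, aggregate_gbps, link_utilization) and the regular expression are assumptions made for illustration, and only the "GBytes ... Gbits/sec" line format seen in this report is handled.

```python
import re

# Per-stream result lines copied from the iperf output in this report.
IPERF_LINES = """\
[  3]  0.0-10.0 sec  6.50 GBytes  5.58 Gbits/sec
[  4]  0.0-10.0 sec  1.81 GBytes  1.56 Gbits/sec
[  6]  0.0-10.0 sec  6.08 GBytes  5.22 Gbits/sec
[  7]  0.0-10.0 sec  1.99 GBytes  1.71 Gbits/sec
[  5]  0.0-10.0 sec  2.00 GBytes  1.72 Gbits/sec
"""

# Nominal card rate stated in the report.
LINK_RATE_GBPS = 40.0

# Hypothetical pattern for "[ <id>] <interval> sec <x> GBytes <y> Gbits/sec".
# Other units (MBytes, Mbits/sec) or iperf versions are not handled.
STREAM_RE = re.compile(
    r"\[\s*(\d+)\]\s+\S+\s+sec\s+([\d.]+)\s+GBytes\s+([\d.]+)\s+Gbits/sec"
)


def parse_streams(text):
    """Return a list of (stream_id, gbytes, gbits_per_sec) tuples."""
    streams = []
    for line in text.splitlines():
        match = STREAM_RE.search(line)
        if match:
            streams.append(
                (int(match.group(1)), float(match.group(2)), float(match.group(3)))
            )
    return streams


def aggregate_gbps(streams):
    """Sum the per-stream bandwidth column."""
    return sum(stream[2] for stream in streams)


def link_utilization(gbps, link_rate=LINK_RATE_GBPS):
    """Fraction of the nominal link rate that a given throughput represents."""
    return gbps / link_rate


if __name__ == "__main__":
    streams = parse_streams(IPERF_LINES)
    total = aggregate_gbps(streams)
    print(f"streams parsed: {len(streams)}")
    print(f"aggregate: {total:.2f} Gbits/sec")
    print(f"utilization before fix: {link_utilization(total):.1%}")
    print(f"utilization at 35 Gbps: {link_utilization(35.0):.1%}")
```

The five streams sum to 15.79 Gbits/sec, consistent with the 15.8 Gbits/sec [SUM] line, which is a little under 40% of the 40 Gbps rating. The 35 Gbps reported after the fix corresponds to 87.5% of that rating.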