Re: [PATCH RFC 0/4] xfs: Transmit flow steering
As Rick states, this fixes a performance issue with the 4.4 kernel for us.

Tested-by: Juerg Haefliger

On 09/28/2016 05:13 PM, Rick Jones wrote:
> Here is a quick look at performance tests for the result of trying the
> prototype fix for the packet reordering problem with VMs sending over
> an XPS-configured NIC.  In particular, the Emulex/Avago/Broadcom
> Skyhawk.  The fix was applied to a 4.4 kernel.
>
> Before: 3884 Mbit/s
> After:  8897 Mbit/s
Re: [PATCH RFC 0/4] xfs: Transmit flow steering
Here is a quick look at performance tests for the result of trying the
prototype fix for the packet reordering problem with VMs sending over
an XPS-configured NIC.  In particular, the Emulex/Avago/Broadcom
Skyhawk.  The fix was applied to a 4.4 kernel.

Before: 3884 Mbit/s
After:  8897 Mbit/s

That was from a VM on a node with a Skyhawk and 2 E5-2640 processors
to bare metal E5-2640 with a BE3.  Physical MTU was 1500, the VM's
vNIC's MTU was 1400.  Systems were HPE ProLiants in OS Control Mode
for power management, with the "performance" frequency governor
loaded.  An OpenStack Mitaka setup with Distributed Virtual Router.

We had some other NIC types in the setup as well.  XPS was also
enabled on the ConnectX-3 Pro.  It was not enabled on the 82599ES (a
function of the kernel being used, which had it disabled from the
first reports of XPS negatively affecting VM traffic at the beginning
of the year).

Average Mbit/s From NIC type To Bare Metal BE3:

NIC Type, CPU on VM Host      Before    After
ConnectX-3 Pro, E5-2670v3       9224     9271
BE3, E5-2640                    9016     9022
82599, E5-2640                  9192     9003
BCM57840, E5-2640               9213     9153
Skyhawk, E5-2640                3884     8897

For completeness:
Average Mbit/s To NIC type from Bare Metal BE3:

NIC Type, CPU on VM Host      Before    After
ConnectX-3 Pro, E5-2670v3       9322     9144
BE3, E5-2640                    9074     9017
82599, E5-2640                  8670     8564
BCM57840, E5-2640               2468 *   7979
Skyhawk, E5-2640                8897     9269

* This is the busted bnx2x NIC FW GRO implementation issue.  It was
  not visible in the "After" because the system was set up to disable
  the NIC FW GRO by the time it booted on the fix kernel.
Average Transactions/s Between NIC type and Bare Metal BE3:

NIC Type, CPU on VM Host      Before    After
ConnectX-3 Pro, E5-2670v3      12421    12612
BE3, E5-2640                    8178     8484
82599, E5-2640                  8499     8549
BCM57840, E5-2640               8544     8560
Skyhawk, E5-2640                8537     8701

happy benchmarking,

Drew Balliet
Juerg Haefliger
rick jones

The semi-cooked results with additional statistics:

554M  - BE3
544+M - ConnectX-3 Pro
560M  - 82599ES
630M  - BCM57840
650M  - Skyhawk

(substitute is simply replacing a system name with the model of NIC and CPU)

Bulk To (South) and From (North) VM, Before:

$ ../substitute.sh vxlan_554m_control_performance_gvnr_dvr_northsouth_stream.log |
  ~/netperf2_trunk/doc/examples/parse_single_stream.py -r -5 -f 1 -f 3 -f 4 -f 7 -f 8
Field1,Field3,Field4,Field7,Field8,Min,P10,Median,Average,P90,P99,Max,Count
North,560M,E5-2640,554FLB,E5-2640,8148.090,9048.830,9235.400,9192.868,9315.980,9338.845,9339.500,113
North,630M,E5-2640,554FLB,E5-2640,8909.980,9113.238,9234.750,9213.140,9299.442,9336.206,9337.830,47
North,544+M,E5-2670v3,554FLB,E5-2640,9013.740,9182.546,9229.620,9224.025,9264.036,9299.206,9301.970,99
North,650M,E5-2640,554FLB,E5-2640,3187.680,3393.724,3796.160,3884.765,4405.096,4941.391,4956.300,129
North,554M,E5-2640,554FLB,E5-2640,8700.930,8855.768,9026.030,9016.061,9158.846,9213.687,9226.150,135
South,554FLB,E5-2640,560M,E5-2640,7754.350,8193.114,8718.540,8670.612,9026.436,9262.355,9285.010,113
South,554FLB,E5-2640,630M,E5-2640,1897.660,2068.290,2514.430,2468.323,2787.162,2942.934,2957.250,53
South,554FLB,E5-2640,544+M,E5-2670v3,9298.260,9314.432,9323.220,9322.207,9328.324,9330.704,9331.080,100
South,554FLB,E5-2640,650M,E5-2640,8407.050,8907.136,9304.390,9206.776,9321.320,9325.347,9326.410,103
South,554FLB,E5-2640,554M,E5-2640,7844.900,8632.530,9199.385,9074.535,9308.070,9319.224,9322.360,132
0 too-short lines ignored.
Bulk To (South) and From (North) VM, After:

$ ../substitute.sh vxlan_554m_control_performance_gvnr_xpsfix_dvr_northsouth_stream.log |
  ~/netperf2_trunk/doc/examples/parse_single_stream.py -r -5 -f 1 -f 3 -f 4 -f 7 -f 8
Field1,Field3,Field4,Field7,Field8,Min,P10,Median,Average,P90,P99,Max,Count
North,560M,E5-2640,554FLB,E5-2640,7576.790,8213.890,9182.870,9003.190,9295.975,9315.878,9318.160,36
North,630M,E5-2640,554FLB,E5-2640,8811.800,8924.000,9206.660,9153.076,9306.287,9315.152,9315.790,12
North,544+M,E5-2670v3,554FLB,E5-2640,9135.990,9228.520,9277.465,9271.875,9324.545,9339.604,9339.780,46
North,650M,E5-2640,554FLB,E5-2640,8133.420,8483.340,8995.040,8897.779,9129.056,9165.230,9165.860,43
North,554M,E5-2640,554FLB,E5-2640,8438.390,8879.150,9048.590,9022.813,9181.540,9248.650,9297.660,101
South,554FLB,E5-2640,630M,E5-2640,7347.120,7592.565,7951.325,7979.951,8365.400,8575.837,8579.890,16
South,554FLB,E5-2640,560M,E5-2640,7719.510,8044.496,8602.750,8564.741,9172.824,9248.686,9259.070,45
South,554FLB,E5-2640,544+M,E5-2670v3,8838.660,8907.402,9112.335,9114.040,9326.510,9329.062
[PATCH RFC 0/4] xfs: Transmit flow steering
This patch set introduces transmit flow steering. The idea is that we
record the transmit queues in a flow table that is indexed by skbuff
hash. The flow table entries have two values: the queue_index and the
head cnt of packets from the TX queue. We only allow a queue to change
for a flow if the tail cnt in the TX queue advances beyond the recorded
head cnt. That is the condition that should indicate that all
outstanding packets for the flow have completed transmission so the
queue can change.

Tracking the inflight queue is performed as part of BQL. Two fields are
added to netdevice structure: head_cnt and tail_cnt. head_cnt is
incremented in netdev_tx_sent_queue and tail_cnt is incremented in
netdev_tx_completed_queue by the number of packets completed.

This patch set creates /sys/class/net/eth*/xps_dev_flow_table_cnt
which holds the number of entries in the XPS flow table.

Tom Herbert (4):
  net: Set SW hash in skb_set_hash_from_sk
  bql: Add tracking of inflight packets
  net: Add xps_dev_flow_table_cnt
  xfs: Transmit flow steering

 include/linux/netdevice.h | 26 +
 include/net/sock.h        |  6 +--
 net/Kconfig               |  6 +++
 net/core/dev.c            | 93 +++
 net/core/net-sysfs.c      | 87
 5 files changed, 199 insertions(+), 19 deletions(-)

-- 
2.8.0.rc2
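
For readers following the description above, the queue-change rule can
be sketched in userspace C. This is an illustration under stated
assumptions, not the code from the patches: only queue_index, head_cnt
and tail_cnt are names from the cover letter. Every *_sketch struct and
function, the table size, the "no queue" marker, indexing by masked
hash, and the wrap-safe signed comparison are hypothetical choices made
for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 256u      /* hypothetical; must be a power of two */
#define SKETCH_NO_QUEUE   0xffffu   /* hypothetical "nothing recorded yet" marker */

/* Per-TX-queue counters as described in the cover letter. */
struct txq_sketch {
	uint32_t head_cnt;   /* packets handed to the queue */
	uint32_t tail_cnt;   /* packets whose transmission completed */
};

/* One flow table entry: the queue in use, and that queue's head_cnt
 * sampled just before the flow's most recent packet was queued. */
struct flow_entry_sketch {
	uint16_t queue_index;
	uint32_t head_cnt;
};

struct flow_table_sketch {
	struct flow_entry_sketch ent[SKETCH_TABLE_SIZE];
};

static void flow_table_init_sketch(struct flow_table_sketch *t)
{
	for (uint32_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
		t->ent[i].queue_index = SKETCH_NO_QUEUE;
		t->ent[i].head_cnt = 0;
	}
}

/* Completion side: advance tail_cnt by the number of packets completed. */
static void txq_completed_sketch(struct txq_sketch *q, uint32_t pkts)
{
	q->tail_cnt += pkts;
}

/* Send side: pick the queue for one packet of the flow identified by
 * hash. wanted is the queue the normal selection would use now. The
 * flow may move only if the old queue's tail_cnt has advanced beyond
 * the head_cnt recorded for the flow. Returns the queue actually used. */
static uint16_t flow_send_sketch(struct flow_table_sketch *t,
				 struct txq_sketch *queues, uint16_t nqueues,
				 uint32_t hash, uint16_t wanted)
{
	struct flow_entry_sketch *e = &t->ent[hash & (SKETCH_TABLE_SIZE - 1u)];
	uint16_t chosen = wanted;

	assert(wanted < nqueues);

	if (e->queue_index != SKETCH_NO_QUEUE && e->queue_index != wanted) {
		const struct txq_sketch *old = &queues[e->queue_index];

		/* Signed difference keeps the test valid across counter
		 * wraparound (assumes two's complement conversion). */
		if ((int32_t)(old->tail_cnt - e->head_cnt) <= 0)
			chosen = e->queue_index;   /* packets still in flight */
	}

	e->queue_index = chosen;
	e->head_cnt = queues[chosen].head_cnt;   /* sample before queuing */
	queues[chosen].head_cnt++;               /* this packet is now in flight */
	return chosen;
}
```

With two queues, a flow that starts on queue 0 and is then asked to use
queue 1 stays on queue 0 until completions on queue 0 have passed the
head_cnt recorded for its last packet. Only then does the next packet
go to queue 1, which is the ordering guarantee the cover letter
describes.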