Re: L2 network namespaces + macvlan performances
> Between the "normal" case and the "net namespace + macvlan" case,
> results are about the same for both the throughput and the local CPU
> load for the following test types: TCP_MAERTS, TCP_RR, UDP_STREAM,
> UDP_RR. macvlan looks like a very good candidate for network
> namespaces in these cases.
>
> But, with the TCP_STREAM test, I observed that the CPU load is about
> the same (that's what we wanted) but the throughput decreases by
> about 5%: from 850 Mbit/s down to 810 Mbit/s. I haven't investigated
> yet why the throughput decreases in this case. Does it come from my
> setup, from macvlan's additional processing, or something else? I
> don't know yet.

Given that your "normal" case doesn't hit link-rate with TCP_STREAM,
but does with UDP_STREAM, it could be that there isn't quite enough TCP
window available, particularly since it seems the default socket/window
settings are in use. You might try your normal case with the
test-specific -s and -S options to increase the socket buffer sizes:

netperf -H 192.168.76.1 -i 30,3 -l 20 -t TCP_STREAM -- -m 1400 -s 128K -S 128K

and see if that gets you link-rate.

One other possibility there is the use of the 1400-byte send - that
probably doesn't interact terribly well with TSO. Also, it likely
isn't the MSS for the connection, which you can have reported by adding
"-v 2" to the global options. You could/should then use the MSS in a
subsequent test, or perhaps better still use a rather larger send size
for TCP_STREAM|TCP_MAERTS - I myself, for no particular reason, tend to
use either 32KB or 64KB as the send size in the netperf TCP_STREAM
tests I run.

A final WAG: that the 1400-byte send size interacted poorly with the
Nagle algorithm, since it was a sub-MSS send. When Nagle is involved,
things can be very timing-sensitive: change the timing ever so slightly
and you can get a rather larger change in throughput. That could be
dealt with either with the larger send sizes mentioned above, or by
adding the test-specific -D option to set TCP_NODELAY.
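The window-size suggestion above can be sanity-checked with a quick
bandwidth-delay-product estimate (a sketch only; the 1 ms RTT below is
an assumed illustrative figure, not one measured on this setup):

```shell
# Bytes that must be in flight to fill the link: rate * RTT.
# Assumed values: 1 Gbit/s link (as in the test setup), 1 ms RTT.
rate_bps=1000000000
rtt_ms=1
awk -v rate="$rate_bps" -v r="$rtt_ms" \
    'BEGIN { printf "BDP: %.0f bytes\n", rate / 8 * r / 1000 }'
# At 1 ms RTT this comes to 125000 bytes -- more than the default
# 87380-byte receive buffer shown in the netperf tables, which is
# why bumping the socket buffers to 128K with -s/-S might recover
# link rate.
```

At a genuinely LAN-class RTT the buffer may well be large enough, so
measuring the real RTT first (e.g. with ping) is the honest version of
this check.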
happy benchmarking,

rick jones
Re: L2 network namespaces + macvlan performances
On Fri, Jul 06, 2007 at 06:48:15PM +0200, Benjamin Thery wrote:
> Following a discussion we had at OLS concerning L2 network namespace
> performance and how the new macvlan driver could potentially improve
> it, I've ported the macvlan patchset on top of Eric's net namespace
> patchset on 2.6.22-rc4-mm2.
>
> [history and test setup snipped]
> Between the "normal" case and the "net namespace + macvlan" case,
> results are about the same for both the throughput and the local CPU
> load for the following test types: TCP_MAERTS, TCP_RR, UDP_STREAM,
> UDP_RR.
>
> But, with the TCP_STREAM test, I observed that the CPU load is about
> the same but the throughput decreases by about 5%: from 850 Mbit/s
> down to 810 Mbit/s.
>
> [raw netperf outputs snipped]
>
> macvlan looks promising.

nice, any performance tests for multiple network namespaces sharing
the same eth0 (with different macvlans)? how does that compare to IP
isolation, performance-wise?

TIA,
Herbert

> Regards,
> Benjamin
Re: L2 network namespaces + macvlan performances
Benjamin Thery wrote:
> Following a discussion we had at OLS concerning L2 network namespace
> performance and how the new macvlan driver could potentially improve
> it, I've ported the macvlan patchset on top of Eric's net namespace
> patchset on 2.6.22-rc4-mm2.
>
> [test setup and results snipped]
>
> macvlan looks promising.
>
> Regards,
> Benjamin

Very interesting. Thank you very much Benjamin for investigating this.
I will update the http://lxc.sf.net web site with your description and
results.
L2 network namespaces + macvlan performances
Following a discussion we had at OLS concerning L2 network namespace
performance and how the new macvlan driver could potentially improve
it, I've ported the macvlan patchset on top of Eric's net namespace
patchset on 2.6.22-rc4-mm2.

A little bit of history: some months ago, when we ran some performance
tests (using netperf) on net namespaces, we observed the following
things.

Using 'etun', the virtual ethernet tunnel driver, and IP routes from
inside a network namespace:

- The throughput is the same as in the "normal" case(*).
  (* normal case: no namespace, using physical adapters.)
  No regression. Good.

- But the CPU load increases a lot. Bad. The reasons are:
  - All checksums are done in software. No hardware offloading.
  - Every TCP packet going through the etun devices is duplicated in
    ip_forward() before we decrease the TTL (packets are routed
    between both ends of etun).

We also did some testing with bridges, and obtained the same result,
a CPU load increase:
- No hardware offloading.
- Packets are duplicated somewhere in the bridge+netfilter code
  (can't remember where right now).

This time, I've replaced the etun interface with the new macvlan,
which should benefit from the hardware offloading capabilities of the
physical adapter and remove the forwarding step.

My test setup is:

         Host A                            Host B
    __________________                   _________
   |   ____________   |                 |         |
   |  |  Netns 1   |  |                 |         |
   |  |            |  |                 |         |
   |  |  macvlan0  |  |                 |         |
   |  |____________|  |                 |         |
   |__________________|                 |_________|
       | eth0 (192.168.0.2)                 | eth0 (192.168.0.1)
       |____________________________________|

   macvlan0 (192.168.0.3)

- netperf runs on host A
- netserver runs on host B
- adapter speed is 1 Gbit/s

On this setup I ran the following netperf tests: TCP_STREAM,
TCP_MAERTS, TCP_RR, UDP_STREAM, UDP_RR.

Between the "normal" case and the "net namespace + macvlan" case,
results are about the same for both the throughput and the local CPU
load for the following test types: TCP_MAERTS, TCP_RR, UDP_STREAM,
UDP_RR. macvlan looks like a very good candidate for network
namespaces in these cases.
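For readers wanting to reproduce a setup along these lines, Host A's
side could be configured roughly as follows with modern iproute2. This
is an approximation for illustration: the "ip netns" interface
postdates the 2007 patchset used in the test, so these are not the
exact commands the author ran.

```shell
# Create a network namespace and give it a macvlan slaved to eth0.
# Device names and addresses follow the diagram above.
ip netns add netns1
ip link add link eth0 name macvlan0 type macvlan
ip link set macvlan0 netns netns1
ip netns exec netns1 ip addr add 192.168.0.3/24 dev macvlan0
ip netns exec netns1 ip link set macvlan0 up
ip netns exec netns1 ip link set lo up
```

Note that, by design, a macvlan cannot talk to its lower device's own
address from the same host, so tests are best run against a separate
machine, as done here.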
But, with the TCP_STREAM test, I observed that the CPU load is about
the same (that's what we wanted) but the throughput decreases by about
5%: from 850 Mbit/s down to 810 Mbit/s. I haven't investigated yet why
the throughput decreases in this case. Does it come from my setup,
from macvlan's additional processing, or something else? I don't know
yet.

Attached to this email you'll find the raw netperf outputs for the
three cases:

- netperf through a physical adapter, no namespace:
  netperf-results-2.6.22-rc4-mm2-netns1-vanilla.txt
- netperf through etun, inside a namespace:
  netperf-results-2.6.22-rc4-mm2-netns1-using-etun.txt
- netperf through macvlan, inside a namespace:
  netperf-results-2.6.22-rc4-mm2-netns1-using-macvlan.txt

macvlan looks promising.

Regards,
Benjamin

--
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D
http://www.bull.com

NETPERF RESULTS: the "normal" case:

No network namespace, traffic goes through real 1 Gbit/s physical
adapters.

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.76.1
(192.168.76.1) port 0 AF_INET : +/-2.5% @ 95% conf.
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384   1400    20.03       857.39   6.39     9.75     2.444   3.727

TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.76.1
(192.168.76.1) port 0 AF_INET : +/-2.5% @ 95% conf.
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  87380    20.03       763.15   4.75    10.33     2.038   4.434

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.76.1 (192.168.76.1) port 0 AF_INET : +/-2.5% @ 95% conf.
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.   CPU    CPU     S.dem   S.dem
Send   Recv   Size     Size    Time     Rate     local  remote  local   remote
bytes  bytes  bytes    bytes
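As a quick check on the figures in this thread (857.39 Mbit/s in the
"normal" TCP_STREAM run versus the roughly 810 Mbit/s quoted for the
macvlan case, on a 1 Gbit/s link):

```shell
# Size of the regression, and how far below link rate even the
# "normal" case sits. The macvlan figure is the rounded value
# quoted in the message, not an exact measurement.
awk 'BEGIN {
    normal = 857.39; macvlan = 810   # Mbit/s
    printf "drop: %.1f%%\n", (normal - macvlan) / normal * 100
    printf "link utilization (normal): %.1f%%\n", normal / 1000 * 100
}'
# -> drop: 5.5%
# -> link utilization (normal): 85.7%
```

The ~14% headroom even without namespaces is what motivates the
window-size and send-size experiments suggested earlier in the thread.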