I missed that. The single-core throughput is ~250k replies/sec; two cores give ~450k replies/sec, three cores ~650k replies/sec, and four cores ~800k replies/sec. These numbers are higher than what I reported in my previous post, most probably because I am now testing with MTU 9000 (jumbo frames) and with more user-space threads.
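For anyone who wants to see how the figures above scale, here is a small sketch that turns them into speedup and per-core efficiency relative to the single-core run. It assumes the four-core figure is meant as ~800k replies/sec, like the others; the function name `scaling` is only illustrative and the inputs are the approximate values quoted above, not new measurements.

```python
def scaling(throughput):
    """Return {cores: (speedup, efficiency)} relative to the 1-core rate."""
    base = throughput[1]
    result = {}
    for cores, rate in sorted(throughput.items()):
        speedup = rate / base          # how many times faster than one core
        efficiency = speedup / cores   # fraction of ideal linear scaling
        result[cores] = (speedup, efficiency)
    return result


# Approximate replies/sec from this post; 800_000 for four cores is an
# assumption (reading "~800" as "~800k").
reported = {1: 250_000, 2: 450_000, 3: 650_000, 4: 800_000}

if __name__ == "__main__":
    for cores, (speedup, efficiency) in scaling(reported).items():
        print(f"{cores} core(s): speedup {speedup:.2f}x, "
              f"efficiency {efficiency:.0%}")
```

On these rough numbers, each additional core contributes somewhat less than the first one did, which is what one would expect once the cores start contending for shared resources.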
Cheers,
Amin

On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado <cas...@nicira.com> wrote:
> Also, do you mind posting the single core throughput?
>
>> [cross-posting to nox-dev, openflow-discuss, ovs-discuss]
>>
>> I have prepared a patch based on NOX Zaku that improves its
>> performance by a factor of >10. This implies that a single controller
>> instance can run a large network with nearly a million flow
>> initiations per second. I am writing to open up a discussion and get
>> feedback from the community.
>>
>> Here are some preliminary results:
>>
>> - Benchmark configuration:
>>   * Benchmark: Throughput test of cbench (controller benchmarker) with
>>     64 switches. Cbench is a part of the OFlops package
>>     (http://www.openflowswitch.org/wk/index.php/Oflops). In throughput
>>     mode, cbench sends a batch of ofp_packet_in messages to the
>>     controller and counts the number of replies it gets back.
>>   * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
>>     quad-core Intel Xeon processor (X3210) and 4GB RAM
>>   * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
>>     quad-core Intel Xeon processors (E5405) and 4GB RAM
>>   * Connectivity: 1Gbps
>>
>> - Benchmark results:
>>   * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
>>   * Patched NOX: ~650k replies/sec (utilizing only 4 of the 8
>>     available cores). The sustained controller->benchmarker
>>     throughput is ~400Mbps.
>>
>> The patch moves the asynchronous harness of NOX to a standard
>> library (the Boost asynchronous I/O library), which simplifies the
>> code base. It fixes the code in several areas, including but not
>> limited to:
>>
>> - Multi-threading: The patch enables having any number of worker
>>   threads running on multiple cores.
>>
>> - Batching: Serving requests individually and sending replies one by
>>   one is quite inefficient. The patch tries to batch requests
>>   together where possible, as well as replies (which reduces the
>>   number of system calls significantly).
>>
>> - Memory allocation: The standard C++ memory allocator does not
>>   perform well in multi-threaded environments. Google's
>>   Thread-Caching Malloc (TCMalloc) or the Hoard memory allocator
>>   performs much better for NOX.
>>
>> - Fully asynchronous operation: The patched version avoids wasting
>>   CPU cycles polling sockets or event/timer dispatchers when not
>>   necessary.
>>
>> I would like to add that the patched version should perform much
>> better than what I reported above (the number reported is from a run
>> on 4 CPU cores). I guess a single NOX instance running on a machine
>> with 8 CPU cores should handle well above 1 million flow initiation
>> requests per second. Having a more capable machine should also help
>> to serve more requests! The code will be made available soon and I
>> will post updates as well.
>>
>> Cheers,
>> Amin
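To illustrate the batching point from the original announcement, here is a minimal sketch of why coalescing replies cuts down the number of send calls. It is not the code from the patch: `CountingSink` is a hypothetical stand-in for a socket that only counts `send()` calls, and the 80-byte reply size and 9000-byte batch limit are arbitrary values chosen for the example.

```python
class CountingSink:
    """Hypothetical stand-in for a socket; counts send() calls."""

    def __init__(self):
        self.calls = 0
        self.data = bytearray()

    def send(self, payload: bytes) -> int:
        self.calls += 1
        self.data += payload
        return len(payload)


def send_individually(sink, replies):
    """One send() (one system call on a real socket) per reply."""
    for reply in replies:
        sink.send(reply)


def send_batched(sink, replies, max_batch_bytes=9000):
    """Coalesce replies into a buffer and flush it when it would overflow."""
    batch = bytearray()
    for reply in replies:
        if batch and len(batch) + len(reply) > max_batch_bytes:
            sink.send(bytes(batch))
            batch.clear()
        batch += reply
    if batch:  # flush whatever is left
        sink.send(bytes(batch))


# 1000 dummy replies of 80 bytes each (illustrative size, not measured).
replies = [b"\x01" * 80 for _ in range(1000)]

one_by_one = CountingSink()
send_individually(one_by_one, replies)

batched = CountingSink()
send_batched(batched, replies)

if __name__ == "__main__":
    print("individual sends:", one_by_one.calls)
    print("batched sends:   ", batched.calls)
```

Both paths deliver the same bytes in the same order; only the number of calls differs. A real controller would also have to bound how long a reply may wait in the buffer, which this sketch leaves out.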