I missed that. The single core throughput is ~250k replies/sec, two
cores ~450k replies/sec, three cores ~650k replies/sec, four cores
~800k replies/sec. These numbers are higher than what I reported in my
previous post. That is most probably because, right now, I am testing
with MTU 9000 (jumbo frames) and with more user-space threads.
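
As a rough illustration of how these figures scale with core count, the
sketch below computes speedup and per-core efficiency relative to the
single-core number. It reads the four-core figure as ~800k replies/sec,
which is an assumption based on the other values; the function name
`scaling` is made up for this example.

```python
# Throughput figures quoted in the message above (replies/sec).
# Assumption: the four-core value is read as ~800k replies/sec.
reported = {1: 250_000, 2: 450_000, 3: 650_000, 4: 800_000}


def scaling(rates):
    """Return (cores, speedup, efficiency) tuples relative to one core."""
    base = rates[1]
    rows = []
    for cores, rate in sorted(rates.items()):
        speedup = rate / base          # how many times faster than 1 core
        efficiency = speedup / cores   # fraction of ideal linear scaling
        rows.append((cores, speedup, efficiency))
    return rows


for cores, speedup, eff in scaling(reported):
    print(f"{cores} core(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Under that reading, efficiency falls from about 90% on two cores to about
80% on four, so the scaling is sublinear but still far from flat.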

Cheers,
Amin

On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado <cas...@nicira.com> wrote:
> Also, do you mind posting the single core throughput?
>
>> [cross-posting to nox-dev, openflow-discuss, ovs-discuss]
>>
>> I have prepared a patch based on NOX Zaku that improves its
>> performance by a factor of >10. This implies that a single controller
>> instance can run a large network with nearly a million flow initiations
>> per second. I am writing to open up a discussion and get feedback from
>> the community.
>>
>> Here are some preliminary results:
>>
>> - Benchmark configuration:
>>   * Benchmark: Throughput test of cbench (controller benchmarker) with
>> 64 switches. Cbench is a part of the OFlops package
>> (http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
>> mode, cbench sends a batch of ofp_packet_in messages to the controller
>> and counts the number of replies it gets back.
>>   * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
>> quad-core Intel Xeon processor (X3210), and 4GB RAM
>>   * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
>> quad-core Intel Xeon processors (E5405), and 4GB RAM
>>   * Connectivity: 1Gbps
>>
>> - Benchmark results:
>>   * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
>>   * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
>> available cores). The sustained controller->benchmarker throughput is
>> ~400Mbps.
>>
>> The patch updates the asynchronous harness of NOX to a standard
>> library (boost asynchronous I/O library) which simplifies the code
>> base. It fixes the code in several areas, including but not limited
>> to:
>>
>> - Multi-threading: The patch enables having any number of worker
>> threads running on multiple cores.
>>
>> - Batching: Serving requests individually and sending replies one by
>> one is quite inefficient. The patch tries to batch requests together
>> where possible, as well as replies (which reduces the number of system
>> calls significantly).
>>
>> - Memory allocation: The standard C++ memory allocator is not robust
>> in multi-threaded environments. Google's Thread-Caching Malloc
>> (TCMalloc) or the Hoard memory allocator performs much better for NOX.
>>
>> - Fully asynchronous operation: The patched version avoids wasting CPU
>> cycles polling sockets, or event/timer dispatchers when not necessary.
>>
>> I would like to add that the patched version should perform much
>> better than what I reported above (the number reported is with a run
>> on 4 CPU cores). I guess a single NOX instance running on a machine
>> with 8 CPU cores should handle well above 1 million flow initiation
>> requests per second. Also having a more capable machine should help to
>> serve more requests! The code will be made available soon and I will
>> post updates as well.
>>
>>
>> Cheers,
>> Amin
>> _______________________________________________
>> openflow-discuss mailing list
>> openflow-disc...@lists.stanford.edu
>> https://mailman.stanford.edu/mailman/listinfo/openflow-discuss
>
>
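
The batching point in the quoted message (fewer system calls when replies
are sent together) can be illustrated with a minimal sketch. This is not
the NOX patch, which is C++ on top of Boost.Asio; it is a stand-alone
Python example over a local socket pair. The 8-byte "replies" and the
helper names are hypothetical and are not OpenFlow messages.

```python
import socket


def send_individually(sock, replies):
    """Send each reply with its own sendall(); return the number of calls."""
    calls = 0
    for reply in replies:
        sock.sendall(reply)  # at least one send system call per reply
        calls += 1
    return calls


def send_batched(sock, replies):
    """Concatenate all replies and hand them to the kernel in one sendall()."""
    sock.sendall(b"".join(replies))
    return 1


def drain(sock, nbytes):
    """Read exactly nbytes from sock (or fewer if the peer closes)."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    # Hypothetical fixed-size replies: 64 messages of 8 bytes each.
    replies = [i.to_bytes(8, "big") for i in range(64)]
    total = sum(len(r) for r in replies)

    calls_single = send_individually(sender, replies)
    stream_single = drain(receiver, total)

    calls_batched = send_batched(sender, replies)
    stream_batched = drain(receiver, total)

    # The receiver sees the same byte stream either way; only the number
    # of send calls differs.
    print(calls_single, calls_batched, stream_single == stream_batched)
    sender.close()
    receiver.close()
```

The receiver cannot tell the two cases apart because a stream socket
carries bytes, not message boundaries, which is why replies can be
coalesced without changing what the peer reads.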

_______________________________________________
nox-dev mailing list
nox-dev@noxrepo.org
http://noxrepo.org/mailman/listinfo/nox-dev_noxrepo.org
