Re: [nox-dev] dev/destiny-fast doesn't respond

2011-12-27 Thread Amin Tootoonchian
Looks like you are already seeing ~3M packet-ins per sec (3k per msec
= 3M per sec).

Amin

On Tue, Dec 27, 2011 at 2:05 PM, Volkan YAZICI  wrote:
> Thanks David! You are right, removing tcmalloc for nox_core solved the
> problem.
>
> --8<---cut here---start->8---
> $ dpkg -l | grep tcmalloc
> ii  libtcmalloc-mi 1.5-1          an efficient thread-caching malloc
> $ nox_core -i ptcp:6633 switch -l ~/usr/bin/nox -t 7
> $ cbench -c localhost -p 6633 -m 1 -l 10 -s 32 -M 100 -t
> cbench: controller benchmarking tool
>   running in mode 'throughput'
>   connecting to controller at localhost:6633
>   faking 32 switches :: 10 tests each; 1 ms per test
>   with 100 unique source MACs per switch
>   starting test with 0 ms delay after features_reply
>   ignoring first 1 "warmup" and last 0 "cooldown" loops
>   debugging info is off
> 32  switches: fmods/sec:  630907  ...  total = 1977.609403 per ms
> 32  switches: fmods/sec:  799125  ...  total = 2558.905526 per ms
> 32  switches: fmods/sec:  903720  ...  total = 2901.221645 per ms
> 32  switches: fmods/sec:  900237  ...  total = 2868.801376 per ms
> 32  switches: fmods/sec:  875842  ...  total = 2825.217623 per ms
> ...
> --8<---cut here---end--->8---
>
> This is a reasonably powerful machine, that is,
>
> --8<---cut here---start->8---
> $ cat /etc/debian_version
> 6.0.3
> $ uname -a
> Linux odun 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64 GNU/Linux
> $ grep ^processor /proc/cpuinfo | wc -l
> 8
> $ grep "^model name" /proc/cpuinfo | head -n 1
> model name      : Intel(R) Xeon(R) CPU           E5606  @ 2.13GHz
> --8<---cut here---end--->8---
>
> I still couldn't understand how you get results at the million level
> in your comparisons. Am I missing something? What should I suspect? Can
> tcmalloc really cause a 1000x performance impact?
>
>
> Best.
>
> On Tue, 27 Dec 2011 10:44:39 -0800, David Erickson writes:
>> What tcmalloc version do you have, and what OS? Try launching without
>> tcmalloc; on some combinations NOX would just hang when a switch
>> connects and tcmalloc is in use.


Re: [nox-dev] Error building dev/destiny-fast branch

2011-10-28 Thread Amin Tootoonchian
That is right. The reasoning is similar: scheduling threads has some
overhead in boost::asio and adding more threads adds to the
contention. To minimize this adverse effect in this particular
experiment, one can change the code so that the read/write handlers are
only ever invoked from the same thread. Here is how to do that in
dev/destiny-fast:

* Open src/lib/openflow.c.
* Find the lines with async_{read,write}*.
* Add strand.wrap() around the boost::bind handler, as in the sketch below.
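
For illustration, a minimal sketch of that pattern (the class, member,
and handler names here are invented for the example; the actual code in
src/lib/openflow.c differs):

#include <boost/asio.hpp>
#include <boost/bind.hpp>

// Illustrative per-connection class. The key change is strand_.wrap()
// around the handler: all handlers wrapped by the same strand are
// serialized, so one connection's read/write handlers never run
// concurrently on different worker threads.
class Connection {
public:
    explicit Connection(boost::asio::io_service& io)
        : socket_(io), strand_(io) {}

    void start_read() {
        socket_.async_read_some(boost::asio::buffer(buf_),
            strand_.wrap(boost::bind(&Connection::handle_read, this,
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred)));
    }

private:
    void handle_read(const boost::system::error_code& ec, std::size_t n) {
        if (!ec) {
            // ... process the n bytes received, then issue the next read ...
            start_read();
        }
    }

    boost::asio::ip::tcp::socket socket_;
    boost::asio::io_service::strand strand_;  // serializes this connection's handlers
    char buf_[2048];
};

This removes cross-thread contention for a single connection without
limiting the worker pool as a whole.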

If you are interested in a more in-depth analysis, let me know. We
have a draft that we can share with you.

Hope it helps,
Amin

On Thu, Oct 27, 2011 at 10:07 PM, Andreas Voellmy
 wrote:
> Thanks. I noticed that when I run cbench with 1 switch against the learning
> switch controller in destiny, I get much worse throughput if I run with 32
> OS threads than with 1 OS thread. With 1 OS thread I get 470k requests per
> second, while with 32 OS threads I get 22k requests per second. Even with 2
> OS threads, the throughput drops to 49k requests per second. Have you
> noticed this behavior? Do you know why this is happening? Is this something
> that will be fixed in your upcoming release of this branch?
> Andreas
>
> On Thu, Oct 27, 2011 at 12:15 PM, Amin Tootoonchian 
> wrote:
>>
>> On Wed, Oct 26, 2011 at 7:32 PM, Andreas Voellmy
>>  wrote:
>> > On Wed, Oct 26, 2011 at 8:42 PM, Amin Tootoonchian 
>> > wrote:
>> >>
>> >> I only updated the 'switch' app in that code base, and I never looked
>> >> at 'hub'. My guess is that the hub app is doing so little that locking
>> >> within the boost::asio scheduler outweighs the actual work done by the
>> >> hub app. We need to make sure that the amount of work done by each
>> >> thread upon its invocation is significantly more than the locking
>> >> overhead in boost::asio's internal job queue.
>> >>
>> >
>> > I'm unclear about how components in the destiny branch work. Do the
>> > handlers
>> > run concurrently by default, or is there something extra that one has to
>> > write to get them to execute concurrently? If something extra is needed,
>> > what is it in switch.cc that makes it execute concurrently? Or are you
>> > saying that the event handlers in 'hub' are indeed running concurrently,
>> > but
>> > they aren't doing enough work to get much performance gain? (By the way,
>> > I
>> > was looking at /src/nox/coreapps/switch/switch.cc
>> > and /src/nox/coreapps/hub/hub.cc)
>> >
>> > Thanks,
>> > Andreas
>>
>> They run concurrently by default. They should indeed be running
>> concurrently, but I am guessing the locking overhead within boost::asio
>> significantly outweighs the actual work done by each thread. It
>> shouldn't be hard to fix, but it's not worth it since we consider that
>> code base to be just a proof of concept.
>>
>> Thanks,
>> Amin
>>
>> >>
>> >> Cheers,
>> >> Amin
>> >>
>> >> P.S.: Btw, passing '--enable-ndebug' to configure should boost the
>> >> performance.
>> >>
>> >> On Wed, Oct 26, 2011 at 2:08 PM, Andreas Voellmy
>> >>  wrote:
>> >> > Thanks. The code compiled after configuring without python.
>> >> > I was able to get roughly the same kind of performance out of the
>> >> > 'switch'
>> >> > application that is mentioned on the performance page
>> >> >
>> >> >
>> >> > (http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons).
>> >> > However, the 'hub' controller doesn't have much speedup when running
>> >> > with
>> >> > more threads. For example, when running with one thread I get a
>> >> > throughput
>> >> > of 213868.81 and when I run it with 8 threads I get a throughput
>> >> > of 264017.35. (To run with 8 threads, I am starting the controller
>> >> > like
>> >> > this: "./nox_core -i ptcp: hub -t 8"; I am testing with cbench in
>> >> > throughput mode "cbench -p  -t")
>> >> > Is this - 'hub' getting little speedup while 'switch' gets a lot of
>> >> > speedup - expected with this branch of NOX? Is there something that
>> >> > needs to
>> >> > be done to hub in order to enable the framework to run it
>> >> > concurrently?
>> >> > Regards,

Re: [nox-dev] Error building dev/destiny-fast branch

2011-10-27 Thread Amin Tootoonchian
On Wed, Oct 26, 2011 at 7:32 PM, Andreas Voellmy
 wrote:
> On Wed, Oct 26, 2011 at 8:42 PM, Amin Tootoonchian 
> wrote:
>>
>> I only updated the 'switch' app in that code base, and I never looked
>> at 'hub'. My guess is that the hub app is doing so little that locking
>> within the boost::asio scheduler outweighs the actual work done by the
>> hub app. We need to make sure that the amount of work done by each
>> thread upon its invocation is significantly more than the locking
>> overhead in boost::asio's internal job queue.
>>
>
> I'm unclear about how components in the destiny branch work. Do the handlers
> run concurrently by default, or is there something extra that one has to
> write to get them to execute concurrently? If something extra is needed,
> what is it in switch.cc that makes it execute concurrently? Or are you
> saying that the event handlers in 'hub' are indeed running concurrently, but
> they aren't doing enough work to get much performance gain? (By the way, I
> was looking at /src/nox/coreapps/switch/switch.cc
> and /src/nox/coreapps/hub/hub.cc)
>
> Thanks,
> Andreas

They run concurrently by default. They should indeed be running
concurrently, but I am guessing the locking overhead within boost::asio
significantly outweighs the actual work done by each thread. It
shouldn't be hard to fix, but it's not worth it since we consider that
code base to be just a proof of concept.

Thanks,
Amin

>>
>> Cheers,
>> Amin
>>
>> P.S.: Btw, passing '--enable-ndebug' to configure should boost the
>> performance.
>>
>> On Wed, Oct 26, 2011 at 2:08 PM, Andreas Voellmy
>>  wrote:
>> > Thanks. The code compiled after configuring without python.
>> > I was able to get roughly the same kind of performance out of the
>> > 'switch'
>> > application that is mentioned on the performance page
>> >
>> > (http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons).
>> > However, the 'hub' controller doesn't have much speedup when running
>> > with
>> > more threads. For example, when running with one thread I get a
>> > throughput
>> > of 213868.81 and when I run it with 8 threads I get a throughput
>> > of 264017.35. (To run with 8 threads, I am starting the controller like
>> > this: "./nox_core -i ptcp: hub -t 8"; I am testing with cbench in
>> > throughput mode "cbench -p  -t")
>> > Is this - 'hub' getting little speedup while 'switch' gets a lot of
>> > speedup - expected with this branch of NOX? Is there something that
>> > needs to
>> > be done to hub in order to enable the framework to run it concurrently?
>> > Regards,
>> > Andreas
>> >
>> > On Wed, Oct 26, 2011 at 5:53 AM, Murphy McCauley  wrote:
>> >>
>> >> This branch is quite a bit behind the actual development.  We're
>> >> preparing
>> >> to release the updated codebase in the near future.
>> >> But for one thing, Python doesn't work in it.  So you probably need to
>> >> do
>> >> --with-python=no when you run configure.
>> >> Hope that helps.
>> >> -- Murphy
>> >> On Oct 25, 2011, at 8:49 PM, Andreas Voellmy wrote:
>> >>
>> >> Thanks. I tried editing the conflict marker out in a couple of ways that
>> >> seemed reasonable to me, but I got other compile errors. Does anyone
>> >> know if
>> >> there is a known working version of this branch in the repository, and
>> >> how I
>> >> can get back to it?
>> >> Thanks,
>> >> Andreas
>> >>
>> >> 2011/10/25 Zoltán Lajos Kis 
>> >>>
>> >>> Seems like someone checked in a conflict marker to that file:
>> >>>
>> >>>
>> >>>
>> >>> http://noxrepo.org/cgi-bin/gitweb.cgi?p=nox;a=blob;f=src/nox/coreapps/pyrt/context.i;h=cb8641d72feb3a1f0543e97830a2addd55d502b9;hb=dev/destiny-fast#l83
>> >>>
>> >>> Z.
>> >>>
>> >>> 
>> >>> From: nox-dev-boun...@noxrepo.org [nox-dev-boun...@noxrepo.org] On
>> >>> Behalf
>> >>> Of Andreas Voellmy [andreas.voel...@gmail.com]
>> >>> Sent: Wednesday, October 26, 2011 4:40 AM
>> >>> To: nox-dev@noxrepo.org
>> >>> Subject: [nox-dev] Error building dev/destiny-fast branch

Re: [nox-dev] Error building dev/destiny-fast branch

2011-10-26 Thread Amin Tootoonchian
I only updated the 'switch' app in that code base, and I never looked
at 'hub'. My guess is that the hub app is doing so little that locking
within the boost::asio scheduler outweighs the actual work done by the
hub app. We need to make sure that the amount of work done by each
thread upon its invocation is significantly more than the locking
overhead in boost::asio's internal job queue.

If that is the case, since we are working on a new release, it doesn't
make much sense to fix it in that code base. Could you wait for that?

Cheers,
Amin

P.S.: Btw, passing '--enable-ndebug' to configure should boost the performance.

On Wed, Oct 26, 2011 at 2:08 PM, Andreas Voellmy
 wrote:
> Thanks. The code compiled after configuring without python.
> I was able to get roughly the same kind of performance out of the 'switch'
> application that is mentioned on the performance page
> (http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons).
> However, the 'hub' controller doesn't have much speedup when running with
> more threads. For example, when running with one thread I get a throughput
> of 213868.81 and when I run it with 8 threads I get a throughput
> of 264017.35. (To run with 8 threads, I am starting the controller like
> this: "./nox_core -i ptcp: hub -t 8"; I am testing with cbench in
> throughput mode "cbench -p  -t")
> Is this - 'hub' getting little speedup while 'switch' gets a lot of
> speedup - expected with this branch of NOX? Is there something that needs to
> be done to hub in order to enable the framework to run it concurrently?
> Regards,
> Andreas
>
> On Wed, Oct 26, 2011 at 5:53 AM, Murphy McCauley  wrote:
>>
>> This branch is quite a bit behind the actual development.  We're preparing
>> to release the updated codebase in the near future.
>> But for one thing, Python doesn't work in it.  So you probably need to do
>> --with-python=no when you run configure.
>> Hope that helps.
>> -- Murphy
>> On Oct 25, 2011, at 8:49 PM, Andreas Voellmy wrote:
>>
>> Thanks. I tried editing the conflict marker out in a couple of ways that
>> seemed reasonable to me, but I got other compile errors. Does anyone know if
>> there is a known working version of this branch in the repository, and how I
>> can get back to it?
>> Thanks,
>> Andreas
>>
>> 2011/10/25 Zoltán Lajos Kis 
>>>
>>> Seems like someone checked in a conflict marker to that file:
>>>
>>>
>>> http://noxrepo.org/cgi-bin/gitweb.cgi?p=nox;a=blob;f=src/nox/coreapps/pyrt/context.i;h=cb8641d72feb3a1f0543e97830a2addd55d502b9;hb=dev/destiny-fast#l83
>>>
>>> Z.
>>>
>>> 
>>> From: nox-dev-boun...@noxrepo.org [nox-dev-boun...@noxrepo.org] On Behalf
>>> Of Andreas Voellmy [andreas.voel...@gmail.com]
>>> Sent: Wednesday, October 26, 2011 4:40 AM
>>> To: nox-dev@noxrepo.org
>>> Subject: [nox-dev] Error building dev/destiny-fast branch
>>>
>>> Hi,
>>>
>>> I'd like to try the destiny-fast branch (I saw it mentioned here:
>>> http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons), so
>>> I did the following
>>>
>>> git clone git://noxrepo.org/nox
>>> cd nox
>>> git checkout dev/destiny-fast
>>>
>>> Is that the right way to get this branch? After that I ran
>>> ./boot.sh
>>> mkdir build
>>> cd build
>>> ../configure
>>> make
>>>
>>> and got the following error:
>>>
>>> Making all in pyrt
>>> make[8]: Entering directory
>>> `/home/av/Download/nox-destiny/nox/build/src/nox/coreapps/pyrt'
>>> /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64
>>> -I../../../../../src/include/openflow -I../../../../../src/nox/lib/ -outdir
>>> ./. -o oxidereactor_wrap.cc -module oxidereactor
>>> ../../../../../src/nox/coreapps/pyrt/oxidereactor.i
>>> /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64 -outdir ./. -o
>>> deferredcallback_wrap.cc -module deferredcallback
>>> ../../../../../src/nox/coreapps/pyrt/deferredcallback.i
>>> /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64
>>> -I../../../../../src/include/openflow -I../../../../../src/nox/lib/ -outdir
>>> ./. -o pycomponent_wrap.cc -module pycomponent
>>> ../../../../../src/nox/coreapps/pyrt/component.i
>>> ../../../../../src/nox/coreapps/pyrt/context.i:79: Error: Syntax error in
>>> input(3).
>>> make[8]: *** [pycomponent.py] Error 1
>>>
>>> Does anyone know what went wrong and how to fix this?
>>>
>>> Thanks,
>>> Andreas
>>>
>>


Re: [nox-dev] NOX Zaku with OF 1.1 support (C++ only)

2011-05-12 Thread Amin Tootoonchian
That would be great! I guess I will be able to work on it again in two
weeks. Just a couple of quick notes:

* So far I have only ported the switch app. Porting is super easy for
most apps: you just need to add a boost mutex to protect the data
structure (see the sketch after this list).
* I think some apps should be rewritten with performance in mind.
* The most important missing application is discovery which you have
already ported to C++.
* The dev/destiny-fast branch needs testing. It has mostly been used in
various benchmarks, and there are parts of the system that I never
tested after rewriting (e.g., SSL)!
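
For illustration, a minimal sketch of that porting pattern, assuming a
learning-switch-style app (the names are invented for the example, not
the actual NOX component code):

#include <boost/thread/mutex.hpp>
#include <boost/unordered_map.hpp>
#include <stdint.h>

// A MAC-learning table guarded by a boost::mutex; taking the lock on
// every access is the whole porting change for most simple apps.
class SwitchApp {
public:
    void learn(uint64_t mac, uint16_t port) {
        boost::mutex::scoped_lock lock(table_mutex_);
        mac_table_[mac] = port;
    }

    bool lookup(uint64_t mac, uint16_t& port) {
        boost::mutex::scoped_lock lock(table_mutex_);
        boost::unordered_map<uint64_t, uint16_t>::const_iterator it =
            mac_table_.find(mac);
        if (it == mac_table_.end())
            return false;
        port = it->second;
        return true;
    }

private:
    boost::mutex table_mutex_;  // protects mac_table_ across worker threads
    boost::unordered_map<uint64_t, uint16_t> mac_table_;
};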

Amin

>> Having a C++-only fork of Nox is long overdue.  There are many Nox 
>> developers who have expressed interest in this.  Off hand, I would suggest 
>> that we pull in Amin's changes (which makes Nox blazingly fast) and remove 
>> most spurious apps.


Re: [nox-dev] Maestro: a new scalable OpenFlow controller

2011-01-06 Thread Amin Tootoonchian
Hi Zheng,

> Sorry for the delay, and thanks a lot for all your comments. First I want to
> clarify that the performance number we measured in the tech report is for
> the "routing" application. Because the routing application needs to generate
> multiple flow configuration messages for all switches along the path a
> packet takes, its performance is expected to be worse than that of the
> "switch" application. We also used our simulator (which supports LLDP packet
> exchange between neighboring switches so that the routing application can
> work) to evaluate the "switch" application of NOX, and got a throughput of
> 90K rps on a 3.1GHz machine. But for the "routing" application, even after I
> turn on the ndebug option (by running ./configure --enable-ndebug and make,
> and I hope this is the right way to do it), the throughput is still around
> 20K rps. All the numbers in the tech report are for the "routing"
> application. I hope this makes sense to you.

Thanks for your reply. Using the routing application explains some of
your observations. However, I think benchmarks should be based on the
switch application or a no-op application, to illustrate the overhead
of the controller itself (i.e., the underlying framework) as the
baseline. I think the reason the routing application performs badly
here is binding lookups (this is just a guess), and it could surely be
tuned to perform significantly better than 20K rps.

> About the latency, we measure it in an end-to-end style. That is, we measure
> the start time-stamp for a request before we call the "socket.send"
> function, and measure the end time-stamp after we receive the
> flow_mod/packet_out for that particular request. I agree that NOX will
> throttle the connection and the latency of a request within NOX is very
> small. It's just that the latency we measure is the end-to-end delay, which
> includes the queuing delay in both the sending buffer and receiving buffer.

Regarding latency measurements, it is tricky to measure stuff
precisely at a sub-millisecond resolution. One should be very careful
with the resolution of timers and avoid being affected by the
operating system scheduler. If you are seeing latencies close to the
scheduler quantum size (under Linux typically 10ms) your measurement
might be affected by the scheduler. There is no real workaround for
this under a non-realtime kernel; however, running the controller,
benchmarker, and the latency measurement tool with CPU affinity set
(non-overlapping CPU sets -- using taskset -c X) and with a
high-priority FIFO scheduler (chrt -f X under Linux) relieves many of
these side effects. Btw, I was talking about the end-to-end latency
previously; it is quite possible to reduce that as well.
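
A hypothetical invocation along these lines (the core assignments and
priority values are illustrative; adjust them to your machine):

$ sudo taskset -c 0-3 chrt -f 50 ./nox_core -i ptcp:6633 switch -t 4
$ sudo taskset -c 4-7 chrt -f 50 cbench -c localhost -p 6633 -l 10 -s 16

This pins the controller and the benchmarker to disjoint CPU sets and
runs both under the FIFO real-time scheduler, so neither perturbs the
other's timing.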

> Furthermore, I would like to know: is the code of HyperFlow, or the NOX
> multi-threading patch you mentioned, going to be available? Because I really
> want to study the difference of all the existing systems, and hopefully my
> effort could contribute to this community :)

I think we could make the NOX multi-threading patch publicly available
soon as a branch on noxrepo. The code needs to be reviewed and tested
by the NOX team and the community before making its way to the
mainstream NOX. I, for sure, appreciate your efforts very much. The
only thing is that if we had a framework to compare controller
performance we could better understand the trade-offs in controller
design. Rob's oflops/cbench package is a great start, and we could all
contribute to it.

Cheers,
Amin



Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
I am talking about jumbo Ethernet frames here. By batching, I mean
batching outgoing messages together and writing to the underlying
layer which would be the TCP write buffer. The TCP buffer is not
limited to MTU or anything like that, so in most cases my code flushes
more than 64KB to the TCP write buffer. The gain is due to issuing a
single system call with a larger buffer rather than many system calls
with tiny buffers (e.g., the 128 bytes you mentioned).

I do not sacrifice delay for throughput here. I keep a write buffer
and keep appending to it until the underlying socket is ready for
writes. Once it is ready for a write operation, buffered replies are
flushed to the underlying layer immediately. This is quite different
from Nagle's algorithm and will not add any delays.
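
For illustration, a minimal sketch of this buffering scheme (the names
are invented for the example, not the actual patch; it assumes send()
and the completion handler run on the same thread or strand):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <cstddef>
#include <stdint.h>
#include <vector>

// Outgoing messages are appended to a pending buffer; whenever the socket
// finishes a write, everything accumulated meanwhile is flushed in a single
// async_write -- one system call per batch, with no added delay.
class BatchedWriter {
public:
    explicit BatchedWriter(boost::asio::ip::tcp::socket& s)
        : socket_(s), write_in_progress_(false) {}

    void send(const uint8_t* msg, std::size_t len) {
        pending_.insert(pending_.end(), msg, msg + len);
        if (!write_in_progress_)
            flush();  // socket idle: start writing immediately
    }

private:
    void flush() {
        write_in_progress_ = true;
        in_flight_.swap(pending_);  // take everything batched so far
        boost::asio::async_write(socket_, boost::asio::buffer(in_flight_),
            boost::bind(&BatchedWriter::handle_write, this,
                        boost::asio::placeholders::error));
    }

    void handle_write(const boost::system::error_code& ec) {
        in_flight_.clear();
        write_in_progress_ = false;
        if (!ec && !pending_.empty())
            flush();  // flush whatever accumulated during the last write
    }

    boost::asio::ip::tcp::socket& socket_;
    std::vector<uint8_t> pending_, in_flight_;
    bool write_in_progress_;
};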

Amin

On Wed, Dec 15, 2010 at 3:47 PM, kk yap  wrote:
> Oh.. another point, if you are batching the frames, then what about
> delay?  There seems to be a trade-off between delay and throughput,
> and we have gone for the former by disabling Nagle's algorithm.
>
> Regards
> KK
>
> On 15 December 2010 12:46, kk yap  wrote:
>> Hi Amin,
>>
>> Just to clarify, does your jumbo frames refer to the OpenFlow messages
>> or the frames in the datapath?   By OpenFlow messages, I am assuming
>> you use a TCP connection between NOX and the switches, and you are
>> batching the messages into jumbo frames of 9000 bytes before sending
>> them out.  By frames in the datapath, I mean jumbo Ethernet frames are
>> being sent in the datapath.  The latter does not make any sense to me,
>> because OpenFlow should send 128 bytes to the controller by default.
>>
>> Thanks.
>>
>> Regards
>> KK
>>
>> On 15 December 2010 12:36, Amin Tootoonchian  wrote:
>>> I double checked. It does slightly improve the performance (in the
>>> order of a few thousand replies/sec). Larger MTUs decrease the CPU
>>> workload (by decreasing the number of transfers across the bus) and
>>> this means that more CPU cycles are available to the controller to
>>> process requests. However, I am not suggesting that people should use
>>> jumbo frames. Apparently running with more user-space threads does the
>>> trick here. Anyway, I should trust a profiler rather than guessing, so
>>> I will get back with a definite answer once I have done a more
>>> thorough evaluation.
>>>
>>> Cheers,
>>> Amin
>>>
>>> On Wed, Dec 15, 2010 at 2:51 PM, kk yap  wrote:
>>>> Random curiosity: Why would jumbo frames increase replies per sec?
>>>>
>>>> Regards
>>>> KK
>>>>
>>>> On 15 December 2010 11:45, Amin Tootoonchian  wrote:
>>>>> I missed that. The single core throughput is ~250k replies/sec, two
>>>>> cores ~450k replies/sec, three cores ~650k replies/sec, four cores
>>>>> ~800k replies/sec. These numbers are higher than what I reported in my
>>>>> previous post. That is most probably because, right now, I am testing
>>>>> with MTU 9000 (jumbo frames) and with more user-space threads.
>>>>>
>>>>> Cheers,
>>>>> Amin
>>>>>
>>>>> On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado  wrote:
>>>>>> Also, do you mind posting the single core throughput?
>>>>>>
>>>>>>> [cross-posting to nox-dev, openflow-discuss, ovs-discuss]
>>>>>>>
>>>>>>> I have prepared a patch based on NOX Zaku that improves its
>>>>>>> performance by a factor of >10. This implies that a single controller
>>>>>>> instance can run a large network with nearly a million flow initiations
>>>>>>> per second. I am writing to open up a discussion and get feedback from
>>>>>>> the community.
>>>>>>>
>>>>>>> Here are some preliminary results:
>>>>>>>
>>>>>>> - Benchmark configuration:
>>>>>>>   * Benchmark: Throughput test of cbench (controller benchmarker) with
>>>>>>> 64 switches. Cbench is a part of the OFlops package
>>>>>>> (http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
>>>>>>> mode, cbench sends a batch of ofp_packet_in messages to the controller
>>>>>>> and counts the number of replies it gets back.
>>>>>>>   * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
>>>>>>> quad-core Intel Xeon processor (X3210), and 4GB RAM

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
I double checked. It does slightly improve the performance (in the
order of a few thousand replies/sec). Larger MTUs decrease the CPU
workload (by decreasing the number of transfers across the bus) and
this means that more CPU cycles are available to the controller to
process requests. However, I am not suggesting that people should use
jumbo frames. Apparently running with more user-space threads does the
trick here. Anyway, I should trust a profiler rather than guessing, so
I will get back with a definite answer once I have done a more
thorough evaluation.

Cheers,
Amin

On Wed, Dec 15, 2010 at 2:51 PM, kk yap  wrote:
> Random curiosity: Why would jumbo frames increase replies per sec?
>
> Regards
> KK
>
> On 15 December 2010 11:45, Amin Tootoonchian  wrote:
>> I missed that. The single core throughput is ~250k replies/sec, two
>> cores ~450k replies/sec, three cores ~650k replies/sec, four cores
>> ~800k replies/sec. These numbers are higher than what I reported in my
>> previous post. That is most probably because, right now, I am testing
>> with MTU 9000 (jumbo frames) and with more user-space threads.
>>
>> Cheers,
>> Amin
>>
>> On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado  wrote:
>>> Also, do you mind posting the single core throughput?
>>>
>>>> [cross-posting to nox-dev, openflow-discuss, ovs-discuss]
>>>>
>>>> I have prepared a patch based on NOX Zaku that improves its
>>>> performance by a factor of >10. This implies that a single controller
>>>> instance can run a large network with nearly a million flow initiations
>>>> per second. I am writing to open up a discussion and get feedback from
>>>> the community.
>>>>
>>>> Here are some preliminary results:
>>>>
>>>> - Benchmark configuration:
>>>>   * Benchmark: Throughput test of cbench (controller benchmarker) with
>>>> 64 switches. Cbench is a part of the OFlops package
>>>> (http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
>>>> mode, cbench sends a batch of ofp_packet_in messages to the controller
>>>> and counts the number of replies it gets back.
>>>>   * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
>>>> quad-core Intel Xeon processor (X3210), and 4GB RAM
>>>>   * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
>>>> quad-core Intel Xeon processors (E5405), and 4GB RAM
>>>>   * Connectivity: 1Gbps
>>>>
>>>> - Benchmark results:
>>>>   * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
>>>>   * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
>>>> available cores). The sustained controller->benchmarker throughput is
>>>> ~400Mbps.
>>>>
>>>> The patch updates the asynchronous harness of NOX to a standard
>>>> library (boost asynchronous I/O library) which simplifies the code
>>>> base. It fixes the code in several areas, including but not limited
>>>> to:
>>>>
>>>> - Multi-threading: The patch enables having any number of worker
>>>> threads running on multiple cores.
>>>>
>>>> - Batching: Serving requests individually and sending replies one by
>>>> one is quite inefficient. The patch tries to batch requests together
>>>> where possible, as well as replies (which reduces the number of system
>>>> calls significantly).
>>>>
>>>> - Memory allocation: The standard C++ memory allocator is not robust
>>>> in multi-threaded environments. Google's Thread-Caching Malloc
>>>> (TCMalloc) or Hoard memory allocator perform much better for NOX.
>>>>
>>>> - Fully asynchronous operation: The patched version avoids wasting CPU
>>>> cycles polling sockets, or event/timer dispatchers when not necessary.
>>>>
>>>> I would like to add that the patched version should perform much
>>>> better than what I reported above (the number reported is with a run
>>>> on 4 CPU cores). I guess a single NOX instance running on a machine
>>>> with 8 CPU cores should handle well above 1 million flow initiation
>>>> requests per second. Also having a more capable machine should help to
>>>> serve more requests! The code will be made available soon and I will
>>>> post updates as well.
>>>>
>>>>
>>>> Cheers,
>>>> Amin


Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
I missed that. The single core throughput is ~250k replies/sec, two
cores ~450k replies/sec, three cores ~650k replies/sec, four cores
~800k replies/sec. These numbers are higher than what I reported in my
previous post. That is most probably because, right now, I am testing
with MTU 9000 (jumbo frames) and with more user-space threads.

Cheers,
Amin

On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado  wrote:
> Also, do you mind posting the single core throughput?
>
>> [cross-posting to nox-dev, openflow-discuss, ovs-discuss]
>>
>> I have prepared a patch based on NOX Zaku that improves its
>> performance by a factor of >10. This implies that a single controller
>> instance can run a large network with nearly a million flow initiations
>> per second. I am writing to open up a discussion and get feedback from
>> the community.
>>
>> Here are some preliminary results:
>>
>> - Benchmark configuration:
>>   * Benchmark: Throughput test of cbench (controller benchmarker) with
>> 64 switches. Cbench is a part of the OFlops package
>> (http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
>> mode, cbench sends a batch of ofp_packet_in messages to the controller
>> and counts the number of replies it gets back.
>>   * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
>> quad-core Intel Xeon processor (X3210), and 4GB RAM
>>   * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
>> quad-core Intel Xeon processors (E5405), and 4GB RAM
>>   * Connectivity: 1Gbps
>>
>> - Benchmark results:
>>   * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
>>   * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
>> available cores). The sustained controller->benchmarker throughput is
>> ~400Mbps.
>>
>> The patch updates the asynchronous harness of NOX to a standard
>> library (boost asynchronous I/O library) which simplifies the code
>> base. It fixes the code in several areas, including but not limited
>> to:
>>
>> - Multi-threading: The patch enables having any number of worker
>> threads running on multiple cores.
>>
>> - Batching: Serving requests individually and sending replies one by
>> one is quite inefficient. The patch tries to batch requests together
>> where possible, as well as replies (which reduces the number of system
>> calls significantly).
>>
>> - Memory allocation: The standard C++ memory allocator is not robust
>> in multi-threaded environments. Google's Thread-Caching Malloc
>> (TCMalloc) or Hoard memory allocator perform much better for NOX.
>>
>> - Fully asynchronous operation: The patched version avoids wasting CPU
>> cycles polling sockets, or event/timer dispatchers when not necessary.
>>
>> I would like to add that the patched version should perform much
>> better than what I reported above (the number reported is with a run
>> on 4 CPU cores). I guess a single NOX instance running on a machine
>> with 8 CPU cores should handle well above 1 million flow initiation
>> requests per second. Also having a more capable machine should help to
>> serve more requests! The code will be made available soon and I will
>> post updates as well.
>>
>>
>> Cheers,
>> Amin


Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
As Martin said, in some cases cbench may significantly over-report
numbers in throughput mode (of course, it depends on the controller
implementation, so not all controllers may be affected).

The cbench code sleeps for 100ms to clear out buffers after reading
the switch counters (fakeswitch_get_count in fakeswitch.c). There are
two problems here:

* Switch input and output buffers are not cleared under throughput mode.
* Having X switches means that the code sleeps for 100X ms instead of
a single 100ms for all emulated switches.

These would result in a significant over-estimation of controller
performance under throughput mode if one is using more than a few
emulated switches. For instance, with 128 switches, cbench would sleep
for almost 13 seconds before printing out the stats of each round;
meanwhile, the controller fills the input buffers of all the emulated
switches. Since the input buffers are not cleared, the stats of the next
round would contain the replies received for requests in previous
rounds (which is a potentially large number).

Rob, I will post a patch soon. Meanwhile, a quick fix is to move the
sleep to an appropriate place in run_test (cbench.c) and clear the
buffers under throughput mode as well in fakeswitch_get_count
(fakeswitch.c).
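
A hypothetical sketch of the shape of that fix (run_round and
drain_and_count are invented stand-ins, not the actual cbench functions):

#include <unistd.h>

// Invented stand-ins for cbench internals, declared only to make the
// control flow concrete; the real logic lives in cbench.c/fakeswitch.c.
struct fakeswitch;
void run_round(fakeswitch* switches, int n_switches);  // exchange messages
int  drain_and_count(fakeswitch* sw);                  // read and clear buffers

void benchmark(fakeswitch* switches, int n_switches, int n_rounds) {
    for (int round = 0; round < n_rounds; ++round) {
        run_round(switches, n_switches);
        usleep(100 * 1000);  // a single 100ms settle per round, not per switch
        // Read and clear every emulated switch's counters so replies from
        // this round cannot leak into the next round's stats.
        for (int i = 0; i < n_switches; ++i)
            drain_and_count(&switches[i]);
    }
}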

Amin

> A problem with cbench might even be of interest to those who wrote it
> :-)  If I could bother you to just send me a diff of what you've
> changed, it would be much appreciated.  I can push it back into the
> main branch.
>
> Fwiw, cbench is something I wrote very quickly while jetlagged, so
> it's not surprising that there are bugs in it.  I didn't realize that
> people were actually using it, or I would try to snag some time to
> make it less crappy :-)
>
> Thanks for the feedback,
>
> - Rob



[nox-dev] NOX performance improvement by a factor 10

2010-12-14 Thread Amin Tootoonchian
[cross-posting to nox-dev, openflow-discuss, ovs-discuss]

I have prepared a patch based on NOX Zaku that improves its
performance by a factor of >10. This implies that a single controller
instance can run a large network with nearly a million flow initiations
per second. I am writing to open up a discussion and get feedback from
the community.

Here are some preliminary results:

- Benchmark configuration:
  * Benchmark: Throughput test of cbench (controller benchmarker) with
64 switches. Cbench is a part of the OFlops package
(http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
mode, cbench sends a batch of ofp_packet_in messages to the controller
and counts the number of replies it gets back.
  * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
quad-core Intel Xeon processor (X3210), and 4GB RAM
  * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
quad-core Intel Xeon processors (E5405), and 4GB RAM
  * Connectivity: 1Gbps

- Benchmark results:
  * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
  * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
available cores). The sustained controller->benchmarker throughput is
~400Mbps.

The patch replaces the asynchronous harness of NOX with a standard
library (the Boost asynchronous I/O library), which simplifies the code
base. It fixes the code in several areas, including but not limited
to:

- Multi-threading: The patch enables having any number of worker
threads running on multiple cores.
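
For illustration, a minimal sketch of this pattern, assuming the patch
follows the standard boost::asio idiom of one shared io_service whose
event loop is entered by N worker threads:

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

// A free function avoids any ambiguity over io_service::run's overloads.
void run_service(boost::asio::io_service* io) { io->run(); }

int main() {
    boost::asio::io_service io;

    // The work object keeps run() from returning while no handlers are queued.
    boost::asio::io_service::work work(io);

    // N threads all enter the same event loop, so completion handlers
    // (packet-in processing, writes, timers) are dispatched across cores.
    const int n_threads = 4;
    boost::thread_group workers;
    for (int i = 0; i < n_threads; ++i)
        workers.create_thread(boost::bind(&run_service, &io));

    // ... set up listening sockets, timers, etc. on `io` here ...

    workers.join_all();
    return 0;
}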

- Batching: Serving requests individually and sending replies one by
one is quite inefficient. The patch tries to batch requests together
where possible, as well as replies (which reduces the number of system
calls significantly).

- Memory allocation: The standard C++ memory allocator is not robust
in multi-threaded environments. Google's Thread-Caching Malloc
(TCMalloc) or Hoard memory allocator perform much better for NOX.

- Fully asynchronous operation: The patched version avoids wasting CPU
cycles polling sockets, or event/timer dispatchers when not necessary.

I would like to add that the patched version should perform much
better than what I reported above (the number reported is with a run
on 4 CPU cores). I guess a single NOX instance running on a machine
with 8 CPU cores should handle well above 1 million flow initiation
requests per second. Also having a more capable machine should help to
serve more requests! The code will be made available soon and I will
post updates as well.


Cheers,
Amin



[nox-dev] Turning NOX's OpenFlow message dispatching into an application

2010-04-13 Thread Amin Tootoonchian
Hi all,

Currently, NOX implements the OpenFlow message (command) dispatching
functionality in a couple of functions in builtin/nox.cc (e.g.,
send_openflow_command). I am wondering if it is possible to move this
functionality to an application: i.e., to send OpenFlow commands to
switches, applications would fire events defined by a dispatcher
application instead of calling the functions found in builtin/nox.cc.
The dispatcher application handles such events and sends out
OpenFlow messages to switches. I understand that it is a radical
change, but it seems to be really beneficial:

* A policy enforcement application can be designed to discard the
commands which violate network policies.
* OpenFlow commands/messages can be prioritized.
* This enables transparent proxying/routing of OpenFlow commands among
multiple controllers.
* The overhead seems to be negligible.

- In particular, I need this feature for HyperFlow
(http://www.usenix.org/event/inmwren10/tech/full_papers/Tootoonchian.pdf).
I will ask about event serialization later :-)

What do you think?


Thanks,
Amin



[nox-dev] Typical NOX Memory Usage

2010-03-29 Thread Amin Tootoonchian
Hi all,

For a research project, I need to know the typical memory usage of a
NOX controller in existing deployments. I am particularly interested
in the average and maximum memory usage.

Btw, do you have any numbers on how many times NOX has crashed in
existing deployments because of memory leaks?



Thanks,
Amin



[nox-dev] Average Size of an OpenFlow Message

2010-03-29 Thread Amin Tootoonchian
Hi all,

To evaluate a couple of OpenFlow-based systems I am working on, I need
to see what the control traffic of a real OpenFlow deployment looks
like. In particular, I am interested in:

* Minimum, average, and maximum OpenFlow message length. I guess I can
find hypothetical min and max by looking into the spec, but I'd really
like to know what these numbers are in existing deployments.
* Distribution of OpenFlow message types and byte distribution for
each type (especially flow initiation).
* Byte distribution of control traffic over time.

Rough numbers (min, max, mean, median) should work pretty fine for me.
Btw, I am really interested to know how you expect the control traffic
to look in the future.

Can anyone point me to any existing papers/datasets I can cite/use for
this purpose? My last question: are you going to make any OpenFlow
control traffic dataset available?


Thanks,
Amin

P.S.: I have cross-posted to openflow-discuss and nox-dev. Sorry for duplicates.



Re: [nox-dev] g++ in fedora

2010-02-18 Thread Amin Tootoonchian
Hi,

I think you are missing the python-twisted package. Install it and you
should be fine.


Cheers,
Amin



[nox-dev] Order of event handlers

2010-02-16 Thread Amin Tootoonchian
Hi all,

Is there a way to specify a handler to be the last one receiving an
event (without enumerating all components in nox.xml)?


Thanks,
Amin



Re: [nox-dev] Preparing for Nox 0.6.0

2009-11-04 Thread Amin Tootoonchian
Hi Martin,

I meant requiring the originators to add themselves (I can think of
workarounds to have it work implicitly, but they are
compiler/architecture specific).

This feature is not only useful for debugging; it can also be used to
deploy multiple NOX controllers to control a single network: on each
controller *capture a set of ofp_msg_events* (e.g., only a small
portion of packet_in events change the controller state) and
*replay/dispatch* them on the others. We need to discard any outgoing
ofp packets caused by the replayed events and for that we need to keep
track of what events triggered other events/messages.

Also, to find out which events are *important* (i.e., alter the
controller state), the controller and the running applications need to
mark events explicitly. In other words, it is the
controller/application developer's job to specify which events should
be propagated to other controllers. This part also requires the
feature mentioned above. That is because if a non-ofp_msg_event is
marked, we should be able to trace back to the original ofp_msg_event
and mark it. Am I right that ofp_msg_events are the driving force of
NOX operation?

And my last question: Is there currently any way for two NOX instances
to synchronize their states for failover? If not, are there any plans
to provide such a feature? Is there any way for a
controller/application to store its transient state on disk? What
happens in a production network with hundreds of switches when the
controller crashes and comes back up in a few seconds? Should it
rediscover the topology, host-ip-mac bindings, etc. from scratch?


Cheers,
Amin

> Regarding tracing the event call stack.  This would certainly be a useful
> debugging tool.  However, the nature of events is that the infrastructure is
> decoupled from the senders and receivers so it isn't clear to me how we'd
> mark the originator an a general way without requiring the originators to
> add themselves.  I'm certainly open to ideas ...





Re: [nox-dev] Preparing for Nox 0.6.0

2009-11-04 Thread Amin Tootoonchian
Hi Martin,

Will the upcoming NOX release be OpenFlow 1.0 feature compatible and
not just OpenFlow 1.0 candidate wire protocol compatible?

One more question: It seems that it is currently impossible to trace
an event and a NOX-generated OpenFlow message back to their originator
events (e.g., a Packet_in_event may cause other events to be
dispatched and some OpenFlow messages to be sent out). It would be
useful to have an entry in the Event and OpenFlow Message classes to
pinpoint their originators. It seems that this needs only a few
changes to the code base and enables NOX (& third-party applications)
to have a high-level view of NOX operation. I am working on OpenFlow
control plane distribution, and I believe this feature would be quite
useful for that.


Thanks,
Amin
