Re: [nox-dev] dev/destiny-fast doesn't respond

2011-12-27 Thread Amin Tootoonchian
Looks like you are already seeing ~3M packet-ins per sec (3k per msec
= 3M per sec).

Amin

On Tue, Dec 27, 2011 at 2:05 PM, Volkan YAZICI volkan.yaz...@gmail.com wrote:
 Thanks David! You are right, removing tcmalloc for nox_core solved the
 problem.

 --8---cut here---start-8---
 $ dpkg -l | grep tcmalloc
 ii  libtcmalloc-mi 1.5-1          an efficient thread-caching malloc
 $ nox_core -i ptcp:6633 switch -l ~/usr/bin/nox -t 7
 $ cbench -c localhost -p 6633 -m 1 -l 10 -s 32 -M 100 -t
 cbench: controller benchmarking tool
   running in mode 'throughput'
   connecting to controller at localhost:6633
   faking 32 switches :: 10 tests each; 1 ms per test
   with 100 unique source MACs per switch
   starting test with 0 ms delay after features_reply
   ignoring first 1 warmup and last 0 cooldown loops
   debugging info is off
 32  switches: fmods/sec:  630907  ...  total = 1977.609403 per ms
 32  switches: fmods/sec:  799125  ...  total = 2558.905526 per ms
 32  switches: fmods/sec:  903720  ...  total = 2901.221645 per ms
 32  switches: fmods/sec:  900237  ...  total = 2868.801376 per ms
 32  switches: fmods/sec:  875842  ...  total = 2825.217623 per ms
 ...
 --8---cut here---end---8---

 This is a reasonably powerful machine, that is,

 --8---cut here---start-8---
 $ cat /etc/debian_version
 6.0.3
 $ uname -a
 Linux odun 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64 GNU/Linux
 $ grep ^processor /proc/cpuinfo | wc -l
 8
 $ grep ^model name /proc/cpuinfo | head -n 1
 model name      : Intel(R) Xeon(R) CPU           E5606  @ 2.13GHz
 --8---cut here---end---8---

 I still don't understand how you get results at the million level in
 your comparisons. Am I missing something? What should I suspect? Can
 tcmalloc really cause such a 1000x performance impact?


 Best.

 On Tue, 27 Dec 2011 10:44:39 -0800, David Erickson writes:
 What tcmalloc version do you have, and what OS? Try launching without
 tcmalloc; on some combinations NOX would just hang when a switch connects
 if you are using tcmalloc.


Re: [nox-dev] Error building dev/destiny-fast branch

2011-10-27 Thread Amin Tootoonchian
On Wed, Oct 26, 2011 at 7:32 PM, Andreas Voellmy
andreas.voel...@gmail.com wrote:
 On Wed, Oct 26, 2011 at 8:42 PM, Amin Tootoonchian a...@cs.toronto.edu
 wrote:

 I only updated the 'switch' app in that code base, and I never looked
 at 'hub'. My guess is that the hub app is doing so little that locking
 within the boost::asio scheduler outweighs the actual work done by the
 hub app. We need to make sure that the amount of work done by each
 thread upon its invocation is significantly more than the locking
 overhead in boost::asio's internal job queue.


 I'm unclear about how components in the destiny branch work. Do the handlers
 run concurrently by default, or is there something extra that one has to
 write to get them to execute concurrently? If something extra is needed,
 what is it in switch.cc that makes it execute concurrently? Or are you
 saying that the event handlers in 'hub' are indeed running concurrently, but
 they aren't doing enough work to get much performance gain? (By the way, I
 was looking at /src/nox/coreapps/switch/switch.cc
 and /src/nox/coreapps/hub/hub.cc)

 Thanks,
 Andreas

They run concurrently by default. They should indeed be running
concurrently, but I am guessing that locking overhead within boost::asio
significantly outweighs the actual work done by each thread. It
shouldn't be hard to fix, but it isn't worth fixing since we consider
that code base to be just a proof of concept.
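
For concreteness, here is a minimal sketch (not NOX code; tiny_handler,
run_service, and the counts are made up for illustration) of the execution
model being described: one boost::asio::io_service drained by several worker
threads, so posted handlers do run concurrently, but every enqueue/dequeue
goes through the service's internal queue lock, which dominates when each
handler does almost nothing.

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

// A 'hub'-sized unit of work: almost nothing, so the cost of taking the
// io_service's internal queue lock dominates each handler invocation.
static void tiny_handler() {}

static void run_service(boost::asio::io_service* io) { io->run(); }

int main() {
    boost::asio::io_service io;

    // Queue a large number of trivial jobs (stand-ins for packet-in events).
    for (int i = 0; i < 1000000; ++i)
        io.post(&tiny_handler);

    // Drain the queue with 8 worker threads; run() returns once the queue
    // is empty, so join_all() ends the program.
    boost::thread_group pool;
    for (int i = 0; i < 8; ++i)
        pool.create_thread(boost::bind(&run_service, &io));
    pool.join_all();
    return 0;
}

With handlers this small, most of the wall-clock time goes into contending
for that queue lock, which is why 'hub' barely speeds up with more threads
while 'switch' (which does real work per event) does.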

Thanks,
Amin


 Cheers,
 Amin

 P.S.: Btw, passing '--enable-ndebug' to configure should boost the
 performance.

 On Wed, Oct 26, 2011 at 2:08 PM, Andreas Voellmy
 andreas.voel...@gmail.com wrote:
  Thanks. The code compiled after configuring without python.
  I was able to get roughly the same kind of performance out of the
  'switch'
  application that is mentioned on the performance page
 
  (http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons).
  However, the 'hub' controller doesn't have much speedup when running
  with
  more threads. For example, when running with one thread I get a
  throughput
  of 213868.81 and when I run it with 8 threads I get a throughput
  of 264017.35. (To run with 8 threads, I am starting the controller like
  this: ./nox_core -i ptcp: hub -t 8; I am testing with cbench in
  throughput mode cbench -p  -t)
  Is this - that 'hub' gets little speedup while 'switch' gets lots of
  speedup - expected with this branch of NOX? Is there something that
  needs to
  be done to hub in order to enable the framework to run it concurrently?
  Regards,
  Andreas
 
  On Wed, Oct 26, 2011 at 5:53 AM, Murphy McCauley jam...@nau.edu wrote:
 
  This branch is quite a bit behind the actual development.  We're
  preparing
  to release the updated codebase in the near future.
  But for one thing, Python doesn't work in it.  So you probably need to
  do
  --with-python=no when you run configure.
  Hope that helps.
  -- Murphy
  On Oct 25, 2011, at 8:49 PM, Andreas Voellmy wrote:
 
  Thanks. I tried editing the conflict marker out in a couple ways that
  seemed reasonable to me, but I got other compile errors. Does anyone
  know if
  there is a known working version of this branch in the repository, and
  how I
  can get back to it?
  Thanks,
  Andreas
 
  2011/10/25 Zoltán Lajos Kis zoltan.lajos@ericsson.com
 
  Seems like someone checked in a conflict marker to that file:
 
 
 
  http://noxrepo.org/cgi-bin/gitweb.cgi?p=nox;a=blob;f=src/nox/coreapps/pyrt/context.i;h=cb8641d72feb3a1f0543e97830a2addd55d502b9;hb=dev/destiny-fast#l83
 
  Z.
 
  
  From: nox-dev-boun...@noxrepo.org [nox-dev-boun...@noxrepo.org] On
  Behalf
  Of Andreas Voellmy [andreas.voel...@gmail.com]
  Sent: Wednesday, October 26, 2011 4:40 AM
  To: nox-dev@noxrepo.org
  Subject: [nox-dev] Error building dev/destiny-fast branch
 
  Hi,
 
  I'd like to try the destiny-fast branch (I saw it mentioned here:
 
  http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons),
   so
  I did the following
 
  git clone git://noxrepo.org/nox
  cd nox
  git checkout dev/destiny-fast
 
  Is that the right way to get this branch? After that I ran
  ./boot.sh
  mkdir build
  cd build
  ../configure
  make
 
  and got the following error:
 
  Making all in pyrt
  make[8]: Entering directory
  `/home/av/Download/nox-destiny/nox/build/src/nox/coreapps/pyrt'
  /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64
  -I../../../../../src/include/openflow -I../../../../../src/nox/lib/
  -outdir
  ./. -o oxidereactor_wrap.cc -module oxidereactor
  ../../../../../src/nox/coreapps/pyrt/oxidereactor.i
  /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64 -outdir ./. -o
  deferredcallback_wrap.cc -module deferredcallback
  ../../../../../src/nox/coreapps/pyrt/deferredcallback.i
  /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64
  -I../../../../../src/include/openflow -I../../../../../src/nox/lib/
  -outdir
  ./. -o pycomponent_wrap.cc -module pycomponent
  ../../../../../src

Re: [nox-dev] Error building dev/destiny-fast branch

2011-10-26 Thread Amin Tootoonchian
I only updated the 'switch' app in that code base, and I never looked
at 'hub'. My guess is that the hub app is doing so little that locking
within the boost::asio scheduler outweighs the actual work done by the
hub app. We need to make sure that the amount of work done by each
thread upon its invocation is significantly more than the locking
overhead in boost::asio's internal job queue.

If that is the case, since we are working on a new release, it doesn't
make much sense to fix it in that code base. Could you wait for that?

Cheers,
Amin

P.S.: Btw, passing '--enable-ndebug' to configure should boost the performance.

On Wed, Oct 26, 2011 at 2:08 PM, Andreas Voellmy
andreas.voel...@gmail.com wrote:
 Thanks. The code compiled after configuring without python.
 I was able to get roughly the same kind of performance out of the 'switch'
 application that is mentioned on the performance page
 (http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons).
 However, the 'hub' controller doesn't have much speedup when running with
 more threads. For example, when running with one thread I get a throughput
 of 213868.81 and when I run it with 8 threads I get a throughput
 of 264017.35. (To run with 8 threads, I am starting the controller like
 this: ./nox_core -i ptcp: hub -t 8; I am testing with cbench in
 throughput mode cbench -p  -t)
 Is this - that 'hub' gets little speedup while 'switch' gets lots of
 speedup - expected with this branch of NOX? Is there something that needs to
 be done to hub in order to enable the framework to run it concurrently?
 Regards,
 Andreas

 On Wed, Oct 26, 2011 at 5:53 AM, Murphy McCauley jam...@nau.edu wrote:

 This branch is quite a bit behind the actual development.  We're preparing
 to release the updated codebase in the near future.
 But for one thing, Python doesn't work in it.  So you probably need to do
 --with-python=no when you run configure.
 Hope that helps.
 -- Murphy
 On Oct 25, 2011, at 8:49 PM, Andreas Voellmy wrote:

 Thanks. I tried editing the conflict marker out in a couple ways that
 seemed reasonable to me, but I got other compile errors. Does anyone know if
 there is a known working version of this branch in the repository, and how I
 can get back to it?
 Thanks,
 Andreas

 2011/10/25 Zoltán Lajos Kis zoltan.lajos@ericsson.com

 Seems like someone checked in a conflict marker to that file:


 http://noxrepo.org/cgi-bin/gitweb.cgi?p=nox;a=blob;f=src/nox/coreapps/pyrt/context.i;h=cb8641d72feb3a1f0543e97830a2addd55d502b9;hb=dev/destiny-fast#l83

 Z.

 
 From: nox-dev-boun...@noxrepo.org [nox-dev-boun...@noxrepo.org] On Behalf
 Of Andreas Voellmy [andreas.voel...@gmail.com]
 Sent: Wednesday, October 26, 2011 4:40 AM
 To: nox-dev@noxrepo.org
 Subject: [nox-dev] Error building dev/destiny-fast branch

 Hi,

 I'd like to try the destiny-fast branch (I saw it mentioned here:
 http://www.openflow.org/wk/index.php/Controller_Performance_Comparisons), so
 I did the following

 git clone git://noxrepo.org/nox
 cd nox
 git checkout dev/destiny-fast

 Is that the right way to get this branch? After that I ran
 ./boot.sh
 mkdir build
 cd build
 ../configure
 make

 and got the following error:

 Making all in pyrt
 make[8]: Entering directory
 `/home/av/Download/nox-destiny/nox/build/src/nox/coreapps/pyrt'
 /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64
 -I../../../../../src/include/openflow -I../../../../../src/nox/lib/ -outdir
 ./. -o oxidereactor_wrap.cc -module oxidereactor
 ../../../../../src/nox/coreapps/pyrt/oxidereactor.i
 /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64 -outdir ./. -o
 deferredcallback_wrap.cc -module deferredcallback
 ../../../../../src/nox/coreapps/pyrt/deferredcallback.i
 /usr/bin/swig -c++ -python  -DSWIGWORDSIZE64
 -I../../../../../src/include/openflow -I../../../../../src/nox/lib/ -outdir
 ./. -o pycomponent_wrap.cc -module pycomponent
 ../../../../../src/nox/coreapps/pyrt/component.i
 ../../../../../src/nox/coreapps/pyrt/context.i:79: Error: Syntax error in
 input(3).
 make[8]: *** [pycomponent.py] Error 1

 Does anyone know what went wrong and how to fix this?

 Thanks,
 Andreas




Re: [nox-dev] NOX Zaku with OF 1.1 support (C++ only)

2011-05-12 Thread Amin Tootoonchian
That would be great! I will be able to work on it again in two weeks I
guess. Just a couple of quick notes:

* So far I have only ported the switch app. Porting is super easy for
most apps: you just need to add a boost mutex to protect the shared data
structure (see the sketch below).
* I think some apps should be rewritten with performance in mind.
* The most important missing application is discovery, which you have
already ported to C++.
* The dev/destiny-fast branch needs testing. It has mostly been used in
benchmarks, and there are parts of the system that I never tested after
rewriting (e.g., SSL)!
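
As a rough illustration of that porting step, here is a minimal sketch
(hypothetical names, not the actual switch.cc) of guarding an app's shared
table with a boost::mutex so its handler can safely run on several
io_service worker threads at once:

#include <boost/thread/mutex.hpp>
#include <map>
#include <stdint.h>

// Hypothetical MAC-learning table; the real switch app's data structure
// differs, but the locking pattern is the same.
class LearningTable {
public:
    void learn(uint64_t mac, uint16_t port) {
        boost::mutex::scoped_lock guard(mutex_);   // held until return
        table_[mac] = port;
    }

    // Returns true and fills 'port' if the MAC has already been learned.
    bool lookup(uint64_t mac, uint16_t& port) const {
        boost::mutex::scoped_lock guard(mutex_);
        std::map<uint64_t, uint16_t>::const_iterator it = table_.find(mac);
        if (it == table_.end())
            return false;
        port = it->second;
        return true;
    }

private:
    mutable boost::mutex mutex_;               // protects table_
    std::map<uint64_t, uint16_t> table_;       // MAC -> output port
};

A single coarse mutex like this is enough to make the handler thread-safe;
whether it scales is a separate question (hence the note above about
rewriting some apps with performance in mind).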

Amin

 Having a C++-only fork of Nox is long overdue. There are many Nox
 developers who have expressed interest in this. Offhand, I would suggest
 that we pull in Amin's changes (which make Nox blazingly fast) and remove
 most spurious apps.


Re: [nox-dev] Maestro: a new scalable OpenFlow controller

2011-01-06 Thread Amin Tootoonchian
Hi Zheng,

 Sorry for the delay, and thanks a lot for all your comments. First I want to
 clarify that the performance number we measured in the tech report is for
 the routing application. Because the routing application needs to generate
 multiple flow configuration messages for all switches along the path one
 packet is going to take, its performance is considered to be worse than that
 of the switch application. We also use our simulator (which supports LLDP
 packet exchange between neighbor switches so that the routing application
 could work) to evaluate the switch application of NOX, and get a
 throughput of 90K rps on a 3.1 GHz machine. But for the routing
 application, even after I turn on the ndebug option (by running ./configure
 --enable-ndebug and make, and I hope this is the right way to do it), the
 throughput is still around 20K rps. All the numbers in the tech report are
 for the routing application. I hope this makes sense to you.

Thanks for your reply. Using the routing application explains some of
your observations. However, I think benchmarks should be based on the
switch application or a no-op application, to illustrate the overhead of
the controller itself (i.e., the underlying framework) as the baseline.
I think the reason the routing application performs badly here is
binding lookups (this is just a guess), and it could surely be tuned to
perform significantly better than 20K rps.

 About the latency, we measure it in an end-to-end style. That is, we measure
 the start time-stamp for a request before we call the socket.send
 function, and measure the end time-stamp after we receive the
 flow_mod/packet_out for that particular request. I agree that NOX will
 throttle the connection and the latency of a request within NOX is very
 small. It's just the latency we measure is the end-to-end delay, which
 includes the queuing delay in both the sending buffer and receiving buffer.

Regarding latency measurements, it is tricky to measure precisely at
sub-millisecond resolution. One should be very careful with the
resolution of timers and avoid being affected by the operating system
scheduler. If you are seeing latencies close to the scheduler quantum
size (typically 10ms under Linux), your measurement might be affected by
the scheduler. There is no real workaround for this under a non-realtime
kernel; however, running the controller, the benchmarker, and the latency
measurement tool with CPU affinity set (non-overlapping CPU sets -- using
taskset -c X) and with a high-priority FIFO scheduler (chrt -f X under
Linux) relieves many of these side effects. Btw, I was talking about the
end-to-end latency previously. It is quite possible to reduce that.
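
As a concrete sketch of that measurement hygiene (illustrative only:
now_us is a made-up helper, and the comment in main stands in for whatever
the benchmarker actually does on its controller socket), timestamps should
come from a monotonic clock and the probe should be pinned and prioritized:

#include <time.h>
#include <stdio.h>

// Monotonic microsecond timestamp; unaffected by wall-clock adjustments.
// (On older glibc, link with -lrt for clock_gettime.)
static double now_us() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main() {
    double t0 = now_us();
    // ... send one packet-in on the controller connection and block until
    //     the matching flow-mod/packet-out arrives (benchmarker-specific) ...
    double t1 = now_us();
    printf("end-to-end latency: %.1f us\n", t1 - t0);
    return 0;
}

Launching the probe, the controller, and the benchmarker on disjoint cores
under something like 'taskset -c ...' plus 'chrt -f ...' keeps the 10 ms
scheduler quantum out of the numbers, as described above.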

 Furthermore, I would like to know that is the code of HyperFlow, or the NOX
 multi-threading patch you mentioned, going to be available? Because I really
 want to study the difference of all the existing systems, and hopefully my
 effort could contribute to this community :)

I think we could make the NOX multi-threading patch publicly available
soon as a branch on noxrepo. The code needs to be reviewed and tested
by the NOX team and the community before making its way to the
mainstream NOX. I, for sure, appreciate your efforts very much. The
only thing is that if we had a framework to compare controller
performance we could better understand the trade-offs in controller
design. Rob's oflops/cbench package is a great start, and we could all
contribute to it.

Cheers,
Amin



Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
As Martin said, in some cases cbench may significantly over-report
numbers in throughput mode (of course it depends on the controller
implementation, so not all the controllers might be affected).

The cbench code sleeps for 100ms to clear out buffers after reading
the switch counters (fakeswitch_get_count in fakeswitch.c). There are
two problems here:

* Switch input and output buffers are not cleared under throughput mode.
* Having X switches means that the code sleeps for 100X ms instead of
a single 100ms for all emulated switches.

These would result in a significant over-estimation of controller
performance under throughput mode if one is using more than a few
emulated switches. For instance, with 128 switches, cbench would sleep
for almost 13 seconds before printing out the stats of each round;
meanwhile, the controller fills the input buffers of all the emulated
switches. Since the input buffer is not cleared, the stats of the next
round would contain the replies received for requests in previous
rounds (which is a potentially large number).

Rob, I will post a patch soon. Meanwhile, a quick fix is to move the
sleep to an appropriate place in run_test (cbench.c) and clear the
buffers under throughput mode as well in fakeswitch_get_count
(fakeswitch.c).

Amin

 A problem with cbench might even be of interest to those who wrote it
 :-)  If I could bother you to just send me a diff of what you've
 changed, it would be much appreciated.  I can push it back into the
 main branch.

 Fwiw, cbench is something I wrote very quickly while jetlagged, so
 it's not surprising that there are bugs in it.  I didn't realize that
 people were actually using it, or I would try to snag some time to
 make it less crappy :-)

 Thanks for the feedback,

 - Rob



Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
I double checked. It does slightly improve the performance (in the
order of a few thousand replies/sec). Larger MTUs decrease the CPU
workload (by decreasing the number of transfers across the bus) and
this means that more CPU cycles are available to the controller to
process requests. However, I am not suggesting that people should use
jumbo frames. Apparently running with more user-space threads does the
trick here. Anyway, I should trust a profiler rather than guessing, so
I will get back with a definite answer once I have done a more
thorough evaluation.

Cheers,
Amin

On Wed, Dec 15, 2010 at 2:51 PM, kk yap yap...@stanford.edu wrote:
 Random curiosity: Why would jumbo frames increase replies per sec?

 Regards
 KK

 On 15 December 2010 11:45, Amin Tootoonchian a...@cs.toronto.edu wrote:
 I missed that. The single core throughput is ~250k replies/sec, two
 cores ~450k replies/sec, three cores ~650k replies/sec, four cores
 ~800k replies/sec. These numbers are higher than what I reported in my
 previous post. That is most probably because, right now, I am testing
 with MTU 9000 (jumbo frames) and with more user-space threads.

 Cheers,
 Amin

 On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado cas...@nicira.com wrote:
 Also, do you mind posting the single core throughput?

 [cross-posting to nox-dev, openflow-discuss, ovs-discuss]

 I have prepared a patch based on NOX Zaku that improves its
 performance by a factor of 10. This implies that a single controller
 instance can run a large network with near a million flow initiations
 per second. I am writing to open up a discussion and get feedback from
 the community.

 Here are some preliminary results:

 - Benchmark configuration:
   * Benchmark: Throughput test of cbench (controller benchmarker) with
 64 switches. Cbench is a part of the OFlops package
 (http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
 mode, cbench sends a batch of ofp_packet_in messages to the controller
 and counts the number of replies it gets back.
   * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
 quad-core Intel Xeon processor (X3210), and 4GB RAM
   * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
 quad-core Intel Xeon processors (E5405), and 4GB RAM
   * Connectivity: 1Gbps

 - Benchmark results:
   * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
   * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
 available cores). The sustained controller-benchmarker throughput is
 ~400Mbps.

 The patch updates the asynchronous harness of NOX to a standard
 library (boost asynchronous I/O library) which simplifies the code
 base. It fixes the code in several areas, including but not limited
 to:

 - Multi-threading: The patch enables having any number of worker
 threads running on multiple cores.

 - Batching: Serving requests individually and sending replies one by
 one is quite inefficient. The patch tries to batch requests together
 where possible, as well as replies (which reduces the number of system
 calls significantly).

 - Memory allocation: The standard C++ memory allocator is not robust
 in multi-threaded environments. Google's Thread-Caching Malloc
 (TCMalloc) or Hoard memory allocator perform much better for NOX.

 - Fully asynchronous operation: The patched version avoids wasting CPU
 cycles polling sockets, or event/timer dispatchers when not necessary.

 I would like to add that the patched version should perform much
 better than what I reported above (the number reported is with a run
 on 4 CPU cores). I guess a single NOX instance running on a machine
 with 8 CPU cores should handle well above 1 million flow initiation
 requests per second. Also having a more capable machine should help to
 serve more requests! The code will be made available soon and I will
 post updates as well.


 Cheers,
 Amin


Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
I am talking about jumbo Ethernet frames here. By batching, I mean
batching outgoing messages together and writing to the underlying
layer which would be the TCP write buffer. The TCP buffer is not
limited to MTU or anything like that, so in most cases my code flushes
more than 64KB to the TCP write buffer. The gain is due to issuing a
single system call with a larger buffer rather than many system calls
with tiny buffers (e.g., 128 bytes you mentioned).

I do not sacrifice delay for throughput here. I keep a write buffer
and keep appending to it until the underlying socket is ready for
writes. Once it is ready for a write operation, buffered replies are
flushed to the underlying layer immediately. This is quite different
from Nagle's algorithm and will not add any delays.
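
Here is a minimal sketch of that write-buffer scheme (not the actual patch;
BatchingWriter and its members are illustrative, and a real connection
object would also need a lock or an asio strand once several worker threads
call send()):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <vector>

class BatchingWriter {
public:
    explicit BatchingWriter(boost::asio::ip::tcp::socket& sock)
        : sock_(sock), write_in_progress_(false) {}

    // Called by handlers to queue an outgoing OpenFlow message.
    void send(const std::vector<char>& msg) {
        pending_.insert(pending_.end(), msg.begin(), msg.end());
        if (!write_in_progress_)
            flush();          // socket idle: hand the bytes over right away
        // otherwise they go out, batched, with the next flush()
    }

private:
    void flush() {
        write_in_progress_ = true;
        in_flight_.swap(pending_);          // grab everything queued so far
        pending_.clear();
        boost::asio::async_write(
            sock_, boost::asio::buffer(in_flight_),
            boost::bind(&BatchingWriter::on_write_done, this,
                        boost::asio::placeholders::error));
    }

    void on_write_done(const boost::system::error_code& ec) {
        write_in_progress_ = false;
        if (!ec && !pending_.empty())
            flush();          // replies accumulated while the write was in flight
    }

    boost::asio::ip::tcp::socket& sock_;
    std::vector<char> pending_, in_flight_;   // user-space write buffers
    bool write_in_progress_;
};

Because the buffer is flushed the moment the previous write completes,
batching only coalesces what would otherwise be many small write() system
calls; nothing is held back on a timer the way Nagle does.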

Amin

On Wed, Dec 15, 2010 at 3:47 PM, kk yap yap...@stanford.edu wrote:
 Oh.. another point, if you are batching the frames, then what about
 delay?  There seems to be a trade-off between delay and throughput,
 and we went for the former by disabling Nagle's algorithm.

 Regards
 KK

 On 15 December 2010 12:46, kk yap yap...@stanford.edu wrote:
 Hi Amin,

 Just to clarify, do your jumbo frames refer to the OpenFlow messages
 or the frames in the datapath?   By OpenFlow messages, I am assuming
 you use a TCP connection between NOX and the switches, and you are
 batching the messages into jumbo frames of 9000 bytes before sending
 them out.  By frames in the datapath, I mean jumbo Ethernet frames are
 being sent in the datapath.  The latter does not make any sense to me,
 because OpenFlow should send 128 bytes to the controller by default.

 Thanks.

 Regards
 KK

 On 15 December 2010 12:36, Amin Tootoonchian a...@cs.toronto.edu wrote:
 I double checked. It does slightly improve the performance (in the
 order of a few thousand replies/sec). Larger MTUs decrease the CPU
 workload (by decreasing the number of transfers across the bus) and
 this means that more CPU cycles are available to the controller to
 process requests. However, I am not suggesting that people should use
 jumbo frames. Apparently running with more user-space threads does the
 trick here. Anyway, I should trust a profiler rather than guessing, so
 I will get back with a definite answer once I have done a more
 thorough evaluation.

 Cheers,
 Amin

 On Wed, Dec 15, 2010 at 2:51 PM, kk yap yap...@stanford.edu wrote:
 Random curiosity: Why would jumbo frames increase replies per sec?

 Regards
 KK

 On 15 December 2010 11:45, Amin Tootoonchian a...@cs.toronto.edu wrote:
 I missed that. The single core throughput is ~250k replies/sec, two
 cores ~450k replies/sec, three cores ~650k replies/sec, four cores
 ~800k replies/sec. These numbers are higher than what I reported in my
 previous post. That is most probably because, right now, I am testing
 with MTU 9000 (jumbo frames) and with more user-space threads.

 Cheers,
 Amin

 On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado cas...@nicira.com wrote:
 Also, do you mind posting the single core throughput?

 [cross-posting to nox-dev, openflow-discuss, ovs-discuss]

 I have prepared a patch based on NOX Zaku that improves its
 performance by a factor of 10. This implies that a single controller
 instance can run a large network with near a million flow initiations
 per second. I am writing to open up a discussion and get feedback from
 the community.

 Here are some preliminary results:

 - Benchmark configuration:
   * Benchmark: Throughput test of cbench (controller benchmarker) with
 64 switches. Cbench is a part of the OFlops package
 (http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
 mode, cbench sends a batch of ofp_packet_in messages to the controller
 and counts the number of replies it gets back.
   * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
 quad-core Intel Xeon processor (X3210), and 4GB RAM
   * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
 quad-core Intel Xeon processors (E5405), and 4GB RAM
   * Connectivity: 1Gbps

 - Benchmark results:
   * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
   * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
 available cores). The sustained controller-benchmarker throughput is
 ~400Mbps.

 The patch updates the asynchronous harness of NOX to a standard
 library (boost asynchronous I/O library) which simplifies the code
 base. It fixes the code in several areas, including but not limited
 to:

 - Multi-threading: The patch enables having any number of worker
 threads running on multiple cores.

 - Batching: Serving requests individually and sending replies one by
 one is quite inefficient. The patch tries to batch requests together
 where possible, as well as replies (which reduces the number of system
 calls significantly).

 - Memory allocation: The standard C++ memory allocator is not robust
 in multi-threaded environments. Google's Thread-Caching Malloc
 (TCMalloc) or Hoard memory

[nox-dev] NOX performance improvement by a factor 10

2010-12-14 Thread Amin Tootoonchian
[cross-posting to nox-dev, openflow-discuss, ovs-discuss]

I have prepared a patch based on NOX Zaku that improves its
performance by a factor of 10. This implies that a single controller
instance can run a large network with near a million flow initiations
per second. I am writing to open up a discussion and get feedback from
the community.

Here are some preliminary results:

- Benchmark configuration:
  * Benchmark: Throughput test of cbench (controller benchmarker) with
64 switches. Cbench is a part of the OFlops package
(http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
mode, cbench sends a batch of ofp_packet_in messages to the controller
and counts the number of replies it gets back.
  * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
quad-core Intel Xeon processor (X3210), and 4GB RAM
  * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
quad-core Intel Xeon processors (E5405), and 4GB RAM
  * Connectivity: 1Gbps

- Benchmark results:
  * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
  * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
available cores). The sustained controller-benchmarker throughput is
~400Mbps.

The patch updates the asynchronous harness of NOX to a standard
library (boost asynchronous I/O library) which simplifies the code
base. It fixes the code in several areas, including but not limited
to:

- Multi-threading: The patch enables having any number of worker
threads running on multiple cores.

- Batching: Serving requests individually and sending replies one by
one is quite inefficient. The patch tries to batch requests together
where possible, as well as replies (which reduces the number of system
calls significantly).

- Memory allocation: The standard C++ memory allocator is not robust
in multi-threaded environments. Google's Thread-Caching Malloc
(TCMalloc) or Hoard memory allocator perform much better for NOX.

- Fully asynchronous operation: The patched version avoids wasting CPU
cycles polling sockets or event/timer dispatchers when not necessary
(see the sketch below).
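
For that last point, here is a minimal sketch (not the actual patch;
Connection and its members are illustrative) of what fully asynchronous
operation looks like with boost::asio: the next read is re-armed from the
completion handler of the previous one, so no thread ever spins polling
the socket.

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <cstddef>

class Connection {
public:
    explicit Connection(boost::asio::io_service& io) : sock_(io) {}

    boost::asio::ip::tcp::socket& socket() { return sock_; }

    void start() { arm_read(); }    // call once the connection is accepted

private:
    void arm_read() {
        sock_.async_read_some(
            boost::asio::buffer(buf_, sizeof(buf_)),
            boost::bind(&Connection::on_read, this,
                        boost::asio::placeholders::error,
                        boost::asio::placeholders::bytes_transferred));
    }

    void on_read(const boost::system::error_code& ec, std::size_t n) {
        if (ec)
            return;                 // connection closed or failed
        // ... parse the n bytes of OpenFlow messages and dispatch events ...
        arm_read();                 // re-arm; no busy polling in between
    }

    boost::asio::ip::tcp::socket sock_;
    char buf_[4096];
};

Worker threads sitting in io_service::run() only spend cycles when a
completion is actually ready, and the same threads can serve timers and
other events without a separate polling loop.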

I would like to add that the patched version should perform much
better than what I reported above (the number reported is with a run
on 4 CPU cores). I guess a single NOX instance running on a machine
with 8 CPU cores should handle well above 1 million flow initiation
requests per second. Also having a more capable machine should help to
serve more requests! The code will be made available soon and I will
post updates as well.


Cheers,
Amin



[nox-dev] Typical NOX Memory Usage

2010-03-29 Thread Amin Tootoonchian
Hi all,

For a research project, I need to know the typical memory usage of a
NOX controller in existing deployments. I am particularly interested in
average and maximum memory usage.

Btw, do you have any numbers on how many times NOX has crashed in
existing deployments because of memory leaks?



Thanks,
Amin



[nox-dev] Order of event handlers

2010-02-16 Thread Amin Tootoonchian
Hi all,

Is there a way to specify a handler to be the last one receiving an
event (without enumerating all components in nox.xml)?


Thanks,
Amin



Re: [nox-dev] Preparing for Nox 0.6.0

2009-11-04 Thread Amin Tootoonchian
Hi Martin,

I meant requiring the originators to add themselves (I can think of
workarounds to have it work implicitly, but they are
compiler/architecture specific).

This feature is not only useful for debugging, it can also be used to
deploy multiple NOX controllers to control a single network: On each
controller *capture a set of ofp_msg_events* (e.g., only a small
portion of packet_in events change the controller state) and
*replay/dispatch* them on the others. We need to discard any outgoing
ofp packets caused by the replayed events and for that we need to keep
track of what events triggered other events/messages.

Also, to find out which events are *important* (i.e., alter the
controller state), the controller and the running applications need to
mark events explicitly. In other words, it is the
controller/application developer's job to specify which events should
be propagated to other controllers. This part also requires the
feature mentioned above. That is because if a non-ofp_msg_event is
marked we should be able to trace back to the original ofp_msg_event
and mark it. Am I right about the ofp_msg_events being the driving
force of NOX operation?

And my last question: Is there currently any way for two NOX instances
to synchronize their states for failover? If not, are there any plans
to provide such a feature? Is there any way for a
controller/application to store its transient state on disk? What
happens in a production network with hundreds of switches when the
controller crashes and comes back up in a few seconds? Should it
rediscover the topology, host-ip-mac bindings, etc. from scratch?


Cheers,
Amin

 Regarding tracing the event call stack.  This would certainly be a useful
 debugging tool.  However, the nature of events is that the infrastructure is
 decoupled from the senders and receivers so it isn't clear to me how we'd
 mark the originator in a general way without requiring the originators to
 add themselves.  I'm certainly open to ideas ...


Thanks,
Amin
