On 8/25/20 7:46 PM, Ben Pfaff wrote:
> On Tue, Aug 25, 2020 at 06:43:51PM +0200, Dumitru Ceara wrote:
>> On 8/25/20 6:01 PM, Ben Pfaff wrote:
>>> On Mon, Aug 24, 2020 at 04:28:22PM -0700, Han Zhou wrote:
>>>> As I remember you were working on the new ovn-northd that utilizes DDlog
>>>> for incremental processing. Could you share the current status?
>>>>
>>>> Now that some more improvements have been made in ovn-controller and OVSDB,
>>>> the ovn-northd becomes the more obvious bottleneck for OVN use in large
>>>> scale environments. Since you were not in the OVN meetings for the last
>>>> couple of weeks, could you share here the status and plan moving forward?
>>>
>>> The status is basically that I haven't yet succeeded at getting Red
>>> Hat's recommended benchmarks running.  I'm told that is important before
>>> we merge it.  I find them super difficult to set up.  I tried a few
>>> weeks ago and basically gave up.  Piles and piles of repos all linked
>>> together in tricky ways, making it really difficult to substitute my own
>>> branches.  I intend to try again soon, though.  I have a new computer
>>> that should be arriving soon, which should also allow it to proceed more
>>> quickly.
>>
>> Hi Ben,
>>
>> I can try to help with setting up ovn-heater. In theory it should be
>> enough to export OVS_REPO, OVS_BRANCH, OVN_REPO and OVN_BRANCH, make them
>> point to your repos and branches, and then run "do.sh install"; it
>> should take care of installing all the dependencies and repos.
>>
>> I can also try to run the scale tests on our downstream if that helps.
> 
> It's probably better if I come up with something locally, because I
> expect to have to run it multiple times, maybe many times, since I will
> presumably discover bottlenecks.
> 
> This time around, I'll speak up when I run into problems.
> 

Sorry in advance for the long email.

I went ahead and added a new test scenario to ovn-heater that I think
might be relevant in the context of ovn-northd incremental processing:

https://github.com/dceara/ovn-heater#example-run-scenario-3---scale-up-number-of-pods---stress-ovn-northd

On my test machine:
Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
2 NUMA nodes - 28 cores each.

I did:

$ cd
$ git clone https://github.com/dceara/ovn-heater
$ cd ovn-heater
$ cat > physical-deployments/physical-deployment.yml << EOF
registry-node: localhost
internal-iface: none

central-node:
  name: localhost

worker-nodes:
  - localhost
EOF

# Install all the required repos and make everything work together using
# the latest OVS and OVN code from GitHub. This creates
# ~/ovn-heater/runtime, where all the repos are cloned and the test suite
# is run. This step also builds the container image with OVS/OVN
# compiled from source. It has to be redone every time we need to test
# with a different version of OVS/OVN, and it can be customized with the
# OVS_REPO, OVS_BRANCH, OVN_REPO and OVN_BRANCH env vars.
$ ./do.sh install

# Start the test:
# This brings up 30 "fake" OVN nodes and then simulates addition of
# 1000 pods (lsps) and associated policies (port_group/address_set/acl).
$ ./do.sh browbeat-run \
    browbeat-scenarios/switch-per-node-30-node-1000-pods.yml debug-dceara-pods

# This takes quite a while, ~1hr on my system.
# Results are stored at:
# ls -l ~/ovn-heater/test_results/debug-dceara-pods-20200826-080650/20200826-120718/rally/plugin-workloads/all-rally-run-0.html

While the test was running (the execution can be monitored by tailing
~/ovn-heater/runtime/browbeat/*.log), I noticed that ovn-northd's CPU
usage increased steadily and was above 70-80% after ~500 iterations.

ovn-northd logs:
2020-08-26T14:24:25.989Z|02119|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (192.16.0.1:53642<->192.16.0.1:6642) at lib/stream-ssl.c:832 (97% CPU usage)
2020-08-26T14:24:31.985Z|02120|poll_loop|INFO|Dropped 54 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
2020-08-26T14:24:31.985Z|02121|poll_loop|INFO|wakeup due to [POLLIN] on fd 11 (192.16.0.1:56340<->192.16.0.1:6641) at lib/stream-ssl.c:832 (99% CPU usage)
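Since the interesting signal is how ovn-northd's CPU usage grows over the run, the "(NN% CPU usage)" suffix of the poll_loop wakeup messages can be pulled out of the log to follow the trend. A minimal sketch that relies only on the message format shown above (the function name northd_cpu_usage is just illustrative):

```shell
# Print one CPU usage percentage per poll_loop wakeup message.
# Reads ovn-northd log lines on stdin; lines without a
# "(NN% CPU usage)" suffix (e.g. "Dropped ... log messages") are skipped.
northd_cpu_usage() {
    sed -n 's/.*(\([0-9]\{1,3\}\)% CPU usage).*/\1/p'
}
```

For example, piping the ovn-northd log through it (the log location depends on how ovn-northd was started) gives one number per wakeup message. These messages are rate-limited (note the "Dropped 54 log messages" line), so this shows a rough trend rather than a continuous measurement.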

For troubleshooting/profiling, the easiest way I can think of to rerun
the sequence of commands without running the whole suite is to extract
them from the logs of the ovn-nbctl daemon, which we start on node
ovn-central-1. I also added a short sleep after each command so that NB
changes aren't batched together before ovn-northd processes them:

$ docker exec ovn-central-1 grep "Running command" \
    /var/log/openvswitch/ovn-nbctl.log | \
    sed -ne 's/.*Running command run\(.*\)/ovn-nbctl\1; sleep 0.01/p' > commands.sh
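The same extraction can be wrapped in a small function that takes the delay as a parameter, which makes it easy to try different spacings between NB changes. This is a sketch under the assumption that each relevant log line contains "Running command run" followed by the original command's arguments (which is what the sed expression above relies on); the function name and the example delay are hypothetical.

```shell
# Turn ovn-nbctl daemon log lines (read on stdin) into a replay script.
# Each line containing "Running command run <args>" becomes
# "ovn-nbctl <args>; sleep <delay>"; all other lines are dropped.
# $1: delay in seconds between commands (defaults to 0.01, as above).
nbctl_log_to_replay() {
    local delay=${1:-0.01}
    sed -ne "s/.*Running command run\(.*\)/ovn-nbctl\1; sleep ${delay}/p"
}

# Example (hypothetical 50ms delay):
# docker exec ovn-central-1 cat /var/log/openvswitch/ovn-nbctl.log | \
#     nbctl_log_to_replay 0.05 > commands.sh
```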

# Now we can just run ovn-northd locally:
$ ovn-ctl start_northd
# Start an ovn-nbctl daemon locally:
$ export OVN_NB_DAEMON=$(ovn-nbctl --detach)
# Replay the commands (commands.sh is created without the executable
# bit, so run it through the shell):
$ sh commands.sh

Regarding DDlog compilation, I suspect we need to add support for it
in ovn-fake-multinode, which builds and runs the fake nodes' images. I
can take care of that and add the Rust compiler and DDlog binaries to
the Dockerfiles.

I assume these are the branches I should use for testing:
https://github.com/blp/ovs-reviews/tree/ovs-for-ddlog
https://github.com/blp/ovs-reviews/tree/ddlog4
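Combining the install step with these branches, a minimal sketch of pointing ovn-heater at them before reinstalling could look like this. It assumes that do.sh clones OVS_REPO/OVN_REPO with git (so plain GitHub URLs work) and that ddlog4 is the OVN side of the DDlog work; both are guesses on my side, not verified.

```shell
# Sketch: point ovn-heater at the DDlog branches.
# Assumption: plain GitHub clone URLs are accepted, and ddlog4 is the
# OVN branch while ovs-for-ddlog is the matching OVS branch.
export OVS_REPO=https://github.com/blp/ovs-reviews
export OVS_BRANCH=ovs-for-ddlog
export OVN_REPO=https://github.com/blp/ovs-reviews
export OVN_BRANCH=ddlog4

# Then rebuild the runtime and the container image from those branches:
# cd ~/ovn-heater && ./do.sh install
```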

Hope this helps.

Regards,
Dumitru

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
