On 8/25/20 7:46 PM, Ben Pfaff wrote:
> On Tue, Aug 25, 2020 at 06:43:51PM +0200, Dumitru Ceara wrote:
>> On 8/25/20 6:01 PM, Ben Pfaff wrote:
>>> On Mon, Aug 24, 2020 at 04:28:22PM -0700, Han Zhou wrote:
>>>> As I remember you were working on the new ovn-northd that utilizes DDlog
>>>> for incremental processing. Could you share the current status?
>>>>
>>>> Now that some more improvements have been made in ovn-controller and OVSDB,
>>>> the ovn-northd becomes the more obvious bottleneck for OVN use in large
>>>> scale environments. Since you were not in the OVN meetings for the last
>>>> couple of weeks, could you share here the status and plan moving forward?
>>>
>>> The status is basically that I haven't yet succeeded at getting Red
>>> Hat's recommended benchmarks running. I'm told that is important before
>>> we merge it. I find them super difficult to set up. I tried a few
>>> weeks ago and basically gave up. Piles and piles of repos all linked
>>> together in tricky ways, making it really difficult to substitute my own
>>> branches. I intend to try again soon, though. I have a new computer
>>> that should be arriving soon, which should also allow it to proceed more
>>> quickly.
>>
>> Hi Ben,
>>
>> I can try to help with setting up ovn-heater, in theory it should be
>> enough to export OVS_REPO, OVS_BRANCH, OVN_REPO, OVN_BRANCH, make them
>> point to your repos and branches and then run "do.sh install" and it
>> should take care of installing all the dependencies and repos.
>>
>> I can also try to run the scale tests on our downstream if that helps.
>
> It's probably better if I come up with something locally, because I
> expect to have to run it multiple times, maybe many times, since I will
> presumably discover bottlenecks.
>
> This time around, I'll speak up when I run into problems.
>

Sorry in advance for the long email.

I went ahead and added a new test scenario to ovn-heater that I think
might be relevant in the context of ovn-northd incremental processing:

https://github.com/dceara/ovn-heater#example-run-scenario-3---scale-up-number-of-pods---stress-ovn-northd

On my test machine:
Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
2 NUMA nodes - 28 cores each.

I did:

$ cd
$ git clone https://github.com/dceara/ovn-heater
$ cd ovn-heater
$ cat > physical-deployments/physical-deployment.yml << EOF
registry-node: localhost
internal-iface: none

central-node:
  name: localhost

worker-nodes:
  - localhost
EOF

# Install all the required repos and make everything work together using
# latest OVS and OVN code from github. This generates the
# ~/ovn-heater/runtime where all the repos are cloned and the test suite
# is run. This step also generates the container image with OVS/OVN
# compiled from sources. This step has to be done every time we need
# to test with a different version of OVS/OVN and can be customized with
# the OVS/OVN_REPO and OVS/OVN_BRANCH env vars.
$ ./do.sh install

# Start the test:
# This brings up 30 "fake" OVN nodes and then simulates addition of
# 1000 pods (lsps) and associated policies (port_group/address_set/acl).
$ ./do.sh browbeat-run browbeat-scenarios/switch-per-node-30-node-1000-pods.yml debug-dceara-pods

# This takes quite long, ~1hr on my system.
# Results are stored at:
# ls -l ~/ovn-heater/test_results/debug-dceara-pods-20200826-080650/20200826-120718/rally/plugin-workloads/all-rally-run-0.html

What I noticed was that while the test was running (we can monitor the
execution by tailing ~/ovn-heater/runtime/browbeat/*.log), ovn-northd's
CPU usage increased constantly and was above 70-80% after ~500
iterations.
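
To put numbers on that trend, the CPU percentages can be pulled out of
ovn-northd's poll_loop messages. This is only a rough sketch: it assumes
the wakeup messages end in "(NN% CPU usage)" and start with a timestamp
followed by "|", and the function name northd_cpu_samples is made up for
illustration.

```shell
# Hypothetical helper (not part of ovn-heater or OVN): read ovn-northd log
# lines on stdin and print "<timestamp> <cpu-percent>" for every poll_loop
# wakeup line that ends in "(NN% CPU usage)". Other lines are skipped.
northd_cpu_samples() {
    sed -n 's/^\([^|]*\)|.*(\([0-9][0-9]*\)% CPU usage)$/\1 \2/p'
}
```

Feeding it a local copy of the log (for example
"northd_cpu_samples < ovn-northd.log", where the file name is just a
placeholder) gives one sample per line, which is easy to plot or eyeball.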

ovn-northd logs:
2020-08-26T14:24:25.989Z|02119|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (192.16.0.1:53642<->192.16.0.1:6642) at lib/stream-ssl.c:832 (97% CPU usage)
2020-08-26T14:24:31.985Z|02120|poll_loop|INFO|Dropped 54 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
2020-08-26T14:24:31.985Z|02121|poll_loop|INFO|wakeup due to [POLLIN] on fd 11 (192.16.0.1:56340<->192.16.0.1:6641) at lib/stream-ssl.c:832 (99% CPU usage)

For troubleshooting/profiling, the easiest way I can think of for
rerunning the sequence of commands without actually running the whole
suite is to extract them from the ovn-nbctl daemon logs. We start the
daemon on node ovn-central-1. I also added a short sleep after each
command to avoid NB changes being batched before ovn-northd processes
them:

$ docker exec ovn-central-1 grep "Running command" /var/log/openvswitch/ovn-nbctl.log | sed -ne 's/.*Running command run\(.*\)/ovn-nbctl\1; sleep 0.01/p' > commands.sh

# Now we can just run ovn-northd locally:
$ ovn-ctl start_northd

# Start an ovn-nbctl daemon locally:
$ export OVN_NB_DAEMON=$(ovn-nbctl --detach)

# Replay the commands (commands.sh is not executable, so run it via sh):
$ sh ./commands.sh

Regarding the ddlog compilation, I suspect that we need to add support
for it in ovn-fake-multinode, which builds and runs the fake nodes'
images. I can take care of that and add the rust compiler and ddlog
binaries to the docker files.

I assume these are the branches I should use for testing:
https://github.com/blp/ovs-reviews/tree/ovs-for-ddlog
https://github.com/blp/ovs-reviews/tree/ddlog4

Hope this helps.

Regards,
Dumitru
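
As a quick sanity check of the sed expression used to build commands.sh,
it can be tried on a single line first. This is only a sketch: the
function name nbctl_log_to_commands and the sample log line are made up
for illustration, and real ovn-nbctl daemon log lines may differ in
detail.

```shell
# Hypothetical wrapper around the sed expression from the message:
# reads ovn-nbctl daemon log lines on stdin and prints one replayable
# "ovn-nbctl ...; sleep 0.01" command per "Running command run" line.
nbctl_log_to_commands() {
    sed -ne 's/.*Running command run\(.*\)/ovn-nbctl\1; sleep 0.01/p'
}

# Made-up sample input, only to show the shape of the transformation.
printf '%s\n' 'INFO|Running command run lsp-add sw0 port0' \
    | nbctl_log_to_commands
# prints: ovn-nbctl lsp-add sw0 port0; sleep 0.01
```

Everything after "Running command run" is kept verbatim, so any options
logged with the original command are replayed as well.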