On Wed, Jul 04, 2018 at 07:25:50PM -0700, William Tu wrote: > On Tue, Jul 3, 2018 at 10:56 AM, Alexei Starovoitov > <alexei.starovoi...@gmail.com> wrote: > > On Thu, Jun 28, 2018 at 07:19:35AM -0700, William Tu wrote: > >> Hi Alexei, > >> > >> Thanks a lot for the feedback! > >> > >> On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov > >> <alexei.starovoi...@gmail.com> wrote: > >> > On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote: > >> >> > >> >> Discussion > >> >> ========== > >> >> We are still actively working on finishing the feature, currently > >> >> the basic forwarding and tunnel feature work, but still under > >> >> heavy debugging and development. The purpose of this RFC is to > >> >> get some early feedbacks and direction for finishing the complete > >> >> features in existing kernel's OVS datapath (the net/openvswitch/*). > >> > > >> > Thank you for sharing the patches. > >> > > >> >> Three major issues we are worried: > >> >> a. Megaflow support in BPF. > >> >> b. Connection Tracking support in BPF. > >> > > >> > my opinion on the above two didn't change. > >> > To recap: > >> > A. Non scalable megaflow map is no go. I'd like to see packet > >> > classification > >> > algorithm like hicuts or efficuts to be implemented instead, since it > >> > can be > >> > shared by generic bpf, bpftiler, ovs and likely others. > >> > >> We did try the decision tree approach using dpdk's acl lib. The lookup > >> speed is 6 times faster than the magaflow using tuple space. > >> However, the update/insertion requires rebuilding/re-balancing the decision > >> tree so it's way too slow. I think hicuts or efficuts suffers the same > >> issue. > >> So decision tree algos are scalable only for lookup operation due to its > >> optimization over tree depth, but not scalable under > >> update/insert/delete operations. > >> > >> On customer's system we see megaflow update/insert rate around 10 > >> rules/sec, > >> this makes decision tree unusable, unless we invent something to optimize > >> the > >> update/insert time or incremental update of these decision tree algo. > > > > is this a typo? you probably meant 10K rule updates a second ? > I mean "new" rules being added at 10 rules/sec. > Update rate might be much higher. > > > Last time I've dealt with these algorithms we had 100K acl updates a second. > > It was an important metric that we were optimizing for. > > I'm pretty sure '*cuts' algos do many thousands per second non optimized. > > When adding a new rule, do these algorithms require rebuilding the > entire tree? > > In our evaluation, updating an existing entry in the decision tree > performs OK, because it is equal to lookup and replace, and lookup > is fast, update is just atomic swap. But inserting a new rule is slow, > because it requires re-building the tree using all existing rules. > And we see new rule being added at rate 10 rules per second. > So we are constantly rebuilding the entire tree. > > If the entire tree has 100k of rules, it takes around 2 seconds to rebuild, > based on the dpdk acl library. And without an incremental algorithm, > adding 1 new rule will trigger rebuilding the 100k of rules, and it is too > slow. > > Reading through HyperCuts and EffiCuts, I'm not sure how it supports > incrementally adding a new rule, without rebuilding the entire tree. > http://ccr.sigcomm.org/online/files/p207.pdf > http://cseweb.ucsd.edu/~susingh/papers/hyp-sigcomm03.pdf > > The HyperCuts papers says > "A fast update algorithm can also be implemented; however we do not > go into the details of incremental update in this paper" > > > > >> >> c. Verifier limitation. > >> > > >> > Not sure what limitations you're concerned about. > >> > > >> > >> Mostly related to stack. The flow key OVS uses (struct sw_flow_key) > >> is 464 byte. We trim a lot, now around 300 byte, but still huge, > >> considering > >> the BPF's stack limit is 512 byte. > > > > have you tried using per-cpu array of one element with large value > > instead of stack? > > yes, now we store the flow key in percpu array with 1 element. > > > In the latest verifier most of the operations that can be done with the > > stack > > pointer can be done with pointer to map value too. > > > Once the flow key is stored in map, another eBPF program > needs to use that key to lookup flow table (another map). > So we have to store the flow key on stack first, in order to > use it as key to lookup the flow table map. > > Is there a way to work around it?
d71962f ("bpf: allow map helpers access to map values directly") removes that limitation from the verifier and should allow you to use map values as map keys directly. 4.18-rc1 has it. > Thanks > William _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev