Re: [ovs-dev] [PATCH v2] treewide: Use packet batch APIs
On Tue, Sep 03, 2019 at 06:37PM +, William Tu wrote:
> On Sun, Sep 01, 2019 at 03:10:05PM +0200, Paul Chaignon wrote:
> > This patch replaces direct accesses to dp_packet_batch and dp_packet
> > internal components by the appropriate API calls.  It extends commit
> > 1270b6e52 (treewide: Wider use of packet batch APIs).
> >
> > This patch was generated using the following semantic patch (cf.
> > http://coccinelle.lip6.fr).
>
> Looks very interesting, I spent some time learning it.

Glad you find it interesting!  I'm actually trying to practice using it
myself.  (If you have any usage ideas for Open vSwitch, I'm interested; I
already went through all the treewide patches.)

> If you have time, can you show us how to run it?
> I installed coccinelle on ubuntu and I can run on linux kernel
>     make coccicheck MODE=report
> but for OVS, do we provide the semantic patch below and run
> it manually?

I executed the semantic patch below with:

    spatch --sp-file semantic-patch.cocci --very-quiet \
        --dir ./ > output.diff

Note that there will be some incorrect changes in lib/dp-packet.h which
you'll need to remove (e.g., Coccinelle tries to use dp_packet_is_eth()
inside dp_packet_is_eth()).  It's possible to avoid that, but afaik, only
at the cost of a slightly more verbose semantic patch.

> >
> > //
> > @ dp_packet @
> > struct dp_packet_batch *b1;
> > struct dp_packet_batch b2;
> > struct dp_packet *p;
> > expression e;
> > @@
> >
> > (
> > - b1->packets[b1->count++] = p;
> > + dp_packet_batch_add(b1, p);
> > |
> > - b2.packets[b2.count++] = p;
> > + dp_packet_batch_add(&b2, p);
> > |
> > - p->packet_type == htonl(PT_ETH)
> > + dp_packet_is_eth(p)
> > |
> > - p->packet_type != htonl(PT_ETH)
> > + !dp_packet_is_eth(p)
> > |
> > - b1->count == 0
> > + dp_packet_batch_is_empty(b1)
> > |
> > - !b1->count
> > + dp_packet_batch_is_empty(b1)
>
> I understand the above using + and -
> But can you explain the lines below?
If I were to remove the four following matches on count and keep only the
last two, Coccinelle would attempt to replace count assignments and
increments with dp_packet_batch_size().  I might end up with e.g.:

    - batch->count = 0;
    + dp_packet_batch_size(batch) = 0;

In short, I'm using these four matches to ensure the last two matches
replace only rvalues.

> > |
> >   b1->count = e;
> > |
> >   b1->count++
> > |
> >   b2.count = e;
> > |
> >   b2.count++
> > |
>
> Why do we need "expression e" and are they match and replace something?

The four preceding matches don't replace code as there's no -/+ in the
leftmost column.  Expression e represents anything that might be on the
right-hand side.  The semantic patch doesn't run without it, I'm guessing
because the statement is incomplete.

> > - b1->count
> > + dp_packet_batch_size(b1)
> > |
> > - b2.count
> > + dp_packet_batch_size(&b2)
> > )
> > //
> >
> > Signed-off-by: Paul Chaignon
>
> Looks good to me, thanks for the patch.
> Acked-by: William Tu
>
> > ---
> > Changelogs:
> > Changes in v2:
> >   - Rebased on master branch.
> >   - Re-applied the semantic patch (new changes in lib/netdev-afxdp.c).
> >
> >  lib/dpif-netdev.c  | 7 ---
> >  lib/flow.c         | 2 +-
> >  lib/netdev-afxdp.c | 20
> >  lib/netdev-dpdk.c  | 3 ++-
> >  lib/netdev-dummy.c | 2 +-
> >  lib/packets.c      | 4 ++--
> >  lib/pcap-file.c    | 2 +-
> >  7 files changed, 23 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 75d85b2fd..d24f9502f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4261,7 +4261,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> >          /* At least one packet received. */
> >          *recirc_depth_get() = 0;
> >          pmd_thread_ctx_time_update(pmd);
> > -        batch_cnt = batch.count;
> > +        batch_cnt = dp_packet_batch_size(&batch);
> >          if (pmd_perf_metrics_enabled(pmd)) {
> >              /* Update batch histogram.
> >              */
> >              s->current.batches++;
> > @@ -6292,7 +6292,7 @@ packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
> >  {
> >      batch->byte_count += dp_packet_size(packet);
> >      batch->tcp_flags |= tcp_flags;
> > -    batch->array.packets[batch->array.count++]
[ovs-dev] [PATCH v2] treewide: Use packet batch APIs
This patch replaces direct accesses to dp_packet_batch and dp_packet
internal components by the appropriate API calls.  It extends commit
1270b6e52 (treewide: Wider use of packet batch APIs).

This patch was generated using the following semantic patch (cf.
http://coccinelle.lip6.fr).

//
@ dp_packet @
struct dp_packet_batch *b1;
struct dp_packet_batch b2;
struct dp_packet *p;
expression e;
@@

(
- b1->packets[b1->count++] = p;
+ dp_packet_batch_add(b1, p);
|
- b2.packets[b2.count++] = p;
+ dp_packet_batch_add(&b2, p);
|
- p->packet_type == htonl(PT_ETH)
+ dp_packet_is_eth(p)
|
- p->packet_type != htonl(PT_ETH)
+ !dp_packet_is_eth(p)
|
- b1->count == 0
+ dp_packet_batch_is_empty(b1)
|
- !b1->count
+ dp_packet_batch_is_empty(b1)
|
  b1->count = e;
|
  b1->count++
|
  b2.count = e;
|
  b2.count++
|
- b1->count
+ dp_packet_batch_size(b1)
|
- b2.count
+ dp_packet_batch_size(&b2)
)
//

Signed-off-by: Paul Chaignon
---
Changelogs:
Changes in v2:
  - Rebased on master branch.
  - Re-applied the semantic patch (new changes in lib/netdev-afxdp.c).

 lib/dpif-netdev.c  | 7 ---
 lib/flow.c         | 2 +-
 lib/netdev-afxdp.c | 20
 lib/netdev-dpdk.c  | 3 ++-
 lib/netdev-dummy.c | 2 +-
 lib/packets.c      | 4 ++--
 lib/pcap-file.c    | 2 +-
 7 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 75d85b2fd..d24f9502f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4261,7 +4261,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
         /* At least one packet received. */
         *recirc_depth_get() = 0;
         pmd_thread_ctx_time_update(pmd);
-        batch_cnt = batch.count;
+        batch_cnt = dp_packet_batch_size(&batch);
         if (pmd_perf_metrics_enabled(pmd)) {
             /* Update batch histogram.
             */
             s->current.batches++;
@@ -6292,7 +6292,7 @@ packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
 {
     batch->byte_count += dp_packet_size(packet);
     batch->tcp_flags |= tcp_flags;
-    batch->array.packets[batch->array.count++] = packet;
+    dp_packet_batch_add(&batch->array, packet);
 }

 static inline void
@@ -6314,7 +6314,8 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch,
     struct dp_netdev_actions *actions;
     struct dp_netdev_flow *flow = batch->flow;

-    dp_netdev_flow_used(flow, batch->array.count, batch->byte_count,
+    dp_netdev_flow_used(flow, dp_packet_batch_size(&batch->array),
+                        batch->byte_count,
                         batch->tcp_flags, pmd->ctx.now / 1000);

     actions = dp_netdev_flow_get_actions(flow);
diff --git a/lib/flow.c b/lib/flow.c
index ac6a4e121..393243309 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1098,7 +1098,7 @@ parse_tcp_flags(struct dp_packet *packet)
     ovs_be16 dl_type;
     uint8_t nw_frag = 0, nw_proto = 0;

-    if (packet->packet_type != htonl(PT_ETH)) {
+    if (!dp_packet_is_eth(packet)) {
         return 0;
     }
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index e5b058d08..6e0180327 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -765,7 +765,7 @@ free_afxdp_buf_batch(struct dp_packet_batch *batch)
         addr = (uintptr_t)base & (~FRAME_SHIFT_MASK);
         elems[i] = (void *)addr;
     }
-    umem_elem_push_n(xpacket->mpool, batch->count, elems);
+    umem_elem_push_n(xpacket->mpool, dp_packet_batch_size(batch), elems);
     dp_packet_batch_init(batch);
 }

@@ -860,9 +860,11 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
     free_batch = check_free_batch(batch);

     umem = xsk_info->umem;
-    ret = umem_elem_pop_n(&umem->mpool, batch->count, elems_pop);
+    ret = umem_elem_pop_n(&umem->mpool, dp_packet_batch_size(batch),
+                          elems_pop);
     if (OVS_UNLIKELY(ret)) {
-        atomic_add_relaxed(&xsk_info->tx_dropped, batch->count, &orig);
+        atomic_add_relaxed(&xsk_info->tx_dropped, dp_packet_batch_size(batch),
+                           &orig);
         VLOG_WARN_RL(&rl, "%s: send failed due to exhausted memory pool.",
                     netdev_get_name(netdev));
         error = ENOMEM;
@@ -870,10 +872,12 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
     }

     /* Make sure we have enough TX descs. */
-    ret = xsk_ring_prod__reserve(&xsk_info->tx, batch->count, &idx);
+    ret = xsk_ring_prod__reserve(&xsk_info->tx, dp_packet_batch_size(batch),
+                                 &idx);
     if (OVS_UNLIKELY(ret == 0)) {
-        umem_elem_push_n(&umem->mpool, batch->count, elems_pop);
-        atomic_add_relaxed(&xsk_info->tx_dropped, batch->count, &orig);
+        umem_elem_push_n(&umem->mpool, dp_packet_batch_size(batch), elems_pop);
[ovs-dev] [PATCH] treewide: Use packet batch APIs
This patch replaces direct accesses to dp_packet_batch and dp_packet
internal components by the appropriate API calls.  It extends commit
1270b6e52 (treewide: Wider use of packet batch APIs).

This patch was generated using the following semantic patch (cf.
http://coccinelle.lip6.fr).

//
@ dp_packet @
struct dp_packet_batch *b1;
struct dp_packet_batch b2;
struct dp_packet *p;
expression e;
@@

(
- b1->packets[b1->count++] = p;
+ dp_packet_batch_add(b1, p);
|
- b2.packets[b2.count++] = p;
+ dp_packet_batch_add(&b2, p);
|
- p->packet_type == htonl(PT_ETH)
+ dp_packet_is_eth(p)
|
- p->packet_type != htonl(PT_ETH)
+ !dp_packet_is_eth(p)
|
- b1->count == 0
+ dp_packet_batch_is_empty(b1)
|
- !b1->count
+ dp_packet_batch_is_empty(b1)
|
  b1->count = e;
|
  b1->count++
|
  b2.count = e;
|
  b2.count++
|
- b1->count
+ dp_packet_batch_size(b1)
|
- b2.count
+ dp_packet_batch_size(&b2)
)
//

Signed-off-by: Paul Chaignon
---
 lib/dpif-netdev.c  | 7 ---
 lib/flow.c         | 2 +-
 lib/netdev-dpdk.c  | 3 ++-
 lib/netdev-dummy.c | 2 +-
 lib/packets.c      | 4 ++--
 lib/pcap-file.c    | 2 +-
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f4b59e41b..1b19b8870 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4264,7 +4264,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
         /* At least one packet received. */
         *recirc_depth_get() = 0;
         pmd_thread_ctx_time_update(pmd);
-        batch_cnt = batch.count;
+        batch_cnt = dp_packet_batch_size(&batch);
         if (pmd_perf_metrics_enabled(pmd)) {
             /* Update batch histogram.
             */
             s->current.batches++;
@@ -6249,7 +6249,7 @@ packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
 {
     batch->byte_count += dp_packet_size(packet);
     batch->tcp_flags |= tcp_flags;
-    batch->array.packets[batch->array.count++] = packet;
+    dp_packet_batch_add(&batch->array, packet);
 }

 static inline void
@@ -6271,7 +6271,8 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch,
     struct dp_netdev_actions *actions;
     struct dp_netdev_flow *flow = batch->flow;

-    dp_netdev_flow_used(flow, batch->array.count, batch->byte_count,
+    dp_netdev_flow_used(flow, dp_packet_batch_size(&batch->array),
+                        batch->byte_count,
                         batch->tcp_flags, pmd->ctx.now / 1000);

     actions = dp_netdev_flow_get_actions(flow);
diff --git a/lib/flow.c b/lib/flow.c
index de9370449..31d1d191e 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1101,7 +1101,7 @@ parse_tcp_flags(struct dp_packet *packet)
     ovs_be16 dl_type;
     uint8_t nw_frag = 0, nw_proto = 0;

-    if (packet->packet_type != htonl(PT_ETH)) {
+    if (!dp_packet_is_eth(packet)) {
         return 0;
     }
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index a380b6ffe..b5168b939 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2466,7 +2466,8 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
         dpdk_do_tx_copy(netdev, qid, batch);
         dp_packet_delete_batch(batch, true);
     } else {
-        __netdev_dpdk_vhost_send(netdev, qid, batch->packets, batch->count);
+        __netdev_dpdk_vhost_send(netdev, qid, batch->packets,
+                                 dp_packet_batch_size(batch));
     }
     return 0;
 }
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index f0c0fbae8..95e1a329a 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1108,7 +1108,7 @@ netdev_dummy_send(struct netdev *netdev, int qid OVS_UNUSED,
         const void *buffer = dp_packet_data(packet);
         size_t size = dp_packet_size(packet);

-        if (packet->packet_type != htonl(PT_ETH)) {
+        if (!dp_packet_is_eth(packet)) {
             error = EPFNOSUPPORT;
             break;
         }
diff --git a/lib/packets.c b/lib/packets.c
index a8fd61fc0..e147e6874 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -246,7 +246,7 @@ push_eth(struct dp_packet *packet, const struct eth_addr *dst,
 {
     struct eth_header *eh;

-    ovs_assert(packet->packet_type != htonl(PT_ETH));
+    ovs_assert(!dp_packet_is_eth(packet));
     eh = dp_packet_resize_l2(packet, ETH_HEADER_LEN);
     eh->eth_dst = *dst;
     eh->eth_src = *src;
@@ -265,7 +265,7 @@ pop_eth(struct dp_packet *packet)
     ovs_be16 ethertype;
     int increment;

-    ovs_assert(packet->packet_type == htonl(PT_ETH));
+    ovs_assert(dp_packet_is_eth(packet));
     ovs_assert(l3 != NULL);

     if (l2_5) {
diff --git a/lib/pcap-file.c b/lib/pcap-file.c
index e4e92b8c2..f0cac8e0f 100644
--- a/lib/pcap-file.c
+++ b/lib/pcap-file.c
@@ -282,7 +282,7 @@ ovs_pcap_write(struct pcap_file *p_file, struct dp_packet *buf)
     struct pcaprec_hdr prh;
     struct timeval tv;

-    ovs_assert(buf->packet_type == htonl(
Re: [ovs-dev] [iovisor-dev] [RFC PATCH 00/11] OVS eBPF datapath.
On Wed, Jul 04, 2018 at 07:25:50PM -0700, William Tu wrote:
> On Tue, Jul 3, 2018 at 10:56 AM, Alexei Starovoitov
> wrote:
> > On Thu, Jun 28, 2018 at 07:19:35AM -0700, William Tu wrote:
> >> Hi Alexei,
> >>
> >> Thanks a lot for the feedback!
> >>
> >> On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov
> >> wrote:
> >> > On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote:
> >> >>
> >> >> Discussion
> >> >> ==
> >> >> We are still actively working on finishing the feature; currently
> >> >> the basic forwarding and tunnel features work, but they are still
> >> >> under heavy debugging and development. The purpose of this RFC is
> >> >> to get some early feedback and direction for finishing the complete
> >> >> features in the existing kernel's OVS datapath (the
> >> >> net/openvswitch/*).
> >> >
> >> > Thank you for sharing the patches.
> >> >
> >> >> Three major issues we are worried about:
> >> >> a. Megaflow support in BPF.
> >> >> b. Connection Tracking support in BPF.
> >> >
> >> > my opinion on the above two didn't change.
> >> > To recap:
> >> > A. Non scalable megaflow map is no go. I'd like to see a packet
> >> > classification algorithm like HiCuts or EffiCuts implemented
> >> > instead, since it can be shared by generic bpf, bpfilter, ovs, and
> >> > likely others.
> >>
> >> We did try the decision tree approach using dpdk's acl lib. The
> >> lookup speed is 6 times faster than the megaflow using tuple space.
> >> However, update/insertion requires rebuilding/re-balancing the
> >> decision tree, so it's way too slow. I think HiCuts or EffiCuts
> >> suffer the same issue. So decision tree algorithms are scalable only
> >> for lookup operations, thanks to their optimization of tree depth,
> >> but not scalable under update/insert/delete operations.
> >>
> >> On customer systems we see a megaflow update/insert rate of around
> >> 10 rules/sec; this makes decision trees unusable, unless we invent
> >> something to optimize the update/insert time or an incremental
> >> update for these decision tree algorithms.
> >
> > is this a typo? you probably meant 10K rule updates a second ?
>
> I mean "new" rules being added at 10 rules/sec.
> Update rate might be much higher.
>
> > Last time I've dealt with these algorithms we had 100K acl updates a
> > second. It was an important metric that we were optimizing for.
> > I'm pretty sure '*cuts' algos do many thousands per second non
> > optimized.
>
> When adding a new rule, do these algorithms require rebuilding the
> entire tree?
>
> In our evaluation, updating an existing entry in the decision tree
> performs OK, because it is equal to lookup and replace: lookup is
> fast, and the update is just an atomic swap. But inserting a new rule
> is slow, because it requires re-building the tree using all existing
> rules. And we see new rules being added at a rate of 10 rules per
> second, so we are constantly rebuilding the entire tree.
>
> If the entire tree has 100k rules, it takes around 2 seconds to
> rebuild, based on the dpdk acl library. And without an incremental
> algorithm, adding 1 new rule triggers rebuilding all 100k rules, which
> is too slow.
>
> Reading through HyperCuts and EffiCuts, I'm not sure how they support
> incrementally adding a new rule without rebuilding the entire tree.
> http://ccr.sigcomm.org/online/files/p207.pdf
> http://cseweb.ucsd.edu/~susingh/papers/hyp-sigcomm03.pdf
>
> The HyperCuts paper says:
> "A fast update algorithm can also be implemented; however we do not
> go into the details of incremental update in this paper"
>
> >> >> c. Verifier limitation.
> >> >
> >> > Not sure what limitations you're concerned about.
> >>
> >> Mostly related to stack. The flow key OVS uses (struct sw_flow_key)
> >> is 464 bytes.
> >> We trim a lot, now around 300 bytes, but it is still huge,
> >> considering BPF's stack limit is 512 bytes.
> >
> > have you tried using per-cpu array of one element with large value
> > instead of stack?
>
> yes, now we store the flow key in a percpu array with 1 element.
>
> > In the latest verifier most of the operations that can be done with
> > the stack pointer can be done with a pointer to a map value too.
>
> Once the flow key is stored in a map, another eBPF program
> needs to use that key to look up the flow table (another map).
> So we have to store the flow key on the stack first, in order to
> use it as the key to look up the flow table map.
>
> Is there a way to work around it?

d71962f ("bpf: allow map helpers access to map values directly") removes
that limitation from the verifier and should allow you to use map values
as map keys directly.  4.18-rc1 has it.

> Thanks
> William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev