Re: [ovs-dev] [PATCH v2] treewide: Use packet batch APIs

2019-09-03 Thread Paul Chaignon
On Tue, Sep 03, 2019 at 06:37PM +, William Tu  wrote:
> On Sun, Sep 01, 2019 at 03:10:05PM +0200, Paul Chaignon wrote:
> > This patch replaces direct accesses to dp_packet_batch and dp_packet
> > internal components by the appropriate API calls.  It extends commit
> > 1270b6e52 (treewide: Wider use of packet batch APIs).
> >
> > This patch was generated using the following semantic patch (cf.
> > http://coccinelle.lip6.fr).
>
> Looks very interesting; I spent some time learning it.

Glad you find it interesting!  I'm actually trying to practice using it
myself.  (If you have any usage ideas for Open vSwitch, I'm interested; I
already went through all the treewide patches.)

> If you have time, can you show us how to run it?
> I installed coccinelle on ubuntu and I can run on linux kernel
>  make coccicheck MODE=report
> but for OVS, do we provide the semantic patch below and run
> it manually?

I executed the semantic patch below with:

  spatch --sp-file semantic-patch.cocci --very-quiet \
      --dir ./ > output.diff

Note that there will be some incorrect changes in lib/dp-packet.h which
you'll need to remove (e.g., Coccinelle tries to use dp_packet_is_eth()
inside dp_packet_is_eth()).  It's possible to avoid that, but afaik, only
at the cost of a slightly more verbose semantic patch.

>
> >
> > // 
> > @ dp_packet @
> > struct dp_packet_batch *b1;
> > struct dp_packet_batch b2;
> > struct dp_packet *p;
> > expression e;
> > @@
> >
> > (
> > - b1->packets[b1->count++] = p;
> > + dp_packet_batch_add(b1, p);
> > |
> > - b2.packets[b2.count++] = p;
> > + dp_packet_batch_add(&b2, p);
> > |
> > - p->packet_type == htonl(PT_ETH)
> > + dp_packet_is_eth(p)
> > |
> > - p->packet_type != htonl(PT_ETH)
> > + !dp_packet_is_eth(p)
> > |
> > - b1->count == 0
> > + dp_packet_batch_is_empty(b1)
> > |
> > - !b1->count
> > + dp_packet_batch_is_empty(b1)
>
> I understand the above using + and -
> But can you explain the lines below?

If I were to remove the four following matches on count and keep only the
last two, Coccinelle would also attempt to replace count assignments and
increments with dp_packet_batch_size().  I might end up with, e.g.:

  - batch->count = 0;
  + dp_packet_batch_size(batch) = 0;

In short, I'm using these four matches to ensure the last two matches
replace only rvalues.

> > |
> >   b1->count = e;
> > |
> >   b1->count++
> > |
> >   b2.count = e;
> > |
> >   b2.count++
> > |
> Why do we need "expression e", and do these lines match and replace something?

The four preceding matches don't replace code, since there's no - or + in
the leftmost column.
Expression e represents anything that might appear on the right-hand side.
The semantic patch doesn't run without it, I'm guessing because the
statement would otherwise be incomplete.

>
> > - b1->count
> > + dp_packet_batch_size(b1)
> > |
> > - b2.count
> > + dp_packet_batch_size(&b2)
> > )
> > // 
> >
> > Signed-off-by: Paul Chaignon 
>
> Looks good to me, thanks for the patch.
> Acked-by: William Tu 
>
>
> > ---
> > Changelogs:
> >   Changes in v2:
> > - Rebased on master branch.
> > - Re-applied the semantic patch (new changes in lib/netdev-afxdp.c).
> >
> >  lib/dpif-netdev.c  |  7 ---
> >  lib/flow.c |  2 +-
> >  lib/netdev-afxdp.c | 20 
> >  lib/netdev-dpdk.c  |  3 ++-
> >  lib/netdev-dummy.c |  2 +-
> >  lib/packets.c  |  4 ++--
> >  lib/pcap-file.c|  2 +-
> >  7 files changed, 23 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 75d85b2fd..d24f9502f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4261,7 +4261,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> >  /* At least one packet received. */
> >  *recirc_depth_get() = 0;
> >  pmd_thread_ctx_time_update(pmd);
> > -batch_cnt = batch.count;
> > +batch_cnt = dp_packet_batch_size(&batch);
> >  if (pmd_perf_metrics_enabled(pmd)) {
> >  /* Update batch histogram. */
> >  s->current.batches++;
> > @@ -6292,7 +6292,7 @@ packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
> >  {
> >  batch->byte_count += dp_packet_size(packet);
> >  batch->tcp_flags |= tcp_flags;
> > -batch->array.packets[batch->array.count++]

[ovs-dev] [PATCH v2] treewide: Use packet batch APIs

2019-09-01 Thread Paul Chaignon
This patch replaces direct accesses to dp_packet_batch and dp_packet
internal components by the appropriate API calls.  It extends commit
1270b6e52 (treewide: Wider use of packet batch APIs).

This patch was generated using the following semantic patch (cf.
http://coccinelle.lip6.fr).

// 
@ dp_packet @
struct dp_packet_batch *b1;
struct dp_packet_batch b2;
struct dp_packet *p;
expression e;
@@

(
- b1->packets[b1->count++] = p;
+ dp_packet_batch_add(b1, p);
|
- b2.packets[b2.count++] = p;
+ dp_packet_batch_add(&b2, p);
|
- p->packet_type == htonl(PT_ETH)
+ dp_packet_is_eth(p)
|
- p->packet_type != htonl(PT_ETH)
+ !dp_packet_is_eth(p)
|
- b1->count == 0
+ dp_packet_batch_is_empty(b1)
|
- !b1->count
+ dp_packet_batch_is_empty(b1)
|
  b1->count = e;
|
  b1->count++
|
  b2.count = e;
|
  b2.count++
|
- b1->count
+ dp_packet_batch_size(b1)
|
- b2.count
+ dp_packet_batch_size(&b2)
)
// 

Signed-off-by: Paul Chaignon 
---
Changelogs:
  Changes in v2:
- Rebased on master branch.
- Re-applied the semantic patch (new changes in lib/netdev-afxdp.c).

 lib/dpif-netdev.c  |  7 ---
 lib/flow.c |  2 +-
 lib/netdev-afxdp.c | 20 
 lib/netdev-dpdk.c  |  3 ++-
 lib/netdev-dummy.c |  2 +-
 lib/packets.c  |  4 ++--
 lib/pcap-file.c|  2 +-
 7 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 75d85b2fd..d24f9502f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4261,7 +4261,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
 /* At least one packet received. */
 *recirc_depth_get() = 0;
 pmd_thread_ctx_time_update(pmd);
-batch_cnt = batch.count;
+batch_cnt = dp_packet_batch_size(&batch);
 if (pmd_perf_metrics_enabled(pmd)) {
 /* Update batch histogram. */
 s->current.batches++;
@@ -6292,7 +6292,7 @@ packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
 {
 batch->byte_count += dp_packet_size(packet);
 batch->tcp_flags |= tcp_flags;
-batch->array.packets[batch->array.count++] = packet;
+dp_packet_batch_add(&batch->array, packet);
 }
 
 static inline void
@@ -6314,7 +6314,8 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch,
 struct dp_netdev_actions *actions;
 struct dp_netdev_flow *flow = batch->flow;
 
-dp_netdev_flow_used(flow, batch->array.count, batch->byte_count,
+dp_netdev_flow_used(flow, dp_packet_batch_size(&batch->array),
+batch->byte_count,
 batch->tcp_flags, pmd->ctx.now / 1000);
 
 actions = dp_netdev_flow_get_actions(flow);
diff --git a/lib/flow.c b/lib/flow.c
index ac6a4e121..393243309 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1098,7 +1098,7 @@ parse_tcp_flags(struct dp_packet *packet)
 ovs_be16 dl_type;
 uint8_t nw_frag = 0, nw_proto = 0;
 
-if (packet->packet_type != htonl(PT_ETH)) {
+if (!dp_packet_is_eth(packet)) {
 return 0;
 }
 
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index e5b058d08..6e0180327 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -765,7 +765,7 @@ free_afxdp_buf_batch(struct dp_packet_batch *batch)
 addr = (uintptr_t)base & (~FRAME_SHIFT_MASK);
 elems[i] = (void *)addr;
 }
-umem_elem_push_n(xpacket->mpool, batch->count, elems);
+umem_elem_push_n(xpacket->mpool, dp_packet_batch_size(batch), elems);
 dp_packet_batch_init(batch);
 }
 
@@ -860,9 +860,11 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
 free_batch = check_free_batch(batch);
 
 umem = xsk_info->umem;
-ret = umem_elem_pop_n(&umem->mpool, batch->count, elems_pop);
+ret = umem_elem_pop_n(&umem->mpool, dp_packet_batch_size(batch),
+  elems_pop);
 if (OVS_UNLIKELY(ret)) {
-atomic_add_relaxed(&xsk_info->tx_dropped, batch->count, &orig);
+atomic_add_relaxed(&xsk_info->tx_dropped, dp_packet_batch_size(batch),
+   &orig);
 VLOG_WARN_RL(&rl, "%s: send failed due to exhausted memory pool.",
  netdev_get_name(netdev));
 error = ENOMEM;
@@ -870,10 +872,12 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
 }
 
 /* Make sure we have enough TX descs. */
-ret = xsk_ring_prod__reserve(&xsk_info->tx, batch->count, &idx);
+ret = xsk_ring_prod__reserve(&xsk_info->tx, dp_packet_batch_size(batch),
+ &idx);
 if (OVS_UNLIKELY(ret == 0)) {
-umem_elem_push_n(&umem->mpool, batch->count, elems_pop);
-atomic_add_relaxed(&xsk_info->tx_dropped, batch->count, &orig);
+umem_elem_push_n(&umem->mpool, dp_packet_batch_size(batch), elems_pop);

[ovs-dev] [PATCH] treewide: Use packet batch APIs

2019-07-07 Thread Paul Chaignon
This patch replaces direct accesses to dp_packet_batch and dp_packet
internal components by the appropriate API calls.  It extends commit
1270b6e52 (treewide: Wider use of packet batch APIs).

This patch was generated using the following semantic patch (cf.
http://coccinelle.lip6.fr).

// 
@ dp_packet @
struct dp_packet_batch *b1;
struct dp_packet_batch b2;
struct dp_packet *p;
expression e;
@@

(
- b1->packets[b1->count++] = p;
+ dp_packet_batch_add(b1, p);
|
- b2.packets[b2.count++] = p;
+ dp_packet_batch_add(&b2, p);
|
- p->packet_type == htonl(PT_ETH)
+ dp_packet_is_eth(p)
|
- p->packet_type != htonl(PT_ETH)
+ !dp_packet_is_eth(p)
|
- b1->count == 0
+ dp_packet_batch_is_empty(b1)
|
- !b1->count
+ dp_packet_batch_is_empty(b1)
|
  b1->count = e;
|
  b1->count++
|
  b2.count = e;
|
  b2.count++
|
- b1->count
+ dp_packet_batch_size(b1)
|
- b2.count
+ dp_packet_batch_size(&b2)
)
// 

Signed-off-by: Paul Chaignon 
---
 lib/dpif-netdev.c  | 7 ---
 lib/flow.c | 2 +-
 lib/netdev-dpdk.c  | 3 ++-
 lib/netdev-dummy.c | 2 +-
 lib/packets.c  | 4 ++--
 lib/pcap-file.c| 2 +-
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f4b59e41b..1b19b8870 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4264,7 +4264,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
 /* At least one packet received. */
 *recirc_depth_get() = 0;
 pmd_thread_ctx_time_update(pmd);
-batch_cnt = batch.count;
+batch_cnt = dp_packet_batch_size(&batch);
 if (pmd_perf_metrics_enabled(pmd)) {
 /* Update batch histogram. */
 s->current.batches++;
@@ -6249,7 +6249,7 @@ packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
 {
 batch->byte_count += dp_packet_size(packet);
 batch->tcp_flags |= tcp_flags;
-batch->array.packets[batch->array.count++] = packet;
+dp_packet_batch_add(&batch->array, packet);
 }
 
 static inline void
@@ -6271,7 +6271,8 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch,
 struct dp_netdev_actions *actions;
 struct dp_netdev_flow *flow = batch->flow;
 
-dp_netdev_flow_used(flow, batch->array.count, batch->byte_count,
+dp_netdev_flow_used(flow, dp_packet_batch_size(&batch->array),
+batch->byte_count,
 batch->tcp_flags, pmd->ctx.now / 1000);
 
 actions = dp_netdev_flow_get_actions(flow);
diff --git a/lib/flow.c b/lib/flow.c
index de9370449..31d1d191e 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1101,7 +1101,7 @@ parse_tcp_flags(struct dp_packet *packet)
 ovs_be16 dl_type;
 uint8_t nw_frag = 0, nw_proto = 0;
 
-if (packet->packet_type != htonl(PT_ETH)) {
+if (!dp_packet_is_eth(packet)) {
 return 0;
 }
 
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index a380b6ffe..b5168b939 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2466,7 +2466,8 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
 dpdk_do_tx_copy(netdev, qid, batch);
 dp_packet_delete_batch(batch, true);
 } else {
-__netdev_dpdk_vhost_send(netdev, qid, batch->packets, batch->count);
+__netdev_dpdk_vhost_send(netdev, qid, batch->packets,
+ dp_packet_batch_size(batch));
 }
 return 0;
 }
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index f0c0fbae8..95e1a329a 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1108,7 +1108,7 @@ netdev_dummy_send(struct netdev *netdev, int qid OVS_UNUSED,
 const void *buffer = dp_packet_data(packet);
 size_t size = dp_packet_size(packet);
 
-if (packet->packet_type != htonl(PT_ETH)) {
+if (!dp_packet_is_eth(packet)) {
 error = EPFNOSUPPORT;
 break;
 }
diff --git a/lib/packets.c b/lib/packets.c
index a8fd61fc0..e147e6874 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -246,7 +246,7 @@ push_eth(struct dp_packet *packet, const struct eth_addr *dst,
 {
 struct eth_header *eh;
 
-ovs_assert(packet->packet_type != htonl(PT_ETH));
+ovs_assert(!dp_packet_is_eth(packet));
 eh = dp_packet_resize_l2(packet, ETH_HEADER_LEN);
 eh->eth_dst = *dst;
 eh->eth_src = *src;
@@ -265,7 +265,7 @@ pop_eth(struct dp_packet *packet)
 ovs_be16 ethertype;
 int increment;
 
-ovs_assert(packet->packet_type == htonl(PT_ETH));
+ovs_assert(dp_packet_is_eth(packet));
 ovs_assert(l3 != NULL);
 
 if (l2_5) {
diff --git a/lib/pcap-file.c b/lib/pcap-file.c
index e4e92b8c2..f0cac8e0f 100644
--- a/lib/pcap-file.c
+++ b/lib/pcap-file.c
@@ -282,7 +282,7 @@ ovs_pcap_write(struct pcap_file *p_file, struct dp_packet *buf)
 struct pcaprec_hdr prh;
 struct timeval tv;
 
-ovs_assert(buf->packet_type == htonl(

Re: [ovs-dev] [iovisor-dev] [RFC PATCH 00/11] OVS eBPF datapath.

2018-07-05 Thread Paul Chaignon
On Wed, Jul 04, 2018 at 07:25:50PM -0700, William Tu wrote:
> On Tue, Jul 3, 2018 at 10:56 AM, Alexei Starovoitov
>  wrote:
> > On Thu, Jun 28, 2018 at 07:19:35AM -0700, William Tu wrote:
> >> Hi Alexei,
> >>
> >> Thanks a lot for the feedback!
> >>
> >> On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov
> >>  wrote:
> >> > On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote:
> >> >>
> >> >> Discussion
> >> >> ==
> >> >> We are still actively working on finishing the feature, currently
> >> >> the basic forwarding and tunnel feature work, but still under
> >> >> heavy debugging and development.  The purpose of this RFC is to
> >> >> get some early feedback and direction for finishing the complete
> >> >> feature set in the existing kernel OVS datapath (net/openvswitch/*).
> >> >
> >> > Thank you for sharing the patches.
> >> >
> >> >> Three major issues we are worried:
> >> >>   a. Megaflow support in BPF.
> >> >>   b. Connection Tracking support in BPF.
> >> >
> >> > my opinion on the above two didn't change.
> >> > To recap:
> >> > A. A non-scalable megaflow map is a no-go. I'd like to see a packet
> >> > classification algorithm like hicuts or efficuts implemented instead,
> >> > since it could be shared by generic bpf, bpfilter, ovs, and likely
> >> > others.
> >>
> >> We did try the decision tree approach using DPDK's ACL library.  The
> >> lookup speed is 6 times faster than megaflow's tuple space search.
> >> However, update/insertion requires rebuilding/re-balancing the decision
> >> tree, so it's way too slow.  I think hicuts and efficuts suffer from the
> >> same issue.  Decision tree algorithms are scalable only for lookup
> >> operations, thanks to their optimization of tree depth, but not for
> >> update/insert/delete operations.
> >>
> >> On a customer's system, we see a megaflow update/insert rate of around
> >> 10 rules/sec.  This makes decision trees unusable unless we invent
> >> something to optimize the update/insert time, or an incremental update
> >> for these decision tree algorithms.
> >
> > is this a typo? you probably meant 10K rule updates a second?
> I mean "new" rules being added at 10 rules/sec.
> Update rate might be much higher.
> 
> > Last time I've dealt with these algorithms we had 100K acl updates a second.
> > It was an important metric that we were optimizing for.
> > I'm pretty sure '*cuts' algos do many thousands per second non optimized.
> 
> When adding a new rule, do these algorithms require rebuilding the
> entire tree?
> 
> In our evaluation, updating an existing entry in the decision tree
> performs OK, because it amounts to lookup and replace: lookup is fast,
> and the update is just an atomic swap.  But inserting a new rule is slow,
> because it requires rebuilding the tree from all existing rules.
> And we see new rules being added at a rate of 10 per second,
> so we are constantly rebuilding the entire tree.
> 
> If the entire tree has 100k rules, it takes around 2 seconds to rebuild
> with the DPDK ACL library.  Without an incremental algorithm, adding one
> new rule triggers rebuilding all 100k rules, which is too slow.
> 
> Reading through HyperCuts and EffiCuts, I'm not sure how they support
> incrementally adding a new rule without rebuilding the entire tree.
> http://ccr.sigcomm.org/online/files/p207.pdf
> http://cseweb.ucsd.edu/~susingh/papers/hyp-sigcomm03.pdf
> 
> The HyperCuts paper says:
> "A fast update algorithm can also be implemented; however we do not
> go into the details of incremental update in this paper"
> 
> >
> >> >>   c. Verifier limitation.
> >> >
> >> > Not sure what limitations you're concerned about.
> >> >
> >>
> >> Mostly related to the stack.  The flow key OVS uses (struct
> >> sw_flow_key) is 464 bytes.  We trim it a lot, down to around 300 bytes,
> >> but that is still huge considering BPF's stack limit is 512 bytes.
> >
> > have you tried using a per-cpu array of one element with a large value
> > instead of the stack?
> 
> yes, we now store the flow key in a per-cpu array with one element.
> 
> > In the latest verifier, most of the operations that can be done with a
> > stack pointer can be done with a pointer to a map value too.
> >
> Once the flow key is stored in a map, another eBPF program
> needs to use that key to look up the flow table (another map).
> So we have to store the flow key on the stack first, in order to
> use it as the key for the flow table lookup.
> 
> Is there a way to work around it?

d71962f ("bpf: allow map helpers access to map values directly") removes
that limitation from the verifier and should allow you to use map values
as map keys directly.  4.18-rc1 has it.

> Thanks
> William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev