Re: [ovs-dev] dpif-netdev bug regarding packet matches or ufids?

2018-09-28 Thread 0-day Robot
Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Ben Pfaff  needs to sign off.
Lines checked: 113, Warnings: 0, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] dpif-netdev bug regarding packet matches or ufids?

2018-09-28 Thread Ben Pfaff
I have been playing with the OFTest framework off and on over the last
few months and I believe it's caught an actual bug with its
DirectVlanPackets test.  I'm appending a script that demonstrates it.
To see it, run "make sandbox" then execute the script inside it.

At some level at least, the problem is in dpif-netdev, which during the
test logs messages like this indicating that a datapath flow can't be
found and therefore can't be modified:

WARN|dummy@ovs-dummy: failed to put[modify] (No such file or
directory) ufid:b2fe59af-26bd-4aa0-960e-fd559dc10c54

skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(1),packet_type(ns=0,id=0),eth(src=00:06:07:08:09:0a,dst=00:01:02:03:04:05),eth_type(0x8100),vlan(vid=1000/0x0,pcp=5/0x0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0,dst=127.0.0.1/0.0.0.0,proto=0/0,tos=0/0,ttl=64/0,frag=no)),

actions:userspace(pid=0,controller(reason=7,dont_send=0,continuation=0,recirc_id=1,rule_cookie=0,controller_id=0,max_len=65535))

It's possible that the caller is screwing up by providing a flow match
different from one in the datapath flow table.  I have not looked at
that question.  But the really strange thing here is that the caller is
providing a ufid and dpif-netdev is ignoring it.  When I make it honor
the ufid, like the following, then the test suddenly starts working.
This seems like a bug to me.

(The following patch is not actually the real fix because the code
should still look at the key if no ufid was provided.)

Any thoughts?

Thanks,

Ben.

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e322f5553fa8..4855cc08f7b7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3273,7 +3273,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 
 static int
 flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
-struct netdev_flow_key *key,
+struct netdev_flow_key *key OVS_UNUSED,
 struct match *match,
 ovs_u128 *ufid,
 const struct dpif_flow_put *put,
@@ -3287,7 +3287,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
 }
 
 ovs_mutex_lock(&pmd->flow_mutex);
-netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
+netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid, NULL, 0);
 if (!netdev_flow) {
 if (put->flags & DPIF_FP_CREATE) {
 if (cmap_count(&pmd->flow_table) < MAX_FLOWS) {



--8<--cut here-->8--

#! /bin/sh -ex

# Create a bridge and add port p1 on port 1, p2 on port 2
ovs-appctl vlog/set netdev_dummy dpif
ovs-vsctl --if-exists del-br br0 \
   -- add-br br0 \
   -- add-port br0 p1 -- set interface p1 ofport_request=1 
options:tx_pcap=p1-tx.pcap options:rxq_pcap=p1-rx.pcap \
   -- add-port br0 p2 -- set interface p2 ofport_request=2 
options:tx_pcap=p2-tx.pcap options:rxq_pcap=p2-rx.pcap

# Add a flow that directs some packets received on p1 to p2 and the
# rest back out p1.
#
# Then inject a packet of the form that should go to p2.  Dump the
# datapath flow to see that it goes to p2 ("actions:2").  Trace the
# same packet for good measure, to also see that it should go to p2.
packet=00010203040500060708090a8100a3e808004514000140007ce77f017f01
ovs-ofctl add-flow br0 
priority=1,ip,in_port=1,dl_src=00:06:07:08:09:0a,dl_dst=00:01:02:03:04:05,actions=output:2
ovs-ofctl add-flow br0 priority=0,in_port=1,actions=IN_PORT
ovs-appctl netdev-dummy/receive p1 $packet
ovs-appctl ofproto/trace br0 in_port=p1 $packet
ovs-appctl dpif/dump-flows br0

# Delete the flows, then add new flows that would not match the same
# packet as before.
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 
priority=1,in_port=1,dl_src=00:06:07:08:09:0a,dl_dst=00:01:02:03:04:05,dl_type=0x0801,actions=output:2
ovs-ofctl add-flow br0 priority=0,in_port=1,actions=IN_PORT

# Give the datapath a few seconds to revalidate.
sleep 2

# Now inject the same packet again.  It should go to p1 because it
# only matches the priority-0 flow, as shown by the trace.  However, instead,
# the original datapath flow is still there and its counters get incremented.
ovs-appctl netdev-dummy/receive p1 $packet
ovs-appctl ofproto/trace br0 in_port=p1 $packet
ovs-appctl dpif/dump-flows br0

# Give the datapath some time to expire the flow.
sleep 12

# Try again.
ovs-appctl netdev-dummy/receive p1 $packet
ovs-appctl ofproto/trace br0 in_port=p1 $packet
ovs-appctl dpif/dump-flows br0
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Fix bug in oss-fuzz options file

2018-09-28 Thread Bhargava Shastry
Hi,

Thanks for your inputs. I shall leave the signed off field intact in my future 
patches.

Regards
Bhargava

On September 28, 2018 8:09:02 PM GMT+02:00, Ben Pfaff  wrote:
>Is it the list archive that you're worried about?  The archive does
>this
>kind of minor obfuscation of email addresses anyway.
>
>On Fri, Sep 28, 2018 at 07:44:36PM +0200, Bhargava Shastry wrote:
>> Hi Aaron,
>> 
>> The mistake is on my part. I am breaking up my email by hand because
>I
>> fear bots on the Internet are going to pick up well formatted emails
>and
>> send spam :-)
>> 
>> Is my concern well-founded? Do you have any suggestions for fighting
>> auto-scraping of email addresses?
>> 
>> Regards,
>> Bhargava
>> 
>> On 09/28/2018 05:20 PM, Aaron Conole wrote:
>> > bshas...@sect.tu-berlin.de writes:
>> > 
>> >> From: Bhargava Shastry 
>> >>
>> >> oss-fuzz options file must begin with a [libfuzzer] header.
>> >> This was missing in the expr_parse_target.options file which this
>patch 
>> >> fixes.
>> >>
>> >> Signed-off-by: Bhargava Shastry 
>> >> ---
>> > 
>> > Just wanted to let you know that the bot is complaining because of
>your
>> > signature line:
>> > 
>> >  'Signed-off-by: Bhargava Shastry '
>> > 
>> > I'm not sure if we should amend checkpatch to recognize this kind
>of
>> > construct.  It isn't an email address.
>> > 
>> > Is this a mistake on your end, or is some mail filter scrambling
>it?
>> > 
>> > Then again, maybe we should adopt Postel's law and allow for ' at '
>to
>> > also be an email divider for signoff lines?
>> > 
>> 
>> -- 
>> Bhargava Shastry 
>> Security in Telecommunications
>> TU Berlin / Telekom Innovation Laboratories
>> Ernst-Reuter-Platz 7, Sekr TEL 17 / D - 10587 Berlin, Germany
>> phone: +49 30 8353 58235
>> Keybase: https://keybase.io/bshastry
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Fix bug in oss-fuzz options file

2018-09-28 Thread Ben Pfaff
Is it the list archive that you're worried about?  The archive does this
kind of minor obfuscation of email addresses anyway.

On Fri, Sep 28, 2018 at 07:44:36PM +0200, Bhargava Shastry wrote:
> Hi Aaron,
> 
> The mistake is on my part. I am breaking up my email by hand because I
> fear bots on the Internet are going to pick up well formatted emails and
> send spam :-)
> 
> Is my concern well-founded? Do you have any suggestions for fighting
> auto-scraping of email addresses?
> 
> Regards,
> Bhargava
> 
> On 09/28/2018 05:20 PM, Aaron Conole wrote:
> > bshas...@sect.tu-berlin.de writes:
> > 
> >> From: Bhargava Shastry 
> >>
> >> oss-fuzz options file must begin with a [libfuzzer] header.
> >> This was missing in the expr_parse_target.options file which this patch 
> >> fixes.
> >>
> >> Signed-off-by: Bhargava Shastry 
> >> ---
> > 
> > Just wanted to let you know that the bot is complaining because of your
> > signature line:
> > 
> >  'Signed-off-by: Bhargava Shastry '
> > 
> > I'm not sure if we should amend checkpatch to recognize this kind of
> > construct.  It isn't an email address.
> > 
> > Is this a mistake on your end, or is some mail filter scrambling it?
> > 
> > Then again, maybe we should adopt Postel's law and allow for ' at ' to
> > also be an email divider for signoff lines?
> > 
> 
> -- 
> Bhargava Shastry 
> Security in Telecommunications
> TU Berlin / Telekom Innovation Laboratories
> Ernst-Reuter-Platz 7, Sekr TEL 17 / D - 10587 Berlin, Germany
> phone: +49 30 8353 58235
> Keybase: https://keybase.io/bshastry
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Fix bug in oss-fuzz options file

2018-09-28 Thread Aaron Conole
Bhargava Shastry  writes:

> Hi Aaron,
>
> The mistake is on my part. I am breaking up my email by hand because I
> fear bots on the Internet are going to pick up well formatted emails and
> send spam :-)
>
> Is my concern well-founded? Do you have any suggestions for fighting
> auto-scraping of email addresses?

Well, I don't know much about all of that.

But I'll point out that the mailing list web interface does filter your
email (so it already reads 'bshastry at sect.tu-berlin.de').  And bots
can scrape the patchwork database for your actual email anyway (which
will include the From: field).  They even just subscribe to the list
and then you offer it to them freely.

It might be best to keep with the well formatted signed-off-by lines for
now, and try to improve your spam filter technology.

> Regards,
> Bhargava
>
> On 09/28/2018 05:20 PM, Aaron Conole wrote:
>> bshas...@sect.tu-berlin.de writes:
>> 
>>> From: Bhargava Shastry 
>>>
>>> oss-fuzz options file must begin with a [libfuzzer] header.
>>> This was missing in the expr_parse_target.options file which this patch 
>>> fixes.
>>>
>>> Signed-off-by: Bhargava Shastry 
>>> ---
>> 
>> Just wanted to let you know that the bot is complaining because of your
>> signature line:
>> 
>>  'Signed-off-by: Bhargava Shastry '
>> 
>> I'm not sure if we should amend checkpatch to recognize this kind of
>> construct.  It isn't an email address.
>> 
>> Is this a mistake on your end, or is some mail filter scrambling it?
>> 
>> Then again, maybe we should adopt Postel's law and allow for ' at ' to
>> also be an email divider for signoff lines?
>> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v4] conntrack: Add rcu support.

2018-09-28 Thread Darrell Ball


On 9/28/18, 5:03 AM, "Aaron Conole"  wrote:

Darrell Ball  writes:

> Below is the recent private commit message I have for the patch:
>
> However, I plan to split out to 3 main patches, with probably a few minor
> cosmetic patches.
>
> 1/ cmap performance and code simplification.
> 2/ Eliminating exporting private datastructures to outside modules
> 3/ Eliminating passing 'ct' structure as function parameter to most 
functions
>
> Thanks Darrell

How different is the 3-patch series?  Is the end result identical to
this patch?  Just wondering whether it's worthwhile to even do any
performance testing with this patch or if it's better to wait for the
3-patch series?

The initial 3-patch (+ some cosmetic patches) series will be similar Aaron.
However, it is always good to have others check it out.

Thanks Darrell





> conntrack: Add rcu support.
> 
> Add in rcu support for conntrack. At the same time, the array of
> hmaps is replaced by a cmap.  Using a single map also simplifies
> the handling of nat and allows the removal of the nat_conn map.
> Hence, there is a net code reduction.
> 
> The main reason for adding rcu is performance considerations with
> multiple threads.
> 
> I'll probably split out into several patches later, as other changes
> were made at the same time. These include: getting rid of the private
> conntrack structure exported outside of conntrack; these fields are
> made static to conntrack now. Also there is less parameter passing,
> since some private variables like hash_basis, which is written once at
> init time is now referenced as a static variable rather then passed
> through to many functions.
> 
> Signed-off-by: Darrell Ball 
>
>
>
>
>
>
>
> On 9/27/18, 11:10 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> Ben Pfaff" 
> wrote:
>
> On Tue, Sep 18, 2018 at 01:22:50PM -0700, Darrell Ball wrote:
> > Add in rcu support for conntrack. At the same time, the array of
> > hmaps is replaced by a cmap.  Using a single map also simplifies
> > the handling of nat and allows the removal of the nat_conn map.
> > There is still some cleanup to do and a few things to check. I'll
> > probably split out into several patches later, as some cosmetic
> > changes were made at the same time, like getting rid of the
> > private conntrack structure contents exported outside of conntrack;
> > these fields are made static to conntrack now.
> > 
> > Signed-off-by: Darrell Ball 
> 
> Hi Darrell, thanks for the patch.
> 
> Can you add a sentence or two to the commit message, please, 
explaining
> the motivation or the goal for the patch?  It's not clear to me 
whether
> this is a performance optimization, or a cleanup, or a fix for a race,
> etc.  This information helps review the patch and it also helps people
> looking through the history tracking down a bug etc.
> ___
> dev mailing list
> d...@openvswitch.org
> 
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&data=02%7C01%7Cdball%40vmware.com%7C6b69d51051834f5f1f3408d6253a511e%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636737329807027518&sdata=CfptLPWp9wPzHk3T7BX1p5G5X7ioMStevV1OnE2%2FsTM%3D&reserved=0
> 
>
> ___
> dev mailing list
> d...@openvswitch.org
> 
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&data=02%7C01%7Cdball%40vmware.com%7C6b69d51051834f5f1f3408d6253a511e%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636737329807027518&sdata=CfptLPWp9wPzHk3T7BX1p5G5X7ioMStevV1OnE2%2FsTM%3D&reserved=0


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] Revert "openvswitch: Fix template leak in error cases."

2018-09-28 Thread Flavio Leitner
This reverts commit 90c7afc96cbbd77f44094b5b651261968e97de67.

When the commit was merged, the code used nf_ct_put() to free
the entry, but later on commit 76644232e612 ("openvswitch: Free
tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which
is a more appropriate. Now the original problem is removed.

Then 44d6e2f27328 ("net: Replace NF_CT_ASSERT() with WARN_ON().")
replaced a debug assert with a WARN_ON() which is trigged now.

Signed-off-by: Flavio Leitner 
---
 net/openvswitch/conntrack.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 86a75105af1a..0aeb34c6389d 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1624,10 +1624,6 @@ int ovs_ct_copy_action(struct net *net, const struct 
nlattr *attr,
OVS_NLERR(log, "Failed to allocate conntrack template");
return -ENOMEM;
}
-
-   __set_bit(IPS_CONFIRMED_BIT, &ct_info.ct->status);
-   nf_conntrack_get(&ct_info.ct->ct_general);
-
if (helper) {
err = ovs_ct_add_helper(&ct_info, helper, key, log);
if (err)
@@ -1639,6 +1635,8 @@ int ovs_ct_copy_action(struct net *net, const struct 
nlattr *attr,
if (err)
goto err_free_ct;
 
+   __set_bit(IPS_CONFIRMED_BIT, &ct_info.ct->status);
+   nf_conntrack_get(&ct_info.ct->ct_general);
return 0;
 err_free_ct:
__ovs_ct_free_action(&ct_info);
-- 
2.14.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] openvswitch: load NAT helper

2018-09-28 Thread Flavio Leitner
Load the respective NAT helper module if the flow uses it.

Signed-off-by: Flavio Leitner 
---
 net/openvswitch/conntrack.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 86a75105af1a..7e9a5283e236 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1312,6 +1312,10 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info 
*info, const char *name,
 
rcu_assign_pointer(help->helper, helper);
info->helper = helper;
+
+   if (info->nat)
+   request_module("ip_nat_%s", name);
+
return 0;
 }
 
-- 
2.14.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Fix bug in oss-fuzz options file

2018-09-28 Thread Bhargava Shastry
Hi Aaron,

The mistake is on my part. I am breaking up my email by hand because I
fear bots on the Internet are going to pick up well formatted emails and
send spam :-)

Is my concern well-founded? Do you have any suggestions for fighting
auto-scraping of email addresses?

Regards,
Bhargava

On 09/28/2018 05:20 PM, Aaron Conole wrote:
> bshas...@sect.tu-berlin.de writes:
> 
>> From: Bhargava Shastry 
>>
>> oss-fuzz options file must begin with a [libfuzzer] header.
>> This was missing in the expr_parse_target.options file which this patch 
>> fixes.
>>
>> Signed-off-by: Bhargava Shastry 
>> ---
> 
> Just wanted to let you know that the bot is complaining because of your
> signature line:
> 
>  'Signed-off-by: Bhargava Shastry '
> 
> I'm not sure if we should amend checkpatch to recognize this kind of
> construct.  It isn't an email address.
> 
> Is this a mistake on your end, or is some mail filter scrambling it?
> 
> Then again, maybe we should adopt Postel's law and allow for ' at ' to
> also be an email divider for signoff lines?
> 

-- 
Bhargava Shastry 
Security in Telecommunications
TU Berlin / Telekom Innovation Laboratories
Ernst-Reuter-Platz 7, Sekr TEL 17 / D - 10587 Berlin, Germany
phone: +49 30 8353 58235
Keybase: https://keybase.io/bshastry
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] Makes stylistic change suggested by 0-day robot.

2018-09-28 Thread Bhargava Shastry
Hi Ben,

> 25 seconds would be really long.  I think you could reasonably lower
> it to just 1 or 2 seconds.

Somehow, even the 25 second timeout is being exceeded with non-trivial
frequency. I'm afraid that if we lower it to 1-2 seconds, oss-fuzz is
going to file a lot of bugs. I am trying out the tests locally, and the
fuzzer reports timeouts of 15-20 seconds fairly quickly.

> Sometimes the inputs can get large in practice, because they are often
> machine-generated.

OK. Would imposing a limit of 4096 bytes be sufficient to handle the
machine-generated use-case?

Thanks,
Bhargava
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] Makes stylistic change suggested by 0-day robot.

2018-09-28 Thread Ben Pfaff
On Fri, Sep 28, 2018 at 02:00:39PM +0200, Bhargava Shastry wrote:
> 3. Does it make sense to impose a time out for expr_parse_target? The
> default time out used by oss-fuzz fuzzing engines is 25 seconds i.e., a
> bug is automatically filed for inputs that take longer than 25 seconds
> to be processed by the fuzzer target. Is 25 seconds okay? Should it be
> lower? If yes, I can send another patch that configures the time out in
> the respective fuzzer.options file.

25 seconds would be really long.  I think you could reasonably lower it
to just 1 or 2 seconds.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] Makes stylistic change suggested by 0-day robot.

2018-09-28 Thread Ben Pfaff
On Fri, Sep 28, 2018 at 01:39:44PM +0200, Bhargava Shastry wrote:
> 1. Does it make sense to move the fuzzer build script to Openvswitch
> repo in the long run?

If you'd be happy with it there, I'm OK with moving it.

> 2. Does it make sense to impose a size limit on inputs to the
> expr_parse_fuzzer target.

Sometimes the inputs can get large in practice, because they are often
machine-generated.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-09-28 Thread Pravin Shelar
On Wed, Sep 26, 2018 at 2:58 AM Stefano Brivio  wrote:
>
> Hi Pravin,
>
> On Wed, 15 Aug 2018 00:19:39 -0700
> Pravin Shelar  wrote:
>
> > I understand fairness has cost, but we need to find right balance
> > between performance and fairness. Current fairness scheme is a
> > lockless algorithm without much computational overhead, did you try to
> > improve current algorithm so that it uses less number of ports.
>
> In the end, we tried harder as you suggested, and found out a way to
> avoid the need for per-thread sets of per-vport sockets: with a few
> changes, a process-wide set of per-vport sockets is actually enough to
> maintain the current fairness behaviour.
>
> Further, we can get rid of duplicate fd events if the EPOLLEXCLUSIVE
> epoll() flag is available, which improves performance noticeably. This
> is explained in more detail in the commit message of the ovs-vswitchd
> patch Matteo wrote [1], now merged.
>
> This approach solves the biggest issue (i.e. huge amount of netlink
> sockets) in ovs-vswitchd itself, without the need for kernel changes.
>

Thanks for working on this issue.

> It doesn't address some proposed improvements though, that is, it does
> nothing to improve the current fairness scheme, it won't allow neither
> the configurable fairness criteria proposed by Ben, nor usage of BPF
> maps for extensibility as suggested by William.
>
> I would however say that those topics bear definitely lower priority,
> and perhaps addressing them at all becomes overkill now that we don't
> any longer have a serious issue.
>
> [1] https://patchwork.ozlabs.org/patch/974304/
Nice!

Thanks,
Pravin.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] dp-packet: Handle multi-seg mubfs in shift() func.

2018-09-28 Thread 0-day Robot
Bleep bloop.  Greetings Tiago Lam, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#40 FILE: lib/dp-packet.c:306:
 * XXX: This function is the counterpart of the `rte_pktmbuf_read()` function

Lines checked: 172, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 14/14] dpdk-tests: End-to-end tests for multi-seg mbufs.

2018-09-28 Thread Tiago Lam
The following tests are added to the DPDK testsuite to add some
coverage for the multi-segment mbufs:
- Check that multi-segment mbufs are disabled by default;
- Check that providing `other_config:dpdk-multi-seg-mbufs=true` indeed
  enables mbufs;
- Using a DPDK port, send a random packet out and check that `ofctl
  dump-flows` shows the correct amount of packets and bytes sent.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/system-dpdk.at | 65 
 1 file changed, 65 insertions(+)

diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 3d21b01..af8de8c 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -71,3 +71,68 @@ OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel 
module is probably n
 ")
 AT_CLEANUP
 dnl --
+
+AT_SETUP([Jumbo frames - Multi-segment disabled by default])
+OVS_DPDK_START()
+
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [1], [])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment enabled])
+OVS_DPDK_START([dpdk-multi-seg-mbufs=true])
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [], [stdout])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment mbufs Tx])
+OVS_DPDK_PRE_CHECK()
+OVS_DPDK_START([per-port-memory=true dpdk-multi-seg-mbufs=true])
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
+AT_CHECK([ovs-vsctl add-port br10 dpdk0 \
+-- set Interface dpdk0 type=dpdk options:dpdk-devargs=$(cat PCI_ADDR) \
+-- set Interface dpdk0 mtu_request=9000], [], [stdout], [stderr])
+
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+dnl Add flows to send packets out from the 'dpdk0' port
+AT_CHECK([
+ovs-ofctl del-flows br10
+ovs-ofctl add-flow br10 in_port=LOCAL,actions=output:dpdk0
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [], [stdout])
+
+dnl Send packet out, of the 'dpdk0' port
+AT_CHECK([
+ARP_HEADER="09000B0009000A000806000108000604000100010A\
+0100020A02"
+dnl Build a random hex string to append to the ARP_HEADER
+RANDOM_BODY=$(printf '0102030405%.0s' {1..1750})
+dnl 8792B ARP packet
+RANDOM_ARP="$ARP_HEADER$RANDOM_BODY"
+
+ovs-ofctl packet-out br10 "packet=$RANDOM_ARP,action=resubmit:LOCAL"
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [0], [stdout])
+
+dnl Confirm the single packet as been sent with correct size
+AT_CHECK([ovs-ofctl dump-flows br10 | ofctl_strip | grep in_port], [0], [dnl
+ n_packets=1, n_bytes=8792, in_port=LOCAL actions=output:1
+])
+
+dnl Clean up
+OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel module is probably 
not loaded./d
+/Failed to enable flow control/d
+/failed to connect to \/tmp\/dpdkvhostclient0: No such file or directory/d
+/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 13/14] dpdk-tests: Accept other configs in OVS_DPDK_START

2018-09-28 Thread Tiago Lam
As it stands, OVS_DPDK_START() won't allow other configs to be set
before starting the ovs-vswitchd daemon. This is a problem since some
configs, such as the "dpdk-multi-seg-mbufs=true" for enabling the
multi-segment mbufs, need to be set prior to start OvS.

To support other options, OVS_DPDK_START() has been modified to accept
extra configs in the form "$config_name=$config_value". It then uses
ovs-vsctl to set the configs.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/system-dpdk-macros.at | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
index 0762ee0..7c65834 100644
--- a/tests/system-dpdk-macros.at
+++ b/tests/system-dpdk-macros.at
@@ -21,7 +21,7 @@ m4_define([OVS_DPDK_PRE_CHECK],
 ])
 
 
-# OVS_DPDK_START()
+# OVS_DPDK_START([other-conf-args])
 #
 # Create an empty database and start ovsdb-server. Add special configuration
 # dpdk-init to enable DPDK functionality. Start ovs-vswitchd connected to that
@@ -48,6 +48,10 @@ m4_define([OVS_DPDK_START],
AT_CHECK([lscpu], [], [stdout])
AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) 
{printf "1024,"}; print "1024"}' > SOCKET_MEM])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . 
other_config:dpdk-socket-mem="$(cat SOCKET_MEM)"])
+   dnl Iterate through $other-conf-args list and include them
+   m4_foreach_w(opt, $1, [
+   AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:opt])
+   ])
 
dnl Start ovs-vswitchd.
AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --log-file -vvconn 
-vofproto_dpif -vunixctl], [0], [stdout], [stderr])
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 12/14] dpdk-tests: Add unit-tests for multi-seg mbufs.

2018-09-28 Thread Tiago Lam
In order to create a minimal environment that allows the tests to get
mbufs from an existing mempool, the following approach is taken:
- EAL is initialised (by using the main dpdk_init()) and a (very) small
  mempool is instantiated (mimicking the logic in dpdk_mp_create()).
  This mempool instance is global and used by all the tests;
- Packets are then allocated from the instantiated mempool, and tested
  on, by running some operations on them and manipulating data.

The tests introduced focus on testing DPDK dp_packets (where
source=DPBUF_DPDK), linked with a single or multiple mbufs, across
several operations, such as:
- dp_packet_put();
- dp_packet_shift();
- dp_packet_reserve();
- dp_packet_push_uninit();
- dp_packet_clear();
- dp_packet_equal();
- dp_packet_linear_data();
- And as a consequence of some of these, dp_packet_put_uninit() and
  dp_packet_resize__().

Finally, this has also been integrated with the new DPDK testsuite.
Thus, when running `$sudo make check-dpdk` one will also be running
these tests.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/automake.mk  |  10 +-
 tests/dpdk-packet-mbufs.at |   7 +
 tests/system-dpdk-testsuite.at |   1 +
 tests/test-dpdk-mbufs.c| 620 +
 4 files changed, 637 insertions(+), 1 deletion(-)
 create mode 100644 tests/dpdk-packet-mbufs.at
 create mode 100644 tests/test-dpdk-mbufs.c

diff --git a/tests/automake.mk b/tests/automake.mk
index 97312cf..fca35d9 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -168,7 +168,8 @@ SYSTEM_DPDK_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-dpdk-macros.at \
tests/system-dpdk-testsuite.at \
-   tests/system-dpdk.at
+   tests/system-dpdk.at \
+   tests/dpdk-packet-mbufs.at
 
 check_SCRIPTS += tests/atlocal
 
@@ -425,6 +426,10 @@ tests_ovstest_SOURCES = \
tests/test-vconn.c \
tests/test-aa.c \
tests/test-stopwatch.c
+if DPDK_NETDEV
+tests_ovstest_SOURCES += \
+   tests/test-dpdk-mbufs.c
+endif
 
 if !WIN32
 tests_ovstest_SOURCES += \
@@ -437,6 +442,9 @@ tests_ovstest_SOURCES += \
 endif
 
 tests_ovstest_LDADD = lib/libopenvswitch.la ovn/lib/libovn.la
+if DPDK_NETDEV
+tests_ovstest_LDFLAGS = $(AM_LDFLAGS) $(DPDK_vswitchd_LDFLAGS)
+endif
 
 noinst_PROGRAMS += tests/test-strtok_r
 tests_test_strtok_r_SOURCES = tests/test-strtok_r.c
diff --git a/tests/dpdk-packet-mbufs.at b/tests/dpdk-packet-mbufs.at
new file mode 100644
index 000..f28e4fc
--- /dev/null
+++ b/tests/dpdk-packet-mbufs.at
@@ -0,0 +1,7 @@
+AT_BANNER([OVS-DPDK dp_packet unit tests])
+
+AT_SETUP([OVS-DPDK dp_packet - mbufs allocation])
+AT_KEYWORDS([dp_packet, multi-seg, mbufs])
+AT_CHECK(ovstest test-dpdk-packet, [], [ignore], [ignore])
+
+AT_CLEANUP
diff --git a/tests/system-dpdk-testsuite.at b/tests/system-dpdk-testsuite.at
index 382f09e..f5edf58 100644
--- a/tests/system-dpdk-testsuite.at
+++ b/tests/system-dpdk-testsuite.at
@@ -23,3 +23,4 @@ m4_include([tests/system-common-macros.at])
 m4_include([tests/system-dpdk-macros.at])
 
 m4_include([tests/system-dpdk.at])
+m4_include([tests/dpdk-packet-mbufs.at])
diff --git a/tests/test-dpdk-mbufs.c b/tests/test-dpdk-mbufs.c
new file mode 100644
index 000..50e41ea
--- /dev/null
+++ b/tests/test-dpdk-mbufs.c
@@ -0,0 +1,620 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "dp-packet.h"
+#include "ovstest.h"
+#include "dpdk.h"
+#include "smap.h"
+
+#define N_MBUFS 1024
+#define MBUF_DATA_LEN 2048
+
+static int num_tests = 0;
+
+/* Global var to hold a mempool instance, "test-mp", used in all of the tests
+ * below. This instance is instantiated in dpdk_setup_eal_with_mp(). */
+static struct rte_mempool *mp;
+
+/* Test data used to fill the packets with data. Note that this isn't a string
+ * that repsents a valid packet, by any means. The pattern is generated in set_
+ * testing_pattern_str() and the sole purpose is to verify the data remains the
+ * same after inserting and operating on multi-segment mbufs. */
+static char *test_str;
+
+/* Asserts a dp_packet that holds a single mbuf, where:
+ * - nb_segs must be 1;
+ * - pkt_len must be equal to data_len which in turn must equal the provided
+ *   'pkt_len';
+ * - data_off must start at the provided 'data_ofs';
+ * - next must be NULL. *

[ovs-dev] [PATCH v10 11/14] netdev-dpdk: support multi-segment jumbo frames

2018-09-28 Thread Tiago Lam
From: Mark Kavanagh 

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 Documentation/topics/dpdk/jumbo-frames.rst | 73 ++
 Documentation/topics/dpdk/memory.rst   | 36 +++
 NEWS   |  1 +
 lib/dpdk.c |  8 
 lib/netdev-dpdk.c  | 60 
 lib/netdev-dpdk.h  |  1 +
 vswitchd/vswitch.xml   | 22 +
 7 files changed, 193 insertions(+), 8 deletions(-)

diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
b/Documentation/topics/dpdk/jumbo-frames.rst
index 00360b4..5888f1e 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,76 @@ Jumbo frame support has been validated against 9728B frames, 
which is the
 largest frame size supported by Fortville NIC using the DPDK i40e driver, but
 larger frames and other DPDK NIC drivers may be supported. These cases are
 common for use cases involving East-West traffic only.
+
+---
+Multi-segment mbufs
+---
+
+Instead of increasing the size of mbufs within a mempool, such that each mbuf
+within the pool is large enough to contain an entire jumbo frame of a
+user-defined size, mbufs can be chained together instead. In this approach each
+mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes,
+irrespective of the user-requested MTU value. Since each mbuf in the chain is
+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach may bring more flexibility in use cases where the maximum packet
+length may be hard to guess. For example, in cases where packets originate from
+sources marked for oflload (such as TSO), each packet may be larger than the
+MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be
+enough to hold all of the packet's data.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and the user
+must decide on which approach to adopt on initialisation. If multi-segment
+mbufs is to be enabled, it can be done so with the following command::
+
+$ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
+
+Single-segment mbufs still remain the default when using OvS-DPDK, and the
+above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
+multi-segment mbufs are to be used.
+
+~
+Performance notes
+~
+
+When using multi-segment mbufs some PMDs may not support vectorized Tx
+functions, due to its non-contiguous nature. As a result this can hit
+performance for smaller packet sizes. For example, on a setup sending 64B
+packets at line rate, a decrease of ~20% has been observed. The performance
+impact stops being noticeable for larger packet sizes, although the exact size
+will between PMDs, and depending on the architecture one's using.
+
+Tests performed with the i40e PMD driver only showed this limitation for 64B
+packets, and the same rate was observed when comparing multi-segment mbufs and
+single-segment mbuf for 128B packets. In other words, the 20% drop in
+performance was not observed for packets >= 128B during this test case.
+
+Because of this, multi-segment mbufs is not advised to be used with smaller
+packet sizes, suc

[ovs-dev] [PATCH v10 09/14] dp-packet: Add support for data "linearization".

2018-09-28 Thread Tiago Lam
Previous commits have added support to the dp_packet API to handle
multi-segmented packets, where data is not stored contiguously in
memory. However, in some cases, it is inevitable and data must be
provided contiguously. Examples of such cases are when performing csums
over the entire packet data, or when write()'ing to a file descriptor
(for a tap interface, for example). For such cases, the dp_packet API
has been extended to provide a way to transform a multi-segmented
DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
copy of memory). If the packet's data is already stored in memory
contigously then there's no need to convert the packet.

Thus, the main use cases that were assuming that a dp_packet's data is
always held contiguously in memory were changed to make use of the new
"linear functions" in the dp_packet API when there's a need to traverse
the entire's packet data. Per the example above, when the packet's data
needs to be write() to the tap's file descriptor, or when the conntrack
module needs to verify a packet's checksum, the data is now linearized.

Additionally, the layer functions, such as dp_packet_l3() and variants,
have been modified to check if there's enough data in the packet before
returning a pointer to the data (and callers have been modified
accordingly). This requirement is needed to guarantee that a caller
doesn't read beyond the available memory.

Signed-off-by: Tiago Lam 
---
 lib/bfd.c |   3 +-
 lib/cfm.c |   5 +-
 lib/conntrack-icmp.c  |   4 +-
 lib/conntrack-private.h   |   4 +-
 lib/conntrack-tcp.c   |   6 +-
 lib/conntrack.c   | 109 +
 lib/dp-packet.c   |  18 
 lib/dp-packet.h   | 217 +++---
 lib/dpif-netdev.c |   5 +
 lib/dpif-netlink.c|   5 +
 lib/dpif.c|   9 ++
 lib/flow.c|  29 +++---
 lib/lacp.c|   3 +-
 lib/mcast-snooping.c  |   8 +-
 lib/netdev-bsd.c  |   5 +
 lib/netdev-dummy.c|  13 ++-
 lib/netdev-linux.c|  13 ++-
 lib/netdev-native-tnl.c   |  39 +---
 lib/odp-execute.c |  28 --
 lib/ofp-print.c   |  10 +-
 lib/ovs-lldp.c|   3 +-
 lib/packets.c |  81 +---
 lib/pcap-file.c   |   2 +-
 ofproto/ofproto-dpif-upcall.c |  20 +++-
 ofproto/ofproto-dpif-xlate.c  |  42 ++--
 ovn/controller/pinctrl.c  |  29 +++---
 tests/test-conntrack.c|   2 +-
 tests/test-rstp.c |   8 +-
 tests/test-stp.c  |   8 +-
 29 files changed, 483 insertions(+), 245 deletions(-)

diff --git a/lib/bfd.c b/lib/bfd.c
index 5308262..d50d2da 100644
--- a/lib/bfd.c
+++ b/lib/bfd.c
@@ -722,7 +722,8 @@ bfd_process_packet(struct bfd *bfd, const struct flow *flow,
 if (!msg) {
 VLOG_INFO_RL(&rl, "%s: Received too-short BFD control message (only "
  "%"PRIdPTR" bytes long, at least %d required).",
- bfd->name, (uint8_t *) dp_packet_tail(p) - l7,
+ bfd->name, dp_packet_size(p) -
+ (l7 - (uint8_t *) dp_packet_data(p)),
  BFD_PACKET_LEN);
 goto out;
 }
diff --git a/lib/cfm.c b/lib/cfm.c
index 71d2c02..83baf2a 100644
--- a/lib/cfm.c
+++ b/lib/cfm.c
@@ -584,7 +584,7 @@ cfm_compose_ccm(struct cfm *cfm, struct dp_packet *packet,
 
 atomic_read_relaxed(&cfm->extended, &extended);
 
-ccm = dp_packet_l3(packet);
+ccm = dp_packet_l3(packet, sizeof(*ccm));
 ccm->mdlevel_version = 0;
 ccm->opcode = CCM_OPCODE;
 ccm->tlv_offset = 70;
@@ -759,8 +759,7 @@ cfm_process_heartbeat(struct cfm *cfm, const struct 
dp_packet *p)
 atomic_read_relaxed(&cfm->extended, &extended);
 
 eth = dp_packet_eth(p);
-ccm = dp_packet_at(p, (uint8_t *)dp_packet_l3(p) - (uint8_t 
*)dp_packet_data(p),
-CCM_ACCEPT_LEN);
+ccm = dp_packet_l3(p, CCM_ACCEPT_LEN);
 
 if (!ccm) {
 VLOG_INFO_RL(&rl, "%s: Received an unparseable 802.1ag CCM heartbeat.",
diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c
index 40fd1d8..0575d0e 100644
--- a/lib/conntrack-icmp.c
+++ b/lib/conntrack-icmp.c
@@ -63,7 +63,7 @@ icmp_conn_update(struct conn *conn_, struct conntrack_bucket 
*ctb,
 static bool
 icmp4_valid_new(struct dp_packet *pkt)
 {
-struct icmp_header *icmp = dp_packet_l4(pkt);
+struct icmp_header *icmp = dp_packet_l4(pkt, sizeof *icmp);
 
 return icmp->icmp_type == ICMP4_ECHO_REQUEST
|| icmp->icmp_type == ICMP4_INFOREQUEST
@@ -73,7 +73,7 @@ icmp4_valid_new(struct dp_packet *pkt)
 static bool
 icmp6_valid_new(struct dp_packet *pkt)
 {
-struct icmp6_header *icmp6 = dp_packet_l4(pkt);
+struct icmp6_header *icmp6 = dp_packet_l4(pkt, sizeof *icmp6);
 
 return icmp6->icmp6_type == ICMP6_ECHO_REQUEST;
 }
d

[ovs-dev] [PATCH v10 10/14] netdev-dpdk: copy large packet to multi-seg. mbufs

2018-09-28 Thread Tiago Lam
From: Mark Kavanagh 

Currently, packets are only copied to a single segment in the function
dpdk_do_tx_copy(). This could be an issue in the case of jumbo frames,
particularly when multi-segment mbufs are involved.

This patch calculates the number of segments needed by a packet and
copies the data to each segment.

A new function, dpdk_buf_alloc(), has also been introduced as a wrapper
around the nonpmd_mp_mutex to serialise allocations from a non-pmd
context.

Co-authored-by: Michael Qiu 
Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Michael Qiu 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c | 91 +--
 1 file changed, 82 insertions(+), 9 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8484239..e58e7ac 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -552,6 +552,27 @@ dpdk_rte_mzalloc(size_t sz)
 return rte_zmalloc(OVS_VPORT_DPDK, sz, OVS_CACHE_LINE_SIZE);
 }
 
+static struct rte_mbuf *
+dpdk_buf_alloc(struct rte_mempool *mp)
+{
+struct rte_mbuf *mbuf = NULL;
+
+/* If non-pmd we need to lock on nonpmd_mp_mutex mutex */
+if (!dpdk_thread_is_pmd()) {
+ovs_mutex_lock(&nonpmd_mp_mutex);
+
+mbuf = rte_pktmbuf_alloc(mp);
+
+ovs_mutex_unlock(&nonpmd_mp_mutex);
+
+return mbuf;
+}
+
+mbuf = rte_pktmbuf_alloc(mp);
+
+return mbuf;
+}
+
 void
 free_dpdk_buf(struct dp_packet *packet)
 {
@@ -2316,6 +2337,56 @@ out:
 }
 }
 
+static int
+dpdk_copy_dp_packet_to_mbuf(struct dp_packet *packet, struct rte_mbuf **head,
+struct rte_mempool *mp)
+{
+struct rte_mbuf *mbuf, *fmbuf;
+uint16_t max_data_len;
+uint32_t nb_segs = 0;
+uint32_t size = 0;
+
+/* We will need the whole data for copying below */
+if (!dp_packet_is_linear(packet)) {
+dp_packet_linearize(packet);
+}
+
+/* Allocate first mbuf to know the size of data available */
+fmbuf = mbuf = *head = dpdk_buf_alloc(mp);
+if (OVS_UNLIKELY(!mbuf)) {
+return ENOMEM;
+}
+
+size = dp_packet_size(packet);
+
+/* All new allocated mbuf's max data len is the same */
+max_data_len = mbuf->buf_len - mbuf->data_off;
+
+/* Calculate # of output mbufs. */
+nb_segs = size / max_data_len;
+if (size % max_data_len) {
+nb_segs = nb_segs + 1;
+}
+
+/* Allocate additional mbufs, less the one alredy allocated above */
+for (int i = 1; i < nb_segs; i++) {
+mbuf->next = dpdk_buf_alloc(mp);
+if (!mbuf->next) {
+free_dpdk_buf(CONTAINER_OF(fmbuf, struct dp_packet, mbuf));
+fmbuf = NULL;
+return ENOMEM;
+}
+mbuf = mbuf->next;
+}
+
+fmbuf->nb_segs = nb_segs;
+fmbuf->pkt_len = size;
+
+dp_packet_mbuf_write(fmbuf, 0, size, dp_packet_data(packet));
+
+return 0;
+}
+
 /* Tx function. Transmit packets indefinitely */
 static void
 dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
@@ -2332,6 +2403,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
 uint32_t cnt = batch_cnt;
 uint32_t dropped = 0;
+uint32_t i;
 
 if (dev->type != DPDK_DEV_VHOST) {
 /* Check if QoS has been configured for this netdev. */
@@ -2342,28 +2414,29 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 
 uint32_t txcnt = 0;
 
-for (uint32_t i = 0; i < cnt; i++) {
+for (i = 0; i < cnt; i++) {
 struct dp_packet *packet = batch->packets[i];
 uint32_t size = dp_packet_size(packet);
+int err = 0;
 
 if (OVS_UNLIKELY(size > dev->max_packet_len)) {
 VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
  size, dev->max_packet_len);
-
 dropped++;
 continue;
 }
 
-pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
-if (OVS_UNLIKELY(!pkts[txcnt])) {
+err = dpdk_copy_dp_packet_to_mbuf(packet, &pkts[txcnt],
+  dev->dpdk_mp->mp);
+if (err != 0) {
+if (err == ENOMEM) {
+VLOG_ERR_RL(&rl, "Failed to alloc mbufs! %u packets dropped",
+cnt - i);
+}
+
 dropped += cnt - i;
 break;
 }
-
-/* We have to do a copy for now */
-memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
-   dp_packet_data(packet), size);
-dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
 dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
 
 txcnt++;
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 08/14] dp-packet: copy data from multi-seg. DPDK mbuf

2018-09-28 Thread Tiago Lam
From: Michael Qiu 

When doing packet clone, if packet source is from DPDK driver,
multi-segment must be considered, and copy the segment's data one by
one.

Also, lots of DPDK mbuf's info is missed during a copy, like packet
type, ol_flags, etc.  That information is very important for DPDK to do
packets processing.

Co-authored-by: Mark Kavanagh 
Co-authored-by: Tiago Lam 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c   | 69 ++-
 lib/dp-packet.h   |  3 +++
 lib/netdev-dpdk.c |  1 +
 3 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 167bf43..806640b 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -48,6 +48,22 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t 
allocated,
 dp_packet_set_size(b, 0);
 }
 
+#ifdef DPDK_NETDEV
+void
+dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src)
+{
+ovs_assert(dst != NULL && src != NULL);
+struct rte_mbuf *buf_dst = &(dst->mbuf);
+struct rte_mbuf buf_src = src->mbuf;
+
+buf_dst->ol_flags = buf_src.ol_flags;
+buf_dst->packet_type = buf_src.packet_type;
+buf_dst->tx_offload = buf_src.tx_offload;
+}
+#else
+#define dp_packet_copy_mbuf_flags(arg1, arg2)
+#endif
+
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
  * memory starting at 'base'.  'base' should be the first byte of a region
  * obtained from malloc().  It will be freed (with free()) if 'b' is resized or
@@ -158,6 +174,44 @@ dp_packet_clone(const struct dp_packet *buffer)
 return dp_packet_clone_with_headroom(buffer, 0);
 }
 
+#ifdef DPDK_NETDEV
+struct dp_packet *
+dp_packet_clone_with_headroom(const struct dp_packet *b, size_t headroom) {
+struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(b);
+
+/* copy multi-seg data */
+if (b->source == DPBUF_DPDK && !rte_pktmbuf_is_contiguous(&b->mbuf)) {
+void *dst = NULL;
+struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+dst = dp_packet_data(new_buffer);
+dp_packet_set_size(new_buffer, pkt_len);
+
+if (!rte_pktmbuf_read(mbuf, 0, pkt_len, dst)) {
+return NULL;
+}
+} else {
+new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(b),
+dp_packet_size(b),
+headroom);
+}
+
+/* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+memcpy(&new_buffer->l2_pad_size, &b->l2_pad_size,
+   sizeof(struct dp_packet) -
+   offsetof(struct dp_packet, l2_pad_size));
+
+dp_packet_copy_mbuf_flags(new_buffer, b);
+if (dp_packet_rss_valid(new_buffer)) {
+new_buffer->mbuf.hash.rss = b->mbuf.hash.rss;
+}
+
+return new_buffer;
+}
+#else
 /* Creates and returns a new dp_packet whose data are copied from 'buffer'.
  * The returned dp_packet will additionally have 'headroom' bytes of
  * headroom. */
@@ -165,32 +219,25 @@ struct dp_packet *
 dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 {
 struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(buffer);
 
 new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
- dp_packet_size(buffer),
- headroom);
+ pkt_len, headroom);
+
 /* Copy the following fields into the returned buffer: l2_pad_size,
  * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
 memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
 sizeof(struct dp_packet) -
 offsetof(struct dp_packet, l2_pad_size));
 
-#ifdef DPDK_NETDEV
-new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
-#else
 new_buffer->rss_hash_valid = buffer->rss_hash_valid;
-#endif
-
 if (dp_packet_rss_valid(new_buffer)) {
-#ifdef DPDK_NETDEV
-new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
-#else
 new_buffer->rss_hash = buffer->rss_hash;
-#endif
 }
 
 return new_buffer;
 }
+#endif
 
 /* Creates and returns a new dp_packet that initially contains a copy of the
  * 'size' bytes of data starting at 'data' with no headroom or tailroom. */
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index aab8f62..cbf002c 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -124,6 +124,9 @@ void dp_packet_init_dpdk(struct dp_packet *);
 void dp_packet_init(struct dp_packet *, size_t);
 void dp_packet_uninit(struct dp_packet *);
 
+void dp_packet_copy_mbuf_flags(struct dp_packet *dst,
+   const struct

[ovs-dev] [PATCH v10 07/14] dp-packet: Handle multi-seg mubfs in shift() func.

2018-09-28 Thread Tiago Lam
In its current implementation dp_packet_shift() is also unaware of
multi-seg mbufs (that holds data in memory non-contiguously) and assumes
that data exists contiguously in memory, memmove'ing data to perform the
shift.

To add support for multi-seg mbuds a new set of functions was
introduced, dp_packet_mbuf_shift() and dp_packet_mbuf_write(). These
functions are used by dp_packet_shift(), when handling multi-seg mbufs,
to shift and write data within a chain of mbufs.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c | 100 
 lib/dp-packet.h |  10 ++
 2 files changed, 110 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 2aaeaae..167bf43 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -294,6 +294,100 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t 
size)
 }
 }
 
+#ifdef DPDK_NETDEV
+/* Write len data bytes in a mbuf at specified offset.
+ *
+ * 'mbuf', pointer to the destination mbuf where 'ofs' is, and the mbuf where
+ * the data will first be written.
+ * 'ofs', the offset within the provided 'mbuf' where 'data' is to be written.
+ * 'len', the size of the to be written 'data'.
+ * 'data', pointer to the to be written bytes.
+ *
+ * XXX: This function is the counterpart of the `rte_pktmbuf_read()` function
+ * available with DPDK, in the rte_mbuf.h */
+void
+dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, uint32_t len,
+ const void *data)
+{
+char *dst_addr;
+uint16_t data_len;
+int len_copy;
+while (mbuf) {
+if (len == 0) {
+break;
+}
+
+dst_addr = rte_pktmbuf_mtod_offset(mbuf, char *, ofs);
+data_len = MBUF_BUF_END(mbuf->buf_addr, mbuf->buf_len) - dst_addr;
+
+len_copy = MIN(len, data_len);
+/* We don't know if 'data' is the result of a rte_pktmbuf_read() call,
+ * in which case we may end up writing to the same region of memory we
+ * are reading from and overlapping. Hence the use of memmove() here */
+memmove(dst_addr, data, len_copy);
+
+data = ((char *) data) + len_copy;
+len -= len_copy;
+ofs = 0;
+
+mbuf->data_len = len_copy;
+mbuf = mbuf->next;
+}
+}
+
+static void
+dp_packet_mbuf_shift_(struct rte_mbuf *dbuf, int16_t dst_ofs,
+  const struct rte_mbuf *sbuf, uint16_t src_ofs, int len)
+{
+char *rd = xmalloc(sizeof(*rd) * len);
+const char *wd = rte_pktmbuf_read(sbuf, src_ofs, len, rd);
+
+ovs_assert(wd);
+
+dp_packet_mbuf_write(dbuf, dst_ofs, len, wd);
+
+free(rd);
+}
+
+/* Similarly to dp_packet_shift(), shifts the data within the mbufs of a
+ * dp_packet of DPBUF_DPDK source by 'delta' bytes.
+ * Caller must make sure of the following conditions:
+ * - When shifting left, delta can't be bigger than the data_len available in
+ *   the last mbuf;
+ * - When shifting right, delta can't be bigger than the space available in the
+ *   first mbuf (buf_len - data_off).
+ * Both these conditions guarantee that a shift operation doesn't fall outside
+ * the bounds of the existing mbufs, so that the first and last mbufs (when
+ * using multi-segment mbufs), remain the same. */
+static void
+dp_packet_mbuf_shift(struct dp_packet *b, int delta)
+{
+uint16_t src_ofs;
+int16_t dst_ofs;
+
+struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+struct rte_mbuf *tmbuf = rte_pktmbuf_lastseg(mbuf);
+
+if (delta < 0) {
+ovs_assert(-delta <= tmbuf->data_len);
+} else {
+ovs_assert(delta < (mbuf->buf_len - mbuf->data_off));
+}
+
+/* Set the destination and source offsets to copy to */
+dst_ofs = delta;
+src_ofs = 0;
+
+/* Shift data from src mbuf and offset to dst mbuf and offset */
+dp_packet_mbuf_shift_(mbuf, dst_ofs, mbuf, src_ofs,
+  rte_pktmbuf_pkt_len(mbuf));
+
+/* Update mbufs' properties, and if using multi-segment mbufs, first and
+ * last mbuf's data_len also needs to be adjusted */
+mbuf->data_off = mbuf->data_off + dst_ofs;
+}
+#endif
+
 /* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
  * For example, a 'delta' of 1 would cause each byte of data to move one byte
  * forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
@@ -306,6 +400,12 @@ dp_packet_shift(struct dp_packet *b, int delta)
: true);
 
 if (delta != 0) {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+dp_packet_mbuf_shift(b, delta);
+return;
+}
+#endif
 char *dst = (char *) dp_packet_data(b) + delta;
 memmove(dst, dp_packet_data(b), dp_packet_size(b));
 dp_packet_set_data(b, dst);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 259b470..aab8f62 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -80,6 +80,11 @@ struct dp_packet {
 };
 };
 
+#ifdef DPDK_NET

[ovs-dev] [PATCH v10 06/14] dp-packet: Handle multi-seg mbufs in helper funcs.

2018-09-28 Thread Tiago Lam
Most helper functions in dp-packet assume that the data held by a
dp_packet is contiguous, and perform operations such as pointer
arithmetic under that assumption. However, with the introduction of
multi-segment mbufs, where data is non-contiguous, such assumptions are
no longer possible. Some examples of Such helper functions are
dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
dp_packet_get_allocated() and dp_packet_at().

Thus, instead of assuming contiguous data in dp_packet, they  now
iterate over the (non-contiguous) data in mbufs to perform their
calculations.

Finally, dp_packet_use__() has also been modified to perform the
initialisation of the packet (and setting the source) before continuing
to set its size and data length, which now depends on the type of
packet.

Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c |   4 +-
 lib/dp-packet.h | 162 ++--
 2 files changed, 148 insertions(+), 18 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 782e7c2..2aaeaae 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -41,11 +41,11 @@ static void
 dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
  enum dp_packet_source source)
 {
+dp_packet_init__(b, allocated, source);
+
 dp_packet_set_base(b, base);
 dp_packet_set_data(b, base);
 dp_packet_set_size(b, 0);
-
-dp_packet_init__(b, allocated, source);
 }
 
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 4545041..259b470 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -185,9 +185,25 @@ dp_packet_delete(struct dp_packet *b)
 static inline void *
 dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
 {
-return offset + size <= dp_packet_size(b)
-   ? (char *) dp_packet_data(b) + offset
-   : NULL;
+if (offset + size > dp_packet_size(b)) {
+return NULL;
+}
+
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+while (buf && offset > buf->data_len) {
+offset -= buf->data_len;
+
+buf = buf->next;
+}
+
+return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
+}
+#endif
+
+return (char *) dp_packet_data(b) + offset;
 }
 
 /* Returns a pointer to byte 'offset' in 'b', which must contain at least
@@ -196,13 +212,23 @@ static inline void *
 dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
 {
 ovs_assert(offset + size <= dp_packet_size(b));
-return ((char *) dp_packet_data(b)) + offset;
+return dp_packet_at(b, offset, size);
 }
 
 /* Returns a pointer to byte following the last byte of data in use in 'b'. */
 static inline void *
 dp_packet_tail(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+/* Find last segment where data ends, meaning the tail of the chained
+ *  mbufs must be there */
+buf = rte_pktmbuf_lastseg(buf);
+
+return rte_pktmbuf_mtod_offset(buf, void *, buf->data_len);
+}
+#endif
 return (char *) dp_packet_data(b) + dp_packet_size(b);
 }
 
@@ -211,6 +237,15 @@ dp_packet_tail(const struct dp_packet *b)
 static inline void *
 dp_packet_end(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf));
+
+buf = rte_pktmbuf_lastseg(buf);
+
+return (char *) buf->buf_addr + buf->buf_len;
+}
+#endif
 return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
 }
 
@@ -236,6 +271,15 @@ dp_packet_tailroom(const struct dp_packet *b)
 static inline void
 dp_packet_clear(struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+/* sets pkt_len and data_len to zero and frees unused mbufs */
+dp_packet_set_size(b, 0);
+rte_pktmbuf_reset(&b->mbuf);
+
+return;
+}
+#endif
 dp_packet_set_data(b, dp_packet_base(b));
 dp_packet_set_size(b, 0);
 }
@@ -248,28 +292,47 @@ dp_packet_pull(struct dp_packet *b, size_t size)
 void *data = dp_packet_data(b);
 ovs_assert(dp_packet_size(b) - dp_packet_l2_pad_size(b) >= size);
 dp_packet_set_data(b, (char *) dp_packet_data(b) + size);
-dp_packet_set_size(b, dp_packet_size(b) - size);
+#ifdef DPDK_NETDEV
+b->mbuf.pkt_len -= size;
+#else
+b->size_ -= size;
+#endif
+
 return data;
 }
 
+#ifdef DPDK_NETDEV
+/* Similar to dp_packet_try_pull() but doesn't actually pull any data, only
+ * checks if it could and returns true or false accordingly.
+ *
+ * Valid for dp_packets carrying mbufs only. */
+static inline bool
+dp_packet_mbuf_may_pull(const str

[ovs-dev] [PATCH v10 05/14] dp-packet: Fix data_len handling multi-seg mbufs.

2018-09-28 Thread Tiago Lam
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.

To account for the above dp_packet_set_size() has been changed so that,
in the multi-segment mbufs case, only the data_len on the last mbuf of
the chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceeding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may lead to lingering mbufs that may need
freeing.

__dp_packet_set_data() now also updates an mbufs' data_len after setting
the data offset. This is so that both fields are always in sync for each
mbuf in a chain.

Co-authored-by: Michael Qiu 
Co-authored-by: Mark Kavanagh 
Co-authored-by: Przemyslaw Lal 
Co-authored-by: Marcin Ksiadz 
Co-authored-by: Yuanhan Liu 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Przemyslaw Lal 
Signed-off-by: Marcin Ksiadz 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.h | 84 -
 1 file changed, 72 insertions(+), 12 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 6376039..4545041 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -426,20 +426,60 @@ dp_packet_size(const struct dp_packet *b)
 return b->mbuf.pkt_len;
 }
 
+/* Sets the size of the packet 'b' to 'v'. For non-DPDK packets this only means
+ * setting b->size_, but if used in a DPDK packet it means adjusting the first
+ * mbuf pkt_len and last mbuf data_len, to reflect the real size, which can
+ * lead to free'ing tail mbufs that are no longer used.
+ *
+ * This function should be used for setting the size only, and if there's an
+ * assumption that the tail end of 'b' will be trimmed. For adjustng the head
+ * 'end' of 'b', dp_packet_pull() should be used instead. */
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-/* netdev-dpdk does not currently support segmentation; consequently, for
- * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
- * be used interchangably.
- *
- * On the datapath, it is expected that the size of packets
- * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
- * loss of accuracy in assigning 'v' to 'data_len'.
- */
-b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-b->mbuf.pkt_len = v; /* Total length of all segments linked to
-  * this segment. */
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *mbuf = &b->mbuf;
+uint16_t new_len = v;
+uint16_t data_len;
+uint16_t nb_segs = 0;
+uint16_t pkt_len = 0;
+
+/* Trim 'v' length bytes from the end of the chained buffers, freeing
+   any buffers that may be left floating */
+while (mbuf) {
+data_len = MIN(new_len, mbuf->data_len);
+mbuf->data_len = data_len;
+
+if (new_len - data_len <= 0) {
+/* Free the rest of chained mbufs */
+free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
+   mbuf));
+mbuf->next = NULL;
+} else if (!mbuf->next) {
+/* Don't assign more than what we have available */
+mbuf->data_len = MIN(new_len,
+ mbuf->buf_len - mbuf->data_off);
+}
+
+new_len -= data_len;
+nb_segs += 1;
+pkt_len += mbuf->data_len;
+mbuf = mbuf->next;
+}
+
+/* pkt_len != v would effectively mean that pkt_len < than 'v' (as
+ * being bigger is logically impossible). Being < than 'v' would mean
+ * the 'v' provided was bigger than the available room, which is the
+ * responsibility of the caller to make sure there is enough room */
+ovs_assert(pkt_len == v);
+
+b->mbuf.nb_segs = nb_segs;
+b->mbuf.pkt_len = pkt_len;
+} else {
+b->mbuf.data_len = v;
+/* Total length of all segments linked to this segment. */
+b->mbuf.pkt_len = v;
+}
 }
 
 static inline uint16_t
@@ -451,7 +491,27 @@ __packet_data(const struct dp_packet *b)
 static inline void
 __packet_set_data(struct dp_packet *b, uint16_t v)
 {
-b->mbuf.data_off = v;
+if (b->source == DPBUF_DPDK) {
+/* Moving data_off away from the first mbuf in the chain is not a
+ * possibility using DPBUF_DPDK dp_packets */
+ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
+
+uint16

[ovs-dev] [PATCH v10 04/14] netdev-dpdk: Serialise non-pmds mbufs' alloc/free.

2018-09-28 Thread Tiago Lam
A new mutex, 'nonpmd_mp_mutex', has been introduced to serialise
allocation and free operations by non-pmd threads on a given mempool.

free_dpdk_buf() has been modified to make use of the introduced mutex.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d786be3..3b11546 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -322,6 +322,16 @@ static struct ovs_mutex dpdk_mp_mutex 
OVS_ACQ_AFTER(dpdk_mutex)
 static struct ovs_list dpdk_mp_list OVS_GUARDED_BY(dpdk_mp_mutex)
 = OVS_LIST_INITIALIZER(&dpdk_mp_list);
 
+/* This mutex must be used by non pmd threads when allocating or freeing
+ * mbufs through mempools, when outside of the `non_pmd_mutex` mutex, in struct
+ * dp_netdev.
+ * The reason, as pointed out in the "Known Issues" section in DPDK's EAL docs,
+ * is that the implementation on which mempool is based off is non-preemptable.
+ * Since non-pmds may end up not being pinned this could lead to the preemption
+ * between non-pmds performing operations on the same mempool, which could lead
+ * to memory corruption. */
+static struct ovs_mutex nonpmd_mp_mutex = OVS_MUTEX_INITIALIZER;
+
 struct dpdk_mp {
  struct rte_mempool *mp;
  int mtu;
@@ -492,6 +502,8 @@ struct netdev_rxq_dpdk {
 dpdk_port_t port_id;
 };
 
+static bool dpdk_thread_is_pmd(void);
+
 static void netdev_dpdk_destruct(struct netdev *netdev);
 static void netdev_dpdk_vhost_destruct(struct netdev *netdev);
 
@@ -525,6 +537,12 @@ dpdk_buf_size(int mtu)
  NETDEV_DPDK_MBUF_ALIGN);
 }
 
+static bool
+dpdk_thread_is_pmd(void)
+{
+ return rte_lcore_id() != NON_PMD_CORE_ID;
+}
+
 /* Allocates an area of 'sz' bytes from DPDK.  The memory is zero'ed.
  *
  * Unlike xmalloc(), this function can return NULL on failure. */
@@ -535,11 +553,20 @@ dpdk_rte_mzalloc(size_t sz)
 }
 
 void
-free_dpdk_buf(struct dp_packet *p)
+free_dpdk_buf(struct dp_packet *packet)
 {
-struct rte_mbuf *pkt = (struct rte_mbuf *) p;
+/* If non-pmd we need to lock on nonpmd_mp_mutex mutex */
+if (!dpdk_thread_is_pmd()) {
+ovs_mutex_lock(&nonpmd_mp_mutex);
+
+rte_pktmbuf_free(&packet->mbuf);
+
+ovs_mutex_unlock(&nonpmd_mp_mutex);
+
+return;
+}
 
-rte_pktmbuf_free(pkt);
+rte_pktmbuf_free(&packet->mbuf);
 }
 
 static void
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 03/14] dp-packet: Fix allocated size on DPDK init.

2018-09-28 Thread Tiago Lam
When enabled with DPDK OvS deals with two types of packets, the ones
coming from the mempool and the ones locally created by OvS - which are
copied to mempool mbufs before output. In the latter, the space is
allocated from the system, while in the former the mbufs are allocated
from a mempool, which takes care of initialising them appropriately.

In the current implementation, during mempool's initialisation of mbufs,
dp_packet_set_allocated() is called from dp_packet_init_dpdk() without
considering that the allocated space, in the case of multi-segment
mbufs, might be greater than a single mbuf.  Furthermore, given that
dp_packet_init_dpdk() is on the code path that's called upon mempool's
initialisation, a call to dp_packet_set_allocated() is redundant, since
mempool takes care of initialising it.

To fix this, dp_packet_set_allocated() is no longer called after
initialisation of a mempool, only in dp_packet_init__(), which is still
called by OvS when initialising locally created packets.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c   | 3 +--
 lib/dp-packet.h   | 2 +-
 lib/netdev-dpdk.c | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..782e7c2 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -99,9 +99,8 @@ dp_packet_use_const(struct dp_packet *b, const void *data, 
size_t size)
  * buffer.  Here, non-transient ovs dp-packet fields are initialized for
  * packets that are part of a DPDK memory pool. */
 void
-dp_packet_init_dpdk(struct dp_packet *b, size_t allocated)
+dp_packet_init_dpdk(struct dp_packet *b)
 {
-dp_packet_set_allocated(b, allocated);
 b->source = DPBUF_DPDK;
 }
 
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index b948fe1..6376039 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -114,7 +114,7 @@ void dp_packet_use(struct dp_packet *, void *, size_t);
 void dp_packet_use_stub(struct dp_packet *, void *, size_t);
 void dp_packet_use_const(struct dp_packet *, const void *, size_t);
 
-void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
+void dp_packet_init_dpdk(struct dp_packet *);
 
 void dp_packet_init(struct dp_packet *, size_t);
 void dp_packet_uninit(struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1e19622..d786be3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -550,7 +550,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
 {
 struct rte_mbuf *pkt = _p;
 
-dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
+dp_packet_init_dpdk((struct dp_packet *) pkt);
 }
 
 static int
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 02/14] dp-packet: Init specific mbuf fields.

2018-09-28 Thread Tiago Lam
From: Mark Kavanagh 

dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible the the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is critical therefore, that these fields should be
initialized to 0.

This patch ensures that the following mbuf fields are initialized to
appropriate values on creation of a new dp_packet:
   - ol_flags=0
   - nb_segs=1
   - tx_offload=0
   - packet_type=0
   - next=NULL

Adapted from an idea by Michael Qiu :
https://patchwork.ozlabs.org/patch/777570/

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index ba91e58..b948fe1 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -625,14 +625,15 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p 
OVS_UNUSED)
 }
 
 /* This initialization is needed for packets that do not come
- * from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * from DPDK interfaces, when vswitchd is built with --with-dpdk. */
 static inline void
 dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
 {
 #ifdef DPDK_NETDEV
-p->mbuf.ol_flags = 0;
+struct rte_mbuf *mbuf = &(p->mbuf);
+mbuf->ol_flags = mbuf->tx_offload = mbuf->packet_type = 0;
+mbuf->nb_segs = 1;
+mbuf->next = NULL;
 #endif
 }
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 01/14] netdev-dpdk: fix mbuf sizing

2018-09-28 Thread Tiago Lam
From: Mark Kavanagh 

There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance With Rx
  buffer alignment (typically 1024B). So, for example, in order to
  successfully receive and capture a 1500B packet, mbufs with a
  data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
  * the dp packet, which includes a struct rte mbuf (704B)
  * RTE_PKTMBUF_HEADROOM (128B)
  * packet data (aligned to 1k, as previously described)
  * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:

mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
assumption is assumed to be true for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion when
aligning mbuf size to cachelines.

Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla 
Signed-off-by: Mark Kavanagh 
Acked-by: Santosh Shukla 
Acked-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c | 56 +--
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f91aa27..1e19622 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 
20);
 #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
 #define FRAME_LEN_TO_MTU(frame_len) ((frame_len)\
  - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu)  ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
- + sizeof(struct dp_packet) \
- + RTE_PKTMBUF_HEADROOM),   \
- RTE_CACHE_LINE_SIZE)
 #define NETDEV_DPDK_MBUF_ALIGN  1024
 #define NETDEV_DPDK_MAX_PKT_LEN 9728
 
@@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 char mp_name[RTE_MEMPOOL_NAMESIZE];
 const char *netdev_name = netdev_get_name(&dev->up);
 int socket_id = dev->requested_socket_id;
-uint32_t n_mbufs;
+uint32_t n_mbufs = 0;
+uint32_t mbuf_size = 0;
+uint32_t aligned_mbuf_size = 0;
+uint32_t mbuf_priv_data_len = 0;
+uint32_t pkt_size = 0;
 uint32_t hash = hash_string(netdev_name, 0);
 struct dpdk_mp *dmp = NULL;
 int ret;
@@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 dmp->mtu = mtu;
 dmp->refcount = 1;
 
+/* Get the size of each mbuf, based on the MTU */
+mbuf_size = dpdk_buf_size(dev->requested_mtu);
+
 n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
 
 do {
@@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
  * so this is not an issue for tasks such as debugging.
  */
 ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
-   "ovs%08x%02d%05d%07u",
-   hash, socket_id, mtu, n_mbufs);
+   "ovs%08x%02d%05d%07u",
+hash, socket_id, mtu, n_mbufs);
 if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
 VLOG_DBG("snprintf returned %d. "
  "Failed to generate a mempool name for \"%s\". "
@@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 break;
 }
 
-VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
-  "on socket %d for %d Rx and %d Tx queues.",
-  netdev_name, n_mbufs, socket_id,
-  dev->requested_n_rxq, dev->requested_n_txq);
-
-dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
-  MP_CACHE_SZ,
-  sizeof (struct dp_packet)
-  - sizeof (struct rte_mbuf),
-  MBUF_SIZE(mtu)
-  - sizeof(struct dp_packet),
+VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
+  "on socket %d for %d Rx and %d Tx queues, "
+  "cache line size of %u",
+  netdev_name, n_mbufs, mbuf_

[ovs-dev] [PATCH v10 00/14] Support multi-segment mbufs

2018-09-28 Thread Tiago Lam
Overview

This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the
data is split across numerous mbufs, each carrying a portion, or
'segment', of the packet data. Mbufs are chained via their 'next'
attribute (an mbuf pointer).

The main motivation behind the support for multi-segment mbufs is to
later introduce TSO (use case i. below) / GRO in OvS-DPDK, which is
planned to be introduced after this series.

Use Cases
=
i.  Handling oversized (guest-originated) frames, which are marked
for hardware accelration/offload (TSO, for example).

Packets which originate from a non-DPDK source may be marked for
offload; as such, they may be larger than the permitted ingress
interface's MTU, and may be stored in an oversized dp-packet. In
order to transmit such packets over a DPDK port, their contents
must be copied to a DPDK mbuf (via dpdk_do_tx_copy). However, in
its current implementation, that function only copies data into
a single mbuf; if the space available in the mbuf is exhausted,
but not all packet data has been copied, then it is lost.
Similarly, when cloning a DPDK mbuf, it must be considered
whether that mbuf contains multiple segments. Both issues are
resolved within this patchset.

ii. Handling jumbo frames.

While OvS already supports jumbo frames, it does so by increasing
mbuf size, such that the entirety of a jumbo frame may be handled
in a single mbuf. This is certainly the preferred, and most
performant approach (and remains the default).

Enabling multi-segment mbufs

Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this.

This is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Performance notes (based on v8, 1st non-RFC)
=
In order to test for regressions in performance, tests were run on top
of master 88125d6 and v8 of this patchset, both with the multi-segment
mbufs option enabled and disabled.

VSperf was used to run the phy2phy_cont and pvp_cont tests with varying
packet sizes of 64B, 1500B and 7000B, on a 10Gbps interface.

Test | Size | Master | Multi-seg disabled | Multi-seg enabled
-
p2p  |  64  | ~22.7  |  ~22.65|   ~18.3
p2p  | 1500 |  ~1.6  |~1.6|~1.6
p2p  | 7000 | ~0.36  |   ~0.36|   ~0.36
pvp  |  64  |  ~6.7  |~6.7|~6.3
pvp  | 1500 |  ~1.6  |~1.6|~1.6
pvp  | 7000 | ~0.36  |   ~0.36|   ~0.36

Packet size is in bytes, while all packet rates are reported in mpps
(aggregated).

No noticeable regression has been observed (certainly everything is
within the ± 5% margin of existing performance), aside from the 64B
packet size case when multi-segment mbuf is enabled. This is
expected, however, because of how Tx vectoriszed functions are
incompatible with multi-segment mbufs on some PMDs. The PMD under
use during these tests was the i40e (on a Intel X710 NIC), which
indeed doesn't support vectorized Tx functions with multi-segment
mbufs.

---
v10: - Rebase on master 0d5450a ("ovsdb-client: Fix a bug that uses wrong
   index");
 - Address Ilya's comments:
   - Fix dp_packet_reset() to not trim the packet;
   - Remove unused netdev_dpdk_is_multi_segment_mbufs_enabled() function;
   - Modify the dp_packet_l[2_5|3|4] layer functions to check that enough
 data is present in the packet before returning. The callers were also
 modified to act accordingly.
 - Add comment to dp_packet_set_size() to clarify its usage and modify
   dp_packet_pull() to comply with that;
 - Modified slightly the "linearization" approach. Instead of making it an
   implict operation there are now two functions, dp_packet_is_linear() and
   dp_packet_linearize(), that enable the callers to explicity check if a
   packet needs linearization and linearize it if needed;
 - Briefly mention the Userspace Conntrack as one of the "limitations" when
   using multi-segment mbufs.

v9: - Rebase on master e4

Re: [ovs-dev] [PATCH] Fix bug in oss-fuzz options file

2018-09-28 Thread Aaron Conole
bshas...@sect.tu-berlin.de writes:

> From: Bhargava Shastry 
>
> oss-fuzz options file must begin with a [libfuzzer] header.
> This was missing in the expr_parse_target.options file which this patch 
> fixes.
>
> Signed-off-by: Bhargava Shastry 
> ---

Just wanted to let you know that the bot is complaining because of your
signature line:

 'Signed-off-by: Bhargava Shastry '

I'm not sure if we should amend checkpatch to recognize this kind of
construct.  It isn't an email address.

Is this a mistake on your end, or is some mail filter scrambling it?

Then again, maybe we should adopt Postel's law and allow for ' at ' to
also be an email divider for signoff lines?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Fix bug in oss-fuzz options file

2018-09-28 Thread 0-day Robot
Bleep bloop.  Greetings Bhargava Shastry, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Bhargava Shastry  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Bhargava Shastry 
Lines checked: 27, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v4] conntrack: Add rcu support.

2018-09-28 Thread Aaron Conole
Darrell Ball  writes:

> Below is the recent private commit message I have for the patch:
>
> However, I plan to split out to 3 main patches, with probably a few minor
> cosmetic patches.
>
> 1/ cmap performance and code simplification.
> 2/ Eliminating exporting private datastructures to outside modules
> 3/ Eliminating passing 'ct' structure as function parameter to most functions
>
> Thanks Darrell

How different is the 3-patch series?  Is the end result identical to
this patch?  Just wondering whether it's worthwhile to even do any
performance testing with this patch or if it's better to wait for the
3-patch series?

> conntrack: Add rcu support.
> 
> Add in rcu support for conntrack. At the same time, the array of
> hmaps is replaced by a cmap.  Using a single map also simplifies
> the handling of nat and allows the removal of the nat_conn map.
> Hence, there is a net code reduction.
> 
> The main reason for adding rcu is performance considerations with
> multiple threads.
> 
> I'll probably split out into several patches later, as other changes
> were made at the same time. These include: getting rid of the private
> conntrack structure exported outside of conntrack; these fields are
> made static to conntrack now. Also there is less parameter passing,
> since some private variables like hash_basis, which is written once at
> init time is now referenced as a static variable rather then passed
> through to many functions.
> 
> Signed-off-by: Darrell Ball 
>
>
>
>
>
>
>
> On 9/27/18, 11:10 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> Ben Pfaff" 
> wrote:
>
> On Tue, Sep 18, 2018 at 01:22:50PM -0700, Darrell Ball wrote:
> > Add in rcu support for conntrack. At the same time, the array of
> > hmaps is replaced by a cmap.  Using a single map also simplifies
> > the handling of nat and allows the removal of the nat_conn map.
> > There is still some cleanup to do and a few things to check. I'll
> > probably split out into several patches later, as some cosmetic
> > changes were made at the same time, like getting rid of the
> > private conntrack structure contents exported outside of conntrack;
> > these fields are made static to conntrack now.
> > 
> > Signed-off-by: Darrell Ball 
> 
> Hi Darrell, thanks for the patch.
> 
> Can you add a sentence or two to the commit message, please, explaining
> the motivation or the goal for the patch?  It's not clear to me whether
> this is a performance optimization, or a cleanup, or a fix for a race,
> etc.  This information helps review the patch and it also helps people
> looking through the history tracking down a bug etc.
> ___
> dev mailing list
> d...@openvswitch.org
> 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&data=02%7C01%7Cdball%40vmware.com%7Ccaf22dd3c47047c8121d08d624a480ed%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636736686314994726&sdata=Uo%2BnJ%2BjXK4BO30alYXcJmmNqX8R8euqITskV3hpJPj4%3D&reserved=0
> 
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] Makes stylistic change suggested by 0-day robot.

2018-09-28 Thread Bhargava Shastry
Sorry, I overlooked a third question (addendum to the previous two):

3. Does it make sense to impose a time out for expr_parse_target? The
default time out used by oss-fuzz fuzzing engines is 25 seconds i.e., a
bug is automatically filed for inputs that take longer than 25 seconds
to be processed by the fuzzer target. Is 25 seconds okay? Should it be
lower? If yes, I can send another patch that configures the time out in
the respective fuzzer.options file.

Thanks,
Bhargava

On 09/28/2018 01:39 PM, Bhargava Shastry wrote:
> Hi Ben,
> 
> Please ignore my comment regarding build (under the heading === 1. Build
> ===). There was a bug in my earlier patch which I fixed in my latest
> patch. Please don't rename anything in OvS repo or in the
> ovs-fuzzing-corpus repo. Local tests show everything works as expected.
> 
> However, two questions remain:
> 
> 1. Does it make sense to move the fuzzer build script to Openvswitch
> repo in the long run?
> 
> 2. Does it make sense to impose a size limit on inputs to the
> expr_parse_fuzzer target.
> 
> Thanks,
> Bhargava
> 
> On 09/28/2018 12:42 PM, Bhargava Shastry wrote:
>> Hi Ben,
>>
>> Thanks for taking a close look applying the (edited) patch. I have a
>> couple of comments:
>>
>> === 1. Build ===
>>
>> Currently, all fuzzer test harnesses in OvS are built by this bash
>> script located at Google oss-fuzz repo [1]. The peculiar thing about
>> Google's fuzz infrastructure is that it expects fuzzer configuration and
>> seed corpus files to share the name of the fuzzer target.
>>
>> [1]:
>> https://github.com/google/oss-fuzz/blob/master/projects/openvswitch/build.sh
>>
>> For instance, let's assume that the OvS build system produces a linked
>> fuzzer target called "openvswitch_expr_parse_target." Google's fuzzing
>> infra expects the configuration for this target to be called
>> "openvswitch_expr_parse_target.options" i.e.,
>> .options and the seed corpus to be called
>> "openvswitch_expr_parse_target_seed_corpus.zip" i.e.,
>> _seed_corpus.zip.
>>
>> Now, the build script in the Google oss-fuzz repo (see [1]) takes care
>> of constructing the seed_corpus zip file. However, it expects the corpus
>> folder name to match the fuzzer target. Likewise for the fuzzer
>> configuration file.
>>
>> The problem with the current patch integration is that the OvS build,
>> for some reason that I cannot comprehend (my make knowledge is poor),
>> creates a fuzzer target called "openvswitch_expr_parse_target" i.e., it
>> prefixes the string "openvswitch_" to its namesake C file,
>> expr_parse_target.c. One outcome of this (unexpected) change in naming
>> convention is that the fuzzer configuration options and seed corpora are
>> not picked up by oss-fuzz. For example, one of the fuzzer config options
>> asks the fuzzer to **not** generate inputs greater than 100 characters
>> but as you can see by a recent bug report, the input size to trigger the
>> bug is 8 kB. This brings me to my second concern. But before that, I
>> have a suggestion to address this issue:
>>
>> Short-term: In the short term it suffices to rename the config files and
>> seed corpora according to the linked fuzzer target. This would mean
>> renaming the folder called "expr_parse_seed_corpus" to
>> "openvswitch_expr_parse_seed_corpus" in the openvswitch
>> ovs-fuzzing-corpus repo [2] AND renaming the fuzzer options file
>> (located at 'tests/oss-fuzz/config') "expr_parse_target.options" to
>> "openvswitch_expr_parse_target.options"
>>
>> Long-term: In the long term, it may be wise to move the fuzzer build
>> script from the Google oss-fuzz repo to the Open vSwitch repo and
>> maintain it alongside OvS code. The Google oss-fuzz repo can then
>> contain a bash one liner like so:
>>
>> ```
>> ./tests/oss-fuzz/build.sh
>> ```
>>
>> that invokes OvS fuzzer build script. Perhaps, a README to this effect
>> can be added to aid maintenance.
>>
>> === 2. Input size ===
>>
>> I am not sure I completely understand the input specification of the OVN
>> parser. Should we impose a limit on the number of characters parsed by
>> the lexer/expression parser? The decision to make it 100 characters in
>> my earlier patch is ad-hoc. My reasoning was that these strings are
>> sourced from some sort of human (network administrator) input and hence
>> they are typically short. However, should it be sourced from a config
>> file of some sort, perhaps not imposing a character limit is good to
>> help the fuzzer tease out all sorts of corner cases.
>>
>> So, here's my question. Currently, you can see that the fuzzer options
>> file for the ovn target
>> (tests/oss-fuzz/config/expr_parse_target.options) has a line to this effect:
>>
>> max_len = 100
>>
>> Do you suggest keeping it this way, doing away with it, or increasing it
>> to a higher threshold?
>>
>> Thanks,
>> Bhargava
>>
>> On 09/27/2018 11:27 PM, Ben Pfaff wrote:
>>> On Thu, Sep 27, 2018 at 02:07:42PM +0200, bshas...@sect.tu-berlin.de wrote:
 From: Bhargava Shastry 

>>

Re: [ovs-dev] [PATCH v2 2/2] Makes stylistic change suggested by 0-day robot.

2018-09-28 Thread Bhargava Shastry
Hi Ben,

Please ignore my comment regarding build (under the heading === 1. Build
===). There was a bug in my earlier patch which I fixed in my latest
patch. Please don't rename anything in OvS repo or in the
ovs-fuzzing-corpus repo. Local tests show everything works as expected.

However, two questions remain:

1. Does it make sense to move the fuzzer build script to Openvswitch
repo in the long run?

2. Does it make sense to impose a size limit on inputs to the
expr_parse_fuzzer target.

Thanks,
Bhargava

On 09/28/2018 12:42 PM, Bhargava Shastry wrote:
> Hi Ben,
> 
> Thanks for taking a close look applying the (edited) patch. I have a
> couple of comments:
> 
> === 1. Build ===
> 
> Currently, all fuzzer test harnesses in OvS are built by this bash
> script located at Google oss-fuzz repo [1]. The peculiar thing about
> Google's fuzz infrastructure is that it expects fuzzer configuration and
> seed corpus files to share the name of the fuzzer target.
> 
> [1]:
> https://github.com/google/oss-fuzz/blob/master/projects/openvswitch/build.sh
> 
> For instance, let's assume that the OvS build system produces a linked
> fuzzer target called "openvswitch_expr_parse_target." Google's fuzzing
> infra expects the configuration for this target to be called
> "openvswitch_expr_parse_target.options" i.e.,
> .options and the seed corpus to be called
> "openvswitch_expr_parse_target_seed_corpus.zip" i.e.,
> _seed_corpus.zip.
> 
> Now, the build script in the Google oss-fuzz repo (see [1]) takes care
> of constructing the seed_corpus zip file. However, it expects the corpus
> folder name to match the fuzzer target. Likewise for the fuzzer
> configuration file.
> 
> The problem with the current patch integration is that the OvS build,
> for some reason that I cannot comprehend (my make knowledge is poor),
> creates a fuzzer target called "openvswitch_expr_parse_target" i.e., it
> prefixes the string "openvswitch_" to its namesake C file,
> expr_parse_target.c. One outcome of this (unexpected) change in naming
> convention is that the fuzzer configuration options and seed corpora are
> not picked up by oss-fuzz. For example, one of the fuzzer config options
> asks the fuzzer to **not** generate inputs greater than 100 characters
> but as you can see by a recent bug report, the input size to trigger the
> bug is 8 kB. This brings me to my second concern. But before that, I
> have a suggestion to address this issue:
> 
> Short-term: In the short term it suffices to rename the config files and
> seed corpora according to the linked fuzzer target. This would mean
> renaming the folder called "expr_parse_seed_corpus" to
> "openvswitch_expr_parse_seed_corpus" in the openvswitch
> ovs-fuzzing-corpus repo [2] AND renaming the fuzzer options file
> (located at 'tests/oss-fuzz/config') "expr_parse_target.options" to
> "openvswitch_expr_parse_target.options"
> 
> Long-term: In the long term, it may be wise to move the fuzzer build
> script from the Google oss-fuzz repo to the Open vSwitch repo and
> maintain it alongside OvS code. The Google oss-fuzz repo can then
> contain a bash one liner like so:
> 
> ```
> ./tests/oss-fuzz/build.sh
> ```
> 
> that invokes OvS fuzzer build script. Perhaps, a README to this effect
> can be added to aid maintenance.
> 
> === 2. Input size ===
> 
> I am not sure I completely understand the input specification of the OVN
> parser. Should we impose a limit on the number of characters parsed by
> the lexer/expression parser? The decision to make it 100 characters in
> my earlier patch is ad-hoc. My reasoning was that these strings are
> sourced from some sort of human (network administrator) input and hence
> they are typically short. However, should it be sourced from a config
> file of some sort, perhaps not imposing a character limit is good to
> help the fuzzer tease out all sorts of corner cases.
> 
> So, here's my question. Currently, you can see that the fuzzer options
> file for the ovn target
> (tests/oss-fuzz/config/expr_parse_target.options) has a line to this effect:
> 
> max_len = 100
> 
> Do you suggest keeping it this way, doing away with it, or increasing it
> to a higher threshold?
> 
> Thanks,
> Bhargava
> 
> On 09/27/2018 11:27 PM, Ben Pfaff wrote:
>> On Thu, Sep 27, 2018 at 02:07:42PM +0200, bshas...@sect.tu-berlin.de wrote:
>>> From: Bhargava Shastry 
>>>
>>> Wraps overly long line in expr_parse_target.c
>>>
>>> Signed-off-by: Bhargava Shastry 
>>
>> I folded this into the first patch.  Thanks!
>>
> 

-- 
Bhargava Shastry 
Security in Telecommunications
TU Berlin / Telekom Innovation Laboratories
Ernst-Reuter-Platz 7, Sekr TEL 17 / D - 10587 Berlin, Germany
phone: +49 30 8353 58235
Keybase: https://keybase.io/bshastry
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] Fix bug in oss-fuzz options file

2018-09-28 Thread bshastry
From: Bhargava Shastry 

oss-fuzz options file must begin with a [libfuzzer] header.
This was missing in the expr_parse_target.options file which this patch 
fixes.

Signed-off-by: Bhargava Shastry 
---
 tests/oss-fuzz/config/expr_parse_target.options | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/oss-fuzz/config/expr_parse_target.options 
b/tests/oss-fuzz/config/expr_parse_target.options
index c0dfa68f1..4dfedfd84 100644
--- a/tests/oss-fuzz/config/expr_parse_target.options
+++ b/tests/oss-fuzz/config/expr_parse_target.options
@@ -1,3 +1,4 @@
+[libfuzzer]
 dict = expr.dict
 close_fd_mask = 3
 max_len = 100
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 2/2] dpif-netdev-perf: Print SMC statistics.

2018-09-28 Thread Ilya Maximets
Printing of the SMC hits missed in the 'dpif-netdev/pmd-perf-show'
appctl command.

CC: Yipeng Wang 
Fixes: 60d8ccae135f ("dpif-netdev: Add SMC cache after EMC cache")
Signed-off-by: Ilya Maximets 
Acked-by: Yipeng Wang 
---
 lib/dpif-netdev-perf.c  | 3 +++
 lib/dpif-netdev-perf.h  | 2 +-
 lib/dpif-netdev-unixctl.man | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 13f1010c9..92ac38dab 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -206,6 +206,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 "  Rx packets:  %12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
 "  Datapath passes: %12"PRIu64"  (%.2f passes/pkt)\n"
 "  - EMC hits:  %12"PRIu64"  (%4.1f %%)\n"
+"  - SMC hits:  %12"PRIu64"  (%4.1f %%)\n"
 "  - Megaflow hits: %12"PRIu64"  (%4.1f %%, %.2f subtbl lookups/"
  "hit)\n"
 "  - Upcalls:   %12"PRIu64"  (%4.1f %%, %.1f us/upcall)\n"
@@ -215,6 +216,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 passes, rx_packets ? 1.0 * passes / rx_packets : 0,
 stats[PMD_STAT_EXACT_HIT],
 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
+stats[PMD_STAT_SMC_HIT],
+100.0 * stats[PMD_STAT_SMC_HIT] / passes,
 stats[PMD_STAT_MASKED_HIT],
 100.0 * stats[PMD_STAT_MASKED_HIT] / passes,
 stats[PMD_STAT_MASKED_HIT]
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 299d52a98..859c05613 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -56,7 +56,7 @@ extern "C" {
 
 enum pmd_stat_type {
 PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
-PMD_STAT_SMC_HIT,/* Packets that had a sig match hit (SMC). */
+PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
 PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
 PMD_STAT_MISS,  /* Packets that did not match and upcall was ok. */
 PMD_STAT_LOST,  /* Packets that did not match and upcall failed. */
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index c46f6b19c..0705f1cfb 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -11,7 +11,7 @@ Shows performance statistics for one or all pmd threads of 
the datapath
 \fIdp\fR. The special thread "main" sums up the statistics of every non pmd
 thread.
 
-The sum of "emc hits", "megaflow hits" and "miss" is the number of
+The sum of "emc hits", "smc hits", "megaflow hits" and "miss" is the number of
 packet lookups performed by the datapath. Beware that a recirculated packet
 experiences one additional lookup per recirculation, so there may be
 more lookups than forwarded packets in the datapath.
@@ -135,6 +135,7 @@ pmd thread numa_id 0 core_id 1:
   Rx packets:   2399607  (2381 Kpps, 848 cycles/pkt)
   Datapath passes:  3599415  (1.50 passes/pkt)
   - EMC hits:336472  ( 9.3 %)
+  - SMC hits: 0  ( 0.0 %)
   - Megaflow hits:  3262943  (90.7 %, 1.00 subtbl lookups/hit)
   - Upcalls:  0  ( 0.0 %, 0.0 us/upcall)
   - Lost upcalls: 0  ( 0.0 %)
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 1/2] dpif-netdev-unixctl: Change 'masked' to 'megaflow'.

2018-09-28 Thread Ilya Maximets
In the review process of the original patch 'masked hits' stat
was renamed to 'megaflow hits', but the man page wasn't updated.

Fixes: 6553d06bd179 ("dpif-netdev: Add dpif-netdev/pmd-stats-*
  appctl commands.")
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev-unixctl.man | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index ab8619e41..c46f6b19c 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -11,7 +11,7 @@ Shows performance statistics for one or all pmd threads of 
the datapath
 \fIdp\fR. The special thread "main" sums up the statistics of every non pmd
 thread.
 
-The sum of "emc hits", "masked hits" and "miss" is the number of
+The sum of "emc hits", "megaflow hits" and "miss" is the number of
 packet lookups performed by the datapath. Beware that a recirculated packet
 experiences one additional lookup per recirculation, so there may be
 more lookups than forwarded packets in the datapath.
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 0/2] dpif-netdev: Docs and stats fixes.

2018-09-28 Thread Ilya Maximets
Version 2:
* Fixed typo: 'smc hists' --> 'smc hits'.

Ilya Maximets (2):
  dpif-netdev-unixctl: Change 'masked' to 'megaflow'.
  dpif-netdev-perf: Print SMC statistics.

 lib/dpif-netdev-perf.c  | 3 +++
 lib/dpif-netdev-perf.h  | 2 +-
 lib/dpif-netdev-unixctl.man | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] Makes stylistic change suggested by 0-day robot.

2018-09-28 Thread Bhargava Shastry
Hi Ben,

Thanks for taking a close look applying the (edited) patch. I have a
couple of comments:

=== 1. Build ===

Currently, all fuzzer test harnesses in OvS are built by this bash
script located at Google oss-fuzz repo [1]. The peculiar thing about
Google's fuzz infrastructure is that it expects fuzzer configuration and
seed corpus files to share the name of the fuzzer target.

[1]:
https://github.com/google/oss-fuzz/blob/master/projects/openvswitch/build.sh

For instance, let's assume that the OvS build system produces a linked
fuzzer target called "openvswitch_expr_parse_target." Google's fuzzing
infra expects the configuration for this target to be called
"openvswitch_expr_parse_target.options" i.e.,
.options and the seed corpus to be called
"openvswitch_expr_parse_target_seed_corpus.zip" i.e.,
_seed_corpus.zip.

Now, the build script in the Google oss-fuzz repo (see [1]) takes care
of constructing the seed_corpus zip file. However, it expects the corpus
folder name to match the fuzzer target. Likewise for the fuzzer
configuration file.

The problem with the current patch integration is that the OvS build,
for some reason that I cannot comprehend (my make knowledge is poor),
creates a fuzzer target called "openvswitch_expr_parse_target" i.e., it
prefixes the string "openvswitch_" to its namesake C file,
expr_parse_target.c. One outcome of this (unexpected) change in naming
convention is that the fuzzer configuration options and seed corpora are
not picked up by oss-fuzz. For example, one of the fuzzer config options
asks the fuzzer to **not** generate inputs greater than 100 characters
but as you can see by a recent bug report, the input size to trigger the
bug is 8 kB. This brings me to my second concern. But before that, I
have a suggestion to address this issue:

Short-term: In the short term it suffices to rename the config files and
seed corpora according to the linked fuzzer target. This would mean
renaming the folder called "expr_parse_seed_corpus" to
"openvswitch_expr_parse_seed_corpus" in the openvswitch
ovs-fuzzing-corpus repo [2] AND renaming the fuzzer options file
(located at 'tests/oss-fuzz/config') "expr_parse_target.options" to
"openvswitch_expr_parse_target.options"

Long-term: In the long term, it may be wise to move the fuzzer build
script from the Google oss-fuzz repo to the Open vSwitch repo and
maintain it alongside OvS code. The Google oss-fuzz repo can then
contain a bash one liner like so:

```
./tests/oss-fuzz/build.sh
```

that invokes OvS fuzzer build script. Perhaps, a README to this effect
can be added to aid maintenance.

=== 2. Input size ===

I am not sure I completely understand the input specification of the OVN
parser. Should we impose a limit on the number of characters parsed by
the lexer/expression parser? The decision to make it 100 characters in
my earlier patch is ad-hoc. My reasoning was that these strings are
sourced from some sort of human (network administrator) input and hence
they are typically short. However, should it be sourced from a config
file of some sort, perhaps not imposing a character limit is good to
help the fuzzer tease out all sorts of corner cases.

So, here's my question. Currently, you can see that the fuzzer options
file for the ovn target
(tests/oss-fuzz/config/expr_parse_target.options) has a line to this effect:

max_len = 100

Do you suggest keeping it this way, doing away with it, or increasing it
to a higher threshold?

Thanks,
Bhargava

On 09/27/2018 11:27 PM, Ben Pfaff wrote:
> On Thu, Sep 27, 2018 at 02:07:42PM +0200, bshas...@sect.tu-berlin.de wrote:
>> From: Bhargava Shastry 
>>
>> Wraps overly long line in expr_parse_target.c
>>
>> Signed-off-by: Bhargava Shastry 
> 
> I folded this into the first patch.  Thanks!
> 

-- 
Bhargava Shastry 
Security in Telecommunications
TU Berlin / Telekom Innovation Laboratories
Ernst-Reuter-Platz 7, Sekr TEL 17 / D - 10587 Berlin, Germany
phone: +49 30 8353 58235
Keybase: https://keybase.io/bshastry
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] dpif-netdev-perf: Print SMC statistics.

2018-09-28 Thread Ilya Maximets
On 27.09.2018 19:25, Wang, Yipeng1 wrote:
> Thank you Ilya for adding this!
> 
> Besides the small comment, otherwise
> 
> Acked-by: Yipeng Wang 
> 
>> -Original Message-
>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>> Sent: Wednesday, September 26, 2018 10:19 AM
>> To: ovs-dev@openvswitch.org
>> Cc: Jan Scheurich ; Stokes, Ian 
>> ; Wang, Yipeng1 ;
>> index c46f6b19c..417456db3 100644
>> --- a/lib/dpif-netdev-unixctl.man
>> +++ b/lib/dpif-netdev-unixctl.man
>> @@ -11,7 +11,7 @@ Shows performance statistics for one or all pmd threads of 
>> the datapath
>> \fIdp\fR. The special thread "main" sums up the statistics of every non pmd
>> thread.
>>
>> -The sum of "emc hits", "megaflow hits" and "miss" is the number of
>> +The sum of "emc hits", "smc hists", "megaflow hits" and "miss" is the 
>> number of
> [Wang, Yipeng]  a Typo here, SMC hits.

Thanks. Good catch.

> Maybe we can add a comment that SMC is currently by default turned off. Just 
> in case
> People are wondering why SMC stats is always zero :)

We already have a note in documentation that SMC is turned off by default.
IMHO, there is no need to make additional remarks.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] bundle: add symmetric_l3 hash method for multipath

2018-09-28 Thread Martin Xu
Add a symmetric_l3 hash method that uses both network destination
address and network source address.

VMware-BZ: #2112940

Signed-off-by: Martin Xu 
CC: Ben Pfaff 
---
v1->v2: add eth_type as part of the hash fields; 
added unit tests in multipath, bundle, oss-fuzz, ovs-ofctl

 include/openflow/nicira-ext.h|   4 +-
 lib/bundle.c |   2 +
 lib/flow.c   |  47 +-
 lib/flow.h   |   1 +
 lib/multipath.c  |   2 +
 tests/bundle.at  | 179 +++
 tests/multipath.at   | 271 +++
 tests/oss-fuzz/flow_extract_target.c |   1 +
 tests/ovs-ofctl.at   |  12 ++
 utilities/ovs-ofctl.8.in |   2 +
 10 files changed, 519 insertions(+), 2 deletions(-)

diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h
index 1f5cbbf..dc12101 100644
--- a/include/openflow/nicira-ext.h
+++ b/include/openflow/nicira-ext.h
@@ -109,8 +109,10 @@ enum nx_hash_fields {
 NX_HASH_FIELDS_NW_SRC,
 
 /* Network destination address (NXM_OF_IP_DST) only. */
-NX_HASH_FIELDS_NW_DST
+NX_HASH_FIELDS_NW_DST,
 
+/* Both network destination and source destination addresses. */
+NX_HASH_FIELDS_SYMMETRIC_L3
 };
 
 /* NXT_PACKET_IN (analogous to OFPT_PACKET_IN).
diff --git a/lib/bundle.c b/lib/bundle.c
index edf6676..558e890 100644
--- a/lib/bundle.c
+++ b/lib/bundle.c
@@ -198,6 +198,8 @@ bundle_parse__(const char *s, const struct ofputil_port_map 
*port_map,
 bundle->fields = NX_HASH_FIELDS_NW_SRC;
 } else if (!strcasecmp(fields, "nw_dst")) {
 bundle->fields = NX_HASH_FIELDS_NW_DST;
+} else if (!strcasecmp(fields, "symmetric_l3")) {
+bundle->fields = NX_HASH_FIELDS_SYMMETRIC_L3;
 } else {
 return xasprintf("%s: unknown fields `%s'", s, fields);
 }
diff --git a/lib/flow.c b/lib/flow.c
index bffee70..79e4627 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -2307,6 +2307,37 @@ flow_hash_symmetric_l3l4(const struct flow *flow, 
uint32_t basis,
 return hash_finish(hash, basis);
 }
 
+/* Hashes 'flow' based on its nw_dst and nw_src for multipath. */
+uint32_t
+flow_hash_symmetric_l3(const struct flow *flow, uint32_t basis)
+{
+struct {
+union {
+ovs_be32 ipv4_addr;
+struct in6_addr ipv6_addr;
+};
+ovs_be16 eth_type;
+} fields;
+
+int i;
+
+memset(&fields, 0, sizeof fields);
+fields.eth_type = flow->dl_type;
+
+if (fields.eth_type == htons(ETH_TYPE_IP)) {
+fields.ipv4_addr = flow->nw_src ^ flow->nw_dst;
+} else if (fields.eth_type == htons(ETH_TYPE_IPV6)) {
+const uint8_t *a = &flow->ipv6_src.s6_addr[0];
+const uint8_t *b = &flow->ipv6_dst.s6_addr[0];
+uint8_t *ipv6_addr = &fields.ipv6_addr.s6_addr[0];
+
+for (i = 0; i < 16; i++) {
+ipv6_addr[i] = a[i] ^ b[i];
+}
+}
+return jhash_bytes(&fields, sizeof fields, basis);
+}
+
 /* Initialize a flow with random fields that matter for nx_hash_fields. */
 void
 flow_random_hash_fields(struct flow *flow)
@@ -2422,6 +2453,16 @@ flow_mask_hash_fields(const struct flow *flow, struct 
flow_wildcards *wc,
 }
 break;
 
+case NX_HASH_FIELDS_SYMMETRIC_L3:
+if (flow->dl_type == htons(ETH_TYPE_IP)) {
+memset(&wc->masks.nw_src, 0xff, sizeof wc->masks.nw_src);
+memset(&wc->masks.nw_dst, 0xff, sizeof wc->masks.nw_dst);
+} else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
+memset(&wc->masks.ipv6_src, 0xff, sizeof wc->masks.ipv6_src);
+memset(&wc->masks.ipv6_dst, 0xff, sizeof wc->masks.ipv6_dst);
+}
+break;
+
 default:
 OVS_NOT_REACHED();
 }
@@ -2464,6 +2505,8 @@ flow_hash_fields(const struct flow *flow, enum 
nx_hash_fields fields,
 return basis;
 }
 
+case NX_HASH_FIELDS_SYMMETRIC_L3:
+return flow_hash_symmetric_l3(flow, basis);
 }
 
 OVS_NOT_REACHED();
@@ -2480,6 +2523,7 @@ flow_hash_fields_to_str(enum nx_hash_fields fields)
 case NX_HASH_FIELDS_SYMMETRIC_L3L4_UDP: return "symmetric_l3l4+udp";
 case NX_HASH_FIELDS_NW_SRC: return "nw_src";
 case NX_HASH_FIELDS_NW_DST: return "nw_dst";
+case NX_HASH_FIELDS_SYMMETRIC_L3: return "symmetric_l3";
 default: return "";
 }
 }
@@ -2493,7 +2537,8 @@ flow_hash_fields_valid(enum nx_hash_fields fields)
 || fields == NX_HASH_FIELDS_SYMMETRIC_L3L4
 || fields == NX_HASH_FIELDS_SYMMETRIC_L3L4_UDP
 || fields == NX_HASH_FIELDS_NW_SRC
-|| fields == NX_HASH_FIELDS_NW_DST;
+|| fields == NX_HASH_FIELDS_NW_DST
+|| fields == NX_HASH_FIELDS_SYMMETRIC_L3;
 }
 
 /* Returns a hash value for the bits of 'flow' that are active based on
diff --git a/lib/flow.h b/lib/flow.h
index d03f1ba..1f6c1d6 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -2

[ovs-dev] 关岭布依族苗族自治县

2018-09-28 Thread Orxfl
Ltd.___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev