[ovs-dev] Use more than 200000 flows in datapath/kernel cache flows

2019-10-28 Thread Zhiwei Cen
Dear ovs-dev,

I'm trying to see if I can increase the number of flows supported in
kernel cache. I found two relevant config options
(other_config:flow-limit and upcall/flow limit). My understanding was
that other_config:flow-limit is for the normal openflow flows while
upcall is for the kernel flows
(http://docs.openvswitch.org/en/latest/faq/design/ question one).

It seems I can change the limit for Openflow flows.
This works:
$ovs-vsctl --no-wait set Open_vSwitch . other_config:flow-limit=21

$ovs-vsctl --no-wait get Open_vSwitch . other_config:flow-limit
"21"

However I cannot change the limit for kernel flows.
This does not:
$ovs-appctl upcall/set-flow-limit 21
set flow_limit to 21

$ovs-appctl upcall/show
system@ovs-system:
  flows : (current 70) (avg 68) (max 92) (limit 20)
  dump duration : 3ms
  ufid enabled : true

  127: (keys 3)

...

My question is, is it possible to change the limit on kernel flows and
is upcall the right command?

Thanks,
Zhiwei
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Herzlichen Glückwunsch und bitte antworten Sie uns jetzt

2019-10-28 Thread anggi
Ich bestätige den Erhalt der E-Mail. Ich bin eine Frau von Mavis Wanczyk, einer 
älteren Bürgerin der Vereinigten Staaten von Amerika und gebürtigen 
Massachusetts. Ich bin mit 14 Jahren aus Springfield gezogen und lebe jetzt in 
Chicopee. Am 25. August gewann ich die Powerball Lotterie im Wert von 759 
Millionen US-Dollar. 2017. Der Jackpot war ein Glücksfall für mich und ich habe 
beschlossen, Gottes Willen zu tun und all meine 759 Millionen US-Dollar für die 
Erinnerung an ihre verstorbene Mutter zu spenden, die an Krebs gestorben ist.

Stellen Sie sicher, dass es zu 100% legitim ist. Bitte besuchen Sie diese Seite 
zur Bestätigung: A

http://insider.foxnews.com/2017/08/24/powerball-lottery-winner-mavis-wanczyk-comes-forward-massachusetts

Ich habe beschlossen, € 5.800.000,00 als Teil Ihres gemeinnützigen Projekts zu 
spenden, um das Leben vieler Menschen in der Region von meinen 759 Millionen 
Jackpot-Lotteriegewinnen und einem Denkmal zur Verbesserung meiner verstorbenen 
Frau zu retten. Ich betete und suchte im Internet nach Jemand, der dieses Geld 
in Ihrem Land spendet. Ich habe Ihr Profil in der E-Mail-Liste von Google / 
Facebook gesehen und Sie ausgewählt.

Mein Mitgefühl für das Leiden anderer ist durch die Tragödie in meinem Leben 
gestiegen. Meine Mutter Jan starb am 25. August 2017 im Alter von 67 Jahren an 
Lungenkrebs. Ich erinnerte mich, dass ich all meine 759 Millionen Dollar für 
wohltätige Zwecke und weniger Privilegien in der Gesellschaft ausgegeben hatte. 
Ich möchte nicht, dass es berühmt wird. Ich beschloss, es in die Medien zu 
bringen, und mein einziges Ziel war das Ergebnis eines langen Gesprächs mit 
meinem einzigen Sohn (Audrey), den ich immer konsultiere. Die Außerirdischen 
sollen dieses neue Jahr spenden, weil wir wissen, dass es viele Familien gibt, 
die von Bezahlung zu Bezahlung leben, und andere, die sich ihre finanziellen 
Verpflichtungen nicht leisten können. Obwohl ich am 25. August eine Lotterie 
gewann, halfen wir meinen Gewinnen durch Wohltätigkeitsorganisationen, 
Familienmitglieder, Freunde und Außerirdische.

Mein Sohn (Audrey) hat Krebs und der Arzt sagte uns, er würde nächstes Jahr 
nicht mehr lange leben, um ihn zu sehen. Er möchte gute Dinge tun, bevor er 
diese sündige Welt endgültig verlässt. Mein Sohn (Audrey) ist so nett zu mir 
und bedeutet mir die ganze Welt. Ich habe beschlossen, seine Wünsche in diesem 
neuen Jahr zu erfüllen. Ich möchte, dass Sie dieses Geld in ein lukratives 
Geschäft investieren, das mehr Arbeitsplätze für Arbeitslose und Bedürftige in 
Ihrer Region schafft, da wir bereits Geld für die örtliche Feuerwehr, das Rote 
Kreuz, Haiti, das Truro Cancer Hospital und einige andere Organisationen in 
Thailand haben Asien und China Europa kämpfen gegen Krebs, Alzheimer und 
Diabetes.

Um die Spendenaktion (€ 5.800.000,00) zu vereinfachen, die exklusiv für Sie 
gespendet wurde, senden Sie mir bitte Ihren vollständigen Namen, Ihre Adresse 
und Ihre Telefonnummer (ich habe nach diesen Informationen für Zahlungen 
gefragt, weiß nur, wen wir spenden und für Identität) Internet-Diebstahl, bei 
dem ich in der Vergangenheit ein Opfer war, würde ich Sie nicht nach einem 
Ausweis fragen, weil wir nicht möchten, dass Sie den Eindruck verlieren, ich 
würde mein Identitätsgeld rauben).

Ich hoffe, Sie können Ihr Geld in Ihrem eigenen Land sinnvoll einsetzen. Wir 
werden anpassen, was Sie tun können, um die Armut in Ihrer Region zu 
verringern, und auch versuchen, den Lebensstandard vieler Menschen zu 
verbessern, da das einzige Ziel darin besteht, dieses Geld zu spenden. Wir 
möchten, dass Sie sicherstellen, dass Sie unseren Anweisungen folgen und Ihrer 
Tochter helfen, ihren Traum zu verwirklichen, bevor Sie entscheiden, wie Sie an 
das Geld kommen. Wenn Sie das Gefühl haben, dass ich dieses Projekt nicht habe, 
kann ich schnell eine andere Person finden, die Geld spendet.
Wenn Sie irgendwelche Fragen haben, zögern Sie nicht zu fragen. Ich erwarte, 
dass Sie bald hören und
Bitte kontaktieren Sie uns nur unter: mmaviswancz...@gmail.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv6] netdev-afxdp: Add need_wakeup supprt.

2019-10-28 Thread William Tu
On Mon, Oct 28, 2019 at 12:11 PM Ilya Maximets  wrote:
>
> On 28.10.2019 11:46, Eelco Chaudron wrote:
> >
> >
> > On 23 Oct 2019, at 23:06, William Tu wrote:
> >
> >> The patch adds support for using need_wakeup flag in AF_XDP rings.
> >> A new option, use_need_wakeup, is added.  When this option is used,
> >> it means that OVS has to explicitly wake up the kernel RX, using poll()
> >> syscall and wake up TX, using sendto() syscall. This feature improves
> >> the performance by avoiding unnecessary sendto syscalls for TX.
> >> For RX, instead of kernel always busy-spinning on fille queue, OVS wakes
> >> up the kernel RX processing when fill queue is replenished.
> >>
> >> The need_wakeup feature is merged into Linux kernel bpf-next tee with 
> >> commit
> >> 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") and
> >> OVS enables it by default, if libbpf supports it.  If users enable it but
> >> runs in an older version of libbpf, then the need_wakeup feature has no 
> >> effect,
> >> and a warning message is logged.
> >>
> >> For virtual interface, it's better set use_need_wakeup=false, since
> >> the virtual device's AF_XDP xmit is synchronous: the sendto syscall
> >> enters kernel and process the TX packet on tx queue directly.
> >>
> >> On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
> >> to physical port improves from 6.1Mpps to 7.3Mpps.
> >>
> >> Suggested-by: Ilya Maximets 
> >> Signed-off-by: William Tu 
> >
> > Reviewed based on diff from previous version, also did quick compile test 
> > with and without need_wakeup supported kernel.
> > No actual performance tests ran, as my setup is in a messed up state :(
> >
> > One small issue (guess can be fixed at apply time), the subject mentions 
> > “supprt”, it’s missing an o.
> >
> > Acked-by: Eelco Chaudron 
> >
>
> Thanks, I could fix the 'supprt' while applying the patch.
>
> But I have one more question.
> William, Eelco, what do you think about renaming the option itself from
> "options:use_need_wakeup" to "options:use-need-wakeup"?
> Most of the netdev options for netdev-dpdk and netdev-linux except few of them
> (e.g. n_rxq, n_rxq_desc) uses dashes in their names instead of underscores.
> I agree that right now there is a mess in OVS and there is no specified 
> convention
> for options naming, but it might be good idea to name all the new options in a
> similar way, to keep the API more or less consistent.  What do you think?
>
> I could squash in below diff while applying the patch (I also added a NEWS 
> entry):
>
Hi Ilya,

Yes, I'm ok with using dashes.
diff below looks good to me. Thanks
William

> diff --git a/Documentation/intro/install/afxdp.rst 
> b/Documentation/intro/install/afxdp.rst
> index 202b6cdf7..fa898912b 100644
> --- a/Documentation/intro/install/afxdp.rst
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -182,7 +182,7 @@ more details):
>
>* **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
>
> - * **use_need_wakeup**: default "true" if libbpf supports it, otherwise 
> false.
> + * **use-need-wakeup**: default "true" if libbpf supports it, otherwise 
> false.
>
>   For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
>   configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
> diff --git a/NEWS b/NEWS
> index 330ab3832..4d8092d62 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -5,6 +5,9 @@ Post-v2.12.0
>  separate project. You can find it at
>  https://github.com/ovn-org/ovn.git
>  - Userspace datapath:
> + * New option 'use-need-wakeup' for netdev-afxdp to control enabling
> +   of corresponding 'need_wakup' flag in AF_XDP rings.  Enabled by 
> default
> +   if supported by libbpf.
>* Add option to enable, disable and query TCP sequence checking in
>  conntrack.
>
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index b6b03e39b..270a5ab0c 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -330,7 +330,7 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t 
> ifindex,
>>rx, >tx, );
>   if (ret) {
>   VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
> - "use_need_wakeup: %s qid: %d",
> + "use-need-wakeup: %s qid: %d",
>ovs_strerror(errno),
>xdpmode == XDP_COPY ? "SKB": "DRV",
>use_need_wakeup ? "true" : "false",
> @@ -431,7 +431,7 @@ xsk_configure_all(struct netdev *netdev)
>
>   /* Configure each queue. */
>   for (i = 0; i < n_rxq; i++) {
> -VLOG_DBG("%s: configure queue %d mode %s use_need_wakeup %s.",
> +VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
>netdev_get_name(netdev), i,
>dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
>dev->use_need_wakeup ? "true" : "false");
> @@ -551,7 +551,7 @@ netdev_afxdp_set_config(struct netdev *netdev, 

Re: [ovs-dev] [PATCH v2.12] OVN: Combine conjunctions with identical matches into one flow.

2019-10-28 Thread 0-day Robot
Bleep bloop.  Greetings Mark Michelson, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
Failed to merge in the changes.
Patch failed at 0001 OVN: Combine conjunctions with identical matches into one 
flow.
The copy of the patch that failed is found in:
   
/var/lib/jenkins/jobs/upstream_build_from_pw/workspace/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2.12] OVN: Combine conjunctions with identical matches into one flow.

2019-10-28 Thread Mark Michelson
This is a backport of commit e659bab31a916d540411c93ca7125011b2e82b5c
from OVN master.

Conjunctive matches have an issue where it is possible to install
multiple flows that have identical matches. This results in ambiguity,
and can lead to features (such as ACLs) not functioning properly.

This change fixes the problem by combining conjunctions with identical
matches into a single flow. As an example, in the past we may have had
something like:

nw_dst=10.0.0.1 actions=conjunction(2,1/2)
nw_dst=10.0.0.1 actions=conjunction(3,1/2)

This commit changes this into

nw_dst=10.0.0.1 actions=conjunction(2,1/2),conjunction(3,1/2)

This way, there is only a single flow with the proscribed match, and
there is no ambiguity.

Signed-off-by: Mark Michelson 
Acked-by: Numan Siddique 
---
 ovn/controller/lflow.c  |  5 ++--
 ovn/controller/ofctrl.c | 74 ++---
 ovn/controller/ofctrl.h |  6 
 tests/ovn.at| 17 +---
 4 files changed, 80 insertions(+), 22 deletions(-)

diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 1aafafb33..09c2c78f1 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -735,8 +735,9 @@ consider_logical_flow(
 dst->clause = src->clause;
 dst->n_clauses = src->n_clauses;
 }
-ofctrl_add_flow(flow_table, ptable, lflow->priority, 0, >match,
-, >header_.uuid);
+
+ofctrl_add_or_append_flow(flow_table, ptable, lflow->priority, 0,
+  >match, , >header_.uuid);
 ofpbuf_uninit();
 }
 }
diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index 0fcaa72c3..f7211c5e0 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -69,6 +69,11 @@ struct ovn_flow {
 uint64_t cookie;
 };
 
+static struct ovn_flow *ovn_flow_alloc(uint8_t table_id, uint16_t priority,
+   uint64_t cookie,
+   const struct match *match,
+   const struct ofpbuf *actions,
+   const struct uuid *sb_uuid);
 static uint32_t ovn_flow_match_hash(const struct ovn_flow *);
 static struct ovn_flow *ovn_flow_lookup(struct hmap *flow_table,
 const struct ovn_flow *target,
@@ -657,16 +662,8 @@ ofctrl_check_and_add_flow(struct ovn_desired_flow_table 
*flow_table,
   const struct uuid *sb_uuid,
   bool log_duplicate_flow)
 {
-struct ovn_flow *f = xmalloc(sizeof *f);
-f->table_id = table_id;
-f->priority = priority;
-minimatch_init(>match, match);
-f->ofpacts = xmemdup(actions->data, actions->size);
-f->ofpacts_len = actions->size;
-f->sb_uuid = *sb_uuid;
-f->match_hmap_node.hash = ovn_flow_match_hash(f);
-f->uuid_hindex_node.hash = uuid_hash(>sb_uuid);
-f->cookie = cookie;
+struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie, match,
+actions, sb_uuid);
 
 ovn_flow_log(f, "ofctrl_add_flow");
 
@@ -721,9 +718,66 @@ ofctrl_add_flow(struct ovn_desired_flow_table 
*desired_flows,
 ofctrl_check_and_add_flow(desired_flows, table_id, priority, cookie,
   match, actions, sb_uuid, true);
 }
+
+void
+ofctrl_add_or_append_flow(struct ovn_desired_flow_table *desired_flows,
+  uint8_t table_id, uint16_t priority, uint64_t cookie,
+  const struct match *match,
+  const struct ofpbuf *actions,
+  const struct uuid *sb_uuid)
+{
+struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie, match,
+actions, sb_uuid);
+
+ovn_flow_log(f, "ofctrl_add_or_append_flow");
+
+struct ovn_flow *existing;
+existing = ovn_flow_lookup(_flows->match_flow_table, f, false);
+if (existing) {
+/* There's already a flow with this particular match. Append the
+ * action to that flow rather than adding a new flow
+ */
+uint64_t compound_stub[64 / 8];
+struct ofpbuf compound;
+ofpbuf_use_stub(, compound_stub, sizeof(compound_stub));
+ofpbuf_put(, existing->ofpacts, existing->ofpacts_len);
+ofpbuf_put(, f->ofpacts, f->ofpacts_len);
+
+free(existing->ofpacts);
+existing->ofpacts = xmemdup(compound.data, compound.size);
+existing->ofpacts_len = compound.size;
+
+ofpbuf_uninit();
+ovn_flow_destroy(f);
+} else {
+hmap_insert(_flows->match_flow_table, >match_hmap_node,
+f->match_hmap_node.hash);
+hindex_insert(_flows->uuid_flow_table, >uuid_hindex_node,
+  f->uuid_hindex_node.hash);
+}
+}
 
 /* ovn_flow. */
 
+static struct ovn_flow *

[ovs-dev] [PATCH ] travis: support ppc64le builds

2019-10-28 Thread David Wilder
Add support for travis-ci ppc64le builds.

- Updated matrix in .travis.yml to include a ppc64le build.
- Added support to install packages needed by specific architectures.

To keep the total build time at an acceptable level only a single build
job is included in the matrix for ppc64le.

A build report example can be found here [1]
[0] http://travis-ci.org/
[1] https://travis-ci.org/djlwilder/ovs/builds/604098141

Signed-off-by: David Wilder 
---
 .travis.yml  |  5 +++--
 .travis/linux-prepare.sh | 18 ++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 5676d9748..c99f26815 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -14,7 +14,6 @@ addons:
   apt:
 packages:
   - bc
-  - gcc-multilib
   - libssl-dev
   - llvm-dev
   - libjemalloc1
@@ -24,7 +23,6 @@ addons:
   - libelf-dev
   - selinux-policy-dev
   - libunbound-dev
-  - libunbound-dev:i386
   - libunwind-dev
 
 before_install: ./.travis/${TRAVIS_OS_NAME}-prepare.sh
@@ -50,6 +48,9 @@ matrix:
 - os: osx
   compiler: clang
   env: OPTS="--disable-ssl"
+- os: linux-ppc64le
+  compiler: gcc
+  env: OPTS="--disable-ssl"
 
 script: ./.travis/${TRAVIS_OS_NAME}-build.sh $OPTS
 
diff --git a/.travis/linux-prepare.sh b/.travis/linux-prepare.sh
index e546d32cb..f3a9a6d44 100755
--- a/.travis/linux-prepare.sh
+++ b/.travis/linux-prepare.sh
@@ -15,8 +15,18 @@ cd ..
 pip install --disable-pip-version-check --user six flake8 hacking
 pip install --user --upgrade docutils
 
-if [ "$M32" ]; then
-# 32-bit and 64-bit libunwind can not be installed at the same time.
-# This will remove the 64-bit libunwind and install 32-bit version.
-sudo apt-get install -y libunwind-dev:i386
+# Include packages needed by specific architectures.
+if [ $TRAVIS_ARCH == amd64 ]; then
+   sudo apt-get install -y \
+libunbound-dev:i386 \
+gcc-multilib
+
+if [ "$M32" ]; then
+   # 32-bit and 64-bit libunwind can not be installed at the same time.
+   # This will remove the 64-bit libunwind and install 32-bit version.
+   sudo apt-get install -y libunwind-dev:i386
+fi
+
+elif [ $TRAVIS_ARCH == ppc64le ]; then
+   sudo apt-get install -y flex
 fi
-- 
2.23.0.162.gf1d4a28

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/3] Combine conjunctions with identical matches into one flow.

2019-10-28 Thread Mark Michelson
I pushed the first two patches of the series to master. I'll resubmit 
patch three as a separate submission.


On 10/28/19 1:45 PM, Numan Siddique wrote:



On Mon, Oct 28, 2019, 9:54 PM Numan Siddique > wrote:




On Mon, Oct 28, 2019, 9:29 PM Mark Michelson mailto:mmich...@redhat.com>> wrote:

On 10/28/19 11:33 AM, Numan Siddique wrote:
 > On Sat, Oct 26, 2019 at 2:37 AM Mark Michelson
mailto:mmich...@redhat.com>> wrote:
 >>
 >> As stated in previous commits, conjunctive matches have an
issue where
 >> it is possible to install multiple flows that have identical
matches.
 >> This results in ambiguity, and can lead to features (such as
ACLs) not
 >> functioning properly.
 >>
 >> This change fixes the problem by combining conjunctions with
identical
 >> matches into a single flow. As an example, in the past we
may have had
 >> something like:
 >>
 >> nw_dst=10.0.0.1 actions=conjunction(2,1/2)
 >> nw_dst=10.0.0.1 actions=conjunction(3,1/2)
 >>
 >> This commit changes this into
 >>
 >> nw_dst=10.0.0.1 actions=conjunction(2,1/2),conjunction(3,1/2)
 >>
 >> This way, there is only a single flow with the proscribed
match, and
 >> there is no ambiguity.
 >>
 >> Signed-off-by: Mark Michelson mailto:mmich...@redhat.com>>


Acked-by: Numan Siddique mailto:num...@ovn.org>>

 >> ---
 >>   controller/lflow.c  |  5 ++--
 >>   controller/ofctrl.c | 73
+
 >>   controller/ofctrl.h |  6 +
 >>   tests/ovn.at         | 17 +
 >>   4 files changed, 79 insertions(+), 22 deletions(-)
 >>
 >> diff --git a/controller/lflow.c b/controller/lflow.c
 >> index e3ed20cd4..34b7c36a6 100644
 >> --- a/controller/lflow.c
 >> +++ b/controller/lflow.c
 >> @@ -736,8 +736,9 @@ consider_logical_flow(
 >>                   dst->clause = src->clause;
 >>                   dst->n_clauses = src->n_clauses;
 >>               }
 >> -            ofctrl_add_flow(flow_table, ptable,
lflow->priority, 0, >match,
 >> -                            , >header_.uuid);
 >> +
 >> +            ofctrl_add_or_append_flow(flow_table, ptable,
lflow->priority, 0,
 >> +                                      >match, ,
>header_.uuid);
 >>               ofpbuf_uninit();
 >>           }
 >>       }
 >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
 >> index 3131baff0..afb0036f1 100644
 >> --- a/controller/ofctrl.c
 >> +++ b/controller/ofctrl.c
 >> @@ -69,6 +69,11 @@ struct ovn_flow {
 >>       uint64_t cookie;
 >>   };
 >>
 >> +static struct ovn_flow *ovn_flow_alloc(uint8_t table_id,
uint16_t priority,
 >> +                                       uint64_t cookie,
 >> +                                       const struct match
*match,
 >> +                                       const struct ofpbuf
*actions,
 >> +                                       const struct uuid
*sb_uuid);
 >>   static uint32_t ovn_flow_match_hash(const struct ovn_flow *);
 >>   static struct ovn_flow *ovn_flow_lookup(struct hmap
*flow_table,
 >>                                           const struct
ovn_flow *target,
 >> @@ -657,16 +662,8 @@ ofctrl_check_and_add_flow(struct
ovn_desired_flow_table *flow_table,
 >>                             const struct uuid *sb_uuid,
 >>                             bool log_duplicate_flow)
 >>   {
 >> -    struct ovn_flow *f = xmalloc(sizeof *f);
 >> -    f->table_id = table_id;
 >> -    f->priority = priority;
 >> -    minimatch_init(>match, match);
 >> -    f->ofpacts = xmemdup(actions->data, actions->size);
 >> -    f->ofpacts_len = actions->size;
 >> -    f->sb_uuid = *sb_uuid;
 >> -    f->match_hmap_node.hash = ovn_flow_match_hash(f);
 >> -    f->uuid_hindex_node.hash = uuid_hash(>sb_uuid);
 >> -    f->cookie = cookie;
 >> +    struct ovn_flow *f = ovn_flow_alloc(table_id, priority,
cookie, match,
 >> +                                        actions, sb_uuid);
 >>
 >>       ovn_flow_log(f, "ofctrl_add_flow");
 >>
 >> @@ -721,9 +718,65 @@ ofctrl_add_flow(struct
ovn_desired_flow_table *desired_flows,
 >>       ofctrl_check_and_add_flow(desired_flows, table_id,
priority, cookie,
 >>                                 match, 

Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port

2019-10-28 Thread William Tu
Hi Noa,

Thanks for your reply.

> > > > Hi Noa,
> > > >
> > > > I have a couple more questions. I'm still at the learning stage of
> > > > this new feature, thanks in advance for your patience.
> > > >
> > > > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > > > > dpdkvdpa netdev works with 3 components:
> > > > > vhost-user socket, vdpa device: real vdpa device or a VF and
> > > > > representor of "vdpa device".
> > > >
> > > > What NIC card support this feature?
> > > > I don't have real vdpa device, can I use Intel X540 VF feature?
> > > >
> > >
> > > This feature will have two modes, SW and HW.
> > > The SW mode doesn't depend on a real vdpa device and allows you to use
> > > this feature even if you don't have a NIC that support it.
> Although you need to use representors, so you need your NIC to support it.
> > > The HW mode will be implemented in the future and will use a real vdpa
> > > device. It will be better to use the HW mode if you have a NIC that 
> > > support
> > it.
> > >
> > > For now, we only support the SW mode, when vdpa will have support in
> > > dpdk, we will add the HW mode to OVS.
> > >
> > > > >
> > > > > In order to add a new vDPA port, add a new port to existing bridge
> > > > > with type dpdkvdpa and vDPA options:
> > > > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> > > > >options:vdpa-socket-path=
> > > > >options:vdpa-accelerator-devargs=
> > > > >options:dpdk-devargs=,representor=[id]
> > > > >
> > > > > On this command OVS will create a new netdev:
> > > > > 1. Register vhost-user-client device.
> > > > > 2. Open and configure VF dpdk port.
> > > > > 3. Open and configure representor dpdk port.
> > > > >
> > > > > The new netdev will use netdev_rxq_recv() function in order to
> > > > > receive packets from VF and push to vhost-user and receive packets
> > > > > from vhost-user and push to VF.
> > > > >
> > > > > Signed-off-by: Noa Ezra 
> > > > > Reviewed-by: Oz Shlomo 
> > > > > ---
> > > > >  Documentation/automake.mk   |   1 +
> > > > >  Documentation/topics/dpdk/index.rst |   1 +
> > > > >  Documentation/topics/dpdk/vdpa.rst  |  90 
> > > > >  NEWS|   1 +
> > > > >  lib/netdev-dpdk.c   | 162
> > > > 
> > > > >  vswitchd/vswitch.xml|  25 ++
> > > > >  6 files changed, 280 insertions(+)  create mode 100644
> > > > > Documentation/topics/dpdk/vdpa.rst
> > > > >
> > > > > diff --git a/Documentation/automake.mk
> > > b/Documentation/automake.mk
> > > > > index cd68f3b..ee574bc 100644
> > > > > --- a/Documentation/automake.mk
> > > > > +++ b/Documentation/automake.mk
> > > > > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> > > > > Documentation/topics/dpdk/ring.rst \
> > > > > Documentation/topics/dpdk/vdev.rst \
> > > > > Documentation/topics/dpdk/vhost-user.rst \
> > > > > +   Documentation/topics/dpdk/vdpa.rst \
> > > > > Documentation/topics/fuzzing/index.rst \
> > > > > Documentation/topics/fuzzing/what-is-fuzzing.rst \
> > > > > Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
> > > > > diff --git a/Documentation/topics/dpdk/index.rst
> > > > > b/Documentation/topics/dpdk/index.rst
> > > > > index cf24a7b..c1d4ea7 100644
> > > > > --- a/Documentation/topics/dpdk/index.rst
> > > > > +++ b/Documentation/topics/dpdk/index.rst
> > > > > @@ -41,3 +41,4 @@ The DPDK Datapath
> > > > > /topics/dpdk/pdump
> > > > > /topics/dpdk/jumbo-frames
> > > > > /topics/dpdk/memory
> > > > > +   /topics/dpdk/vdpa
> > > > > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > > > > b/Documentation/topics/dpdk/vdpa.rst
> > > > > new file mode 100644



> > > > > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> > > > dp_packet_batch *batch,
> > > > >  return 0;
> > > > >  }
> > > > >
> > > > > +static int
> > > > > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > > > > +  struct dp_packet_batch *batch,
> > > > > +  int *qfill) {
> > > > > +struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > > > > +int fwd_rx;
> > > > > +int ret;
> > > > > +
> > > > > +fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay,
> > > > > + rxq->queue_id);
> > > > I'm still not clear about the above function.
> > > > So netdev_dpdk_vdpa_recv_impl()
> > > > netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter
> > > > ...
> > > > rte_eth_rx_burst(qpair->port_id_rx...)
> > > > ...
> > > > rte_eth_tx_burst(qpair->port_id_tx...)
> > > >
> > > > So looks like forwarding between vf to vhostuser and vice versa is
> > > > done in this function.
> > > >
> > > > > +ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> > > >
> > > > Then why do we call netdev_dpdk_rxq_recv() above again?
> > > > Are packets received above the same packets as 

Re: [ovs-dev] [PATCH ovs v3 0/2] Introduce dpdkvdpa netdev

2019-10-28 Thread Roni Bar Yanai
Hi ilya, 
please see inline

>-Original Message-
>From: Ilya Maximets 
>Sent: Monday, October 28, 2019 3:46 PM
>To: Noa Levy ; ovs-dev@openvswitch.org; Roni Bar Yanai
>
>Cc: Oz Shlomo ; Majd Dibbiny ;
>Ameer Mahagneh ; Eli Britstein
>; William Tu ; Simon Horman
>
>Subject: Re: [ovs-dev] [PATCH ovs v3 0/2] Introduce dpdkvdpa netdev
>
>On 17.10.2019 13:16, Noa Ezra wrote:
>> There are two approaches to communicate with a guest, using virtIO or
>> SR-IOV.
>> SR-IOV allows working with port representor which is attached to the
>> OVS and a matching VF is given with pass-through to the VM.
>> HW rules can process packets from up-link and direct them to the VF
>> without going through SW (OVS) and therefore SR-IOV gives the best
>> performance.
>> However, SR-IOV architecture requires that the guest will use a driver
>> which is specific to the underlying HW. Specific HW driver has two
>> main
>> drawbacks:
>> 1. Breaks virtualization in some sense (VM aware of the HW), can also
>> limit the type of images supported.
>> 2. Less natural support for live migration.
>>
>> Using virtIO interface solves both problems, but reduces performance
>> and causes losing of some functionality, for example, for some HW
>> offload, working directly with virtIO cannot be supported.
>> In order to solve this conflict, we created a new netdev type-dpdkvdpa.
>> The new netdev is basically similar to a regular dpdk netdev, but it
>> has some additional functionality for transferring packets from virtIO
>> guest (VM) to a VF and vice versa. With this solution we can benefit
>> both SR-IOV and virtIO.
>> vDPA netdev is designed to support both SW and HW use-cases.
>> HW mode will be used to configure vDPA capable devices. The support
>> for this mode is on progress in the dpdk community.
>> SW acceleration is used to leverage SR-IOV offloads to virtIO guests
>> by relaying packets between VF and virtio devices and as a pre-step
>> for supporting vDPA in HW mode.
>>
>> Running example:
>> 1. Configure OVS bridge and ports:
>> ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
>> ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options: \
>>  dpdk-devargs=
>> ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
>>  options:vdpa-socket-path= \
>>  options:vdpa-accelerator-devargs= \
>>  options:dpdk-devargs=,representor=[id] 2. Run a
>> virtIO guest (VM) in server mode that creates the socket of
>> the vDPA port.
>> 3. Send traffic.
>>
>> Noa Ezra (2):
>>netdev-dpdk-vdpa: Introduce dpdkvdpa netdev
>>netdev-dpdk: Add dpdkvdpa port
>>
>>   Documentation/automake.mk   |   1 +
>>   Documentation/topics/dpdk/index.rst |   1 +
>>   Documentation/topics/dpdk/vdpa.rst  |  90 +
>>   NEWS|   1 +
>>   lib/automake.mk |   4 +-
>>   lib/netdev-dpdk-vdpa.c  | 750
>
>>   lib/netdev-dpdk-vdpa.h  |  54 +++
>>   lib/netdev-dpdk.c   | 162 
>>   vswitchd/vswitch.xml|  25 ++
>>   9 files changed, 1087 insertions(+), 1 deletion(-)
>>   create mode 100644 Documentation/topics/dpdk/vdpa.rst
>>   create mode 100755 lib/netdev-dpdk-vdpa.c
>>   create mode 100644 lib/netdev-dpdk-vdpa.h
>>
>
>Hi, everyone.
>
>So, I have a few questions (mostly to Roni?):
>
>1. What happened to idea of implementing this as a DPDK vdev?
We wanted to solve both OVS-kernel and OVS-DPDK issue.
The main argument against it was that we allow to define ports that are not
 OF ports on the switch. regardless of how useful we think this feature can be
 it breaks something fundamental, and looking back I agree.  
The vdev, was a workaround, but after speaking with some of our DPDK
experts, they had similar arguments, this time for DPDK world. You create 
a port that never gets packet and never sends packets. This totally
doesn't make sense even if it can be useful in some cases. When we remove
the kernel form the equation, we are left with vDPA.
In fact we followed your suggestion of having SW vDPA and HW vDPA. 
This makes total sense, for example open stack can configure vDPA and 
let the OVS probe the device and go with HW or SW, leaving same functionality
and same configuration.

>
>2. What was the results of "direct output optimization" patch [1] testing?
>At this point I also have to mention that OVS is going to have TSO support
>in one of the next releases (at least we have some progress on that, thanks
>to idea of using extbuf).  This way in combine with patch [1] there should
>be no any benefits from having separate netdev for forwarding purposes.
>Even without this patch, havin TSO alone should provide good performance
>benefits making it considerable to not have any extra netdev.

>
>[1]
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork

Re: [ovs-dev] [PATCHv6] netdev-afxdp: Add need_wakeup supprt.

2019-10-28 Thread Ilya Maximets

On 28.10.2019 11:46, Eelco Chaudron wrote:



On 23 Oct 2019, at 23:06, William Tu wrote:


The patch adds support for using need_wakeup flag in AF_XDP rings.
A new option, use_need_wakeup, is added.  When this option is used,
it means that OVS has to explicitly wake up the kernel RX, using poll()
syscall and wake up TX, using sendto() syscall. This feature improves
the performance by avoiding unnecessary sendto syscalls for TX.
For RX, instead of kernel always busy-spinning on fille queue, OVS wakes
up the kernel RX processing when fill queue is replenished.

The need_wakeup feature is merged into Linux kernel bpf-next tee with commit
77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") and
OVS enables it by default, if libbpf supports it.  If users enable it but
runs in an older version of libbpf, then the need_wakeup feature has no effect,
and a warning message is logged.

For virtual interface, it's better set use_need_wakeup=false, since
the virtual device's AF_XDP xmit is synchronous: the sendto syscall
enters kernel and process the TX packet on tx queue directly.

On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
to physical port improves from 6.1Mpps to 7.3Mpps.

Suggested-by: Ilya Maximets 
Signed-off-by: William Tu 


Reviewed based on diff from previous version, also did quick compile test with 
and without need_wakeup supported kernel.
No actual performance tests ran, as my setup is in a messed up state :(

One small issue (guess can be fixed at apply time), the subject mentions 
“supprt”, it’s missing an o.

Acked-by: Eelco Chaudron 



Thanks, I could fix the 'supprt' while applying the patch.

But I have one more question.
William, Eelco, what do you think about renaming the option itself from
"options:use_need_wakeup" to "options:use-need-wakeup"?
Most of the netdev options for netdev-dpdk and netdev-linux except few of them
(e.g. n_rxq, n_rxq_desc) uses dashes in their names instead of underscores.
I agree that right now there is a mess in OVS and there is no specified 
convention
for options naming, but it might be good idea to name all the new options in a
similar way, to keep the API more or less consistent.  What do you think?

I could squash in below diff while applying the patch (I also added a NEWS 
entry):

diff --git a/Documentation/intro/install/afxdp.rst 
b/Documentation/intro/install/afxdp.rst
index 202b6cdf7..fa898912b 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -182,7 +182,7 @@ more details):
 
  * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
 
- * **use_need_wakeup**: default "true" if libbpf supports it, otherwise false.

+ * **use-need-wakeup**: default "true" if libbpf supports it, otherwise false.
 
 For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,

 configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
diff --git a/NEWS b/NEWS
index 330ab3832..4d8092d62 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,9 @@ Post-v2.12.0
separate project. You can find it at
https://github.com/ovn-org/ovn.git
- Userspace datapath:
+ * New option 'use-need-wakeup' for netdev-afxdp to control enabling
+   of corresponding 'need_wakup' flag in AF_XDP rings.  Enabled by default
+   if supported by libbpf.
  * Add option to enable, disable and query TCP sequence checking in
conntrack.
 
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c

index b6b03e39b..270a5ab0c 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -330,7 +330,7 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t 
ifindex,
  >rx, >tx, );
 if (ret) {
 VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
- "use_need_wakeup: %s qid: %d",
+ "use-need-wakeup: %s qid: %d",
  ovs_strerror(errno),
  xdpmode == XDP_COPY ? "SKB": "DRV",
  use_need_wakeup ? "true" : "false",
@@ -431,7 +431,7 @@ xsk_configure_all(struct netdev *netdev)
 
 /* Configure each queue. */

 for (i = 0; i < n_rxq; i++) {
-VLOG_DBG("%s: configure queue %d mode %s use_need_wakeup %s.",
+VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
  netdev_get_name(netdev), i,
  dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
  dev->use_need_wakeup ? "true" : "false");
@@ -551,7 +551,7 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct 
smap *args,
 return EINVAL;
 }
 
-need_wakeup = smap_get_bool(args, "use_need_wakeup", NEED_WAKEUP_DEFAULT);

+need_wakeup = smap_get_bool(args, "use-need-wakeup", NEED_WAKEUP_DEFAULT);
 #ifndef HAVE_XDP_NEED_WAKEUP
 if (need_wakeup) {
 VLOG_WARN("XDP need_wakeup is not supported in libbpf.");
@@ -580,7 +580,7 @@ netdev_afxdp_get_config(const struct netdev *netdev, struct 
smap *args)
 

Re: [ovs-dev] [PATCH branch-2.5] travis: Drop 2.6.32 kernel build.

2019-10-28 Thread Ilya Maximets

On 06.08.2019 15:06, Ilya Maximets wrote:

gcc >= 5 can't build Linux kernel 2.6.32 and this will never
change because 2.6.32 is not supported for a last few years.

TravsCI migrated to use Ubuntu Xenial by defualt with gcc 5
installed. Dropping the 2.6.32 build item to unlock green build.

Signed-off-by: Ilya Maximets 
---

Sending this patch as it was a preferred solution for a few
people in discussion here:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361278.html

Two patches additionally needs to be backported for successful Travis
build:

   a7021b08b 2018-07-09 | configure: Disable -Wnull-pointer-arithmetic Clang 
warning. [Ben Pfaff]
   1e78e3085 2017-01-26 | libX.pc: use the correct output directory [Aaron 
Conole]

I could backport them along with applying this patch.


Applied to branch-2.5 with backports.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 00/19] OVN Interconnection

2019-10-28 Thread Han Zhou
On Mon, Oct 28, 2019 at 7:25 AM Numan Siddique  wrote:
>
> On Mon, Oct 21, 2019 at 6:23 AM Han Zhou  wrote:
> >
> > The series supports interconnecting multiple OVN deployments (e.g.
 located at
> > multiple data centers) through logical routers connected with tansit
logical
> > switches with overlay tunnels, managed through OVN control plane.  See
the
> > ovn-architecture.rst document updates for more details, and find the
> > instructions in Documentation/tutorials/ovn-interconnection.rst.
> >
> > The first 6 patches are not directly related to the feature, but are
fixes for
> > existing problems but required by later patches.
> >
>
> For the first 5 patches in the series - Acked-by: Numan Siddique
> 
>

Thanks Numan. I applied the first 5 patches to master.

> I think these 5 patches can be merged so that the v2 of this series
> will have lesser patches.
>
> Regarding the patch 6, which updates the NEWS, does it makes sense to
> delete the contents
> of this file and  start fresh capturing only OVN related news ? Or
> it's still better to preserve
> the contents ?
>

Do you mean removing all previous news, or just removing the OVS related
news and keep OVN related news?

I am ok with removing all the previous news and replace with a pointer to
OVS repo if people want to look at previous OVN news.

However, I think it may be more straightforward to leave as is for whatever
is there up to 2.12, which clearly tells how OVN is forked from the OVS
project. What do you think?

> I haven't yet looked into patch 7 onwards. I will come back to them soon.
>
> Thanks
> Numan
>
>
> > Han Zhou (19):
> >   ovn-northd.c: Fix datapath tunnel key allocation.
> >   testsuite: Use ovn-macros instead of ofproto-macros.
> >   sandbox: Remove the unused ovnnb.db from sandbox.
> >   ovn-ctl: Change the order of ovn-lib and ovs-lib.
> >   ovn-lib.in: Fix ovn-ctl status_xxx.
> >   news: Fix the out-of-date information afte split from ovs.
> >   ovn-architecture: Add documentation for OVN interconnection feature.
> >   ovn-inb: Interconnection northbound DB schema and CLI.
> >   ovn-isb: Interconnection southbound DB schema and CLI.
> >   ovn-ic: Interconnection controller with AZ registeration.
> >   ovn-northd.c: Refactor allocate_tnlid.
> >   ovn-ic: Transit switch controller.
> >   ovn-sb: Add columns is_interconn and is_remote to Chassis.
> >   ovn-ic: Interconnection gateway controller.
> >   ovn-ic: Interconnection port controller.
> >   ovn.at: e2e test for OVN interconnection.
> >   ovn-ctl: Refactor to reduce redundant code.
> >   ovn-ctl: Support commands for interconnection.
> >   tutorial: Add tutorial for OVN Interconnection.
> >
> >  .gitignore  |6 +
> >  Documentation/automake.mk   |1 +
> >  Documentation/tutorials/index.rst   |1 +
> >  Documentation/tutorials/ovn-interconnection.rst |  181 
> >  Makefile.am |1 +
> >  NEWS|   46 +-
> >  TODO.rst|   10 +
> >  automake.mk |   75 ++
> >  controller/binding.c|6 +-
> >  controller/chassis.c|   14 +
> >  debian/ovn-common.install   |2 +
> >  debian/ovn-common.manpages  |4 +
> >  ic/.gitignore   |2 +
> >  ic/automake.mk  |   10 +
> >  ic/ovn-ic.8.xml |  111 +++
> >  ic/ovn-ic.c | 1042
+++
> >  lib/.gitignore  |6 +
> >  lib/automake.mk |   32 +-
> >  lib/ovn-inb-idl.ann |9 +
> >  lib/ovn-isb-idl.ann |9 +
> >  lib/ovn-util.c  |   92 ++
> >  lib/ovn-util.h  |   15 +
> >  northd/ovn-northd.c |  110 +--
> >  ovn-architecture.7.xml  |  107 ++-
> >  ovn-inb.ovsschema   |   75 ++
> >  ovn-inb.xml |  371 
> >  ovn-isb.ovsschema   |  129 +++
> >  ovn-isb.xml |  582 +
> >  ovn-nb.ovsschema|5 +-
> >  ovn-nb.xml  |   28 +-
> >  ovn-sb.ovsschema|8 +-
> >  ovn-sb.xml  |   24 +
> >  tests/automake.mk   |8 +-
> >  tests/ofproto-macros.at |  177 
> >  tests/ovn-ic.at |  192 +
> >  tests/ovn-inbctl.at |   65 ++
> >  tests/ovn-isbctl.at

Re: [ovs-dev] Travis build failures due to base image change to Xenial.

2019-10-28 Thread Ilya Maximets

On 28.10.2019 18:30, Ben Pfaff wrote:

On Mon, Oct 28, 2019 at 06:24:44PM +0100, Ilya Maximets wrote:

On 28.10.2019 18:05, Ben Pfaff wrote:

On Mon, Oct 28, 2019 at 03:04:56PM +0100, Ilya Maximets wrote:

On 09.08.2019 12:23, Ilya Maximets wrote:

On 01.08.2019 20:48, Aaron Conole wrote:

Ilya Maximets  writes:

However, there is an additional issue with branch-2.5:

branch-2.5 has kernel 2.6.32 in the .travis.yml, but gcc >= 5 is not
able to build this kernel. This kernel reached its EOL few years ago
already and will never be fixed. So, there are few options for this issue:

1. Simply remove 2.6.32 kernel from the build matrix.
  (I have a simple patch for this.)


For branch-2.5 I'm waiting for more comments/Ack on the patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361422.html

Backports to this branch will go along with applying above patch.


Hi Ben,

This discussion happened while you were out of office, but I'd like to
hear your opinion on this issue with branch-2.5 that still exists.


My initial thought is that we don't have a clear obligation to support
an end-of-lifed upstream kernel.  I haven't read the whole thread to see
whether there's a reason to do so anyway, but if not then removing
2.6.32 from the build matrix seems reasonable.


OK. So, can I treat this as an ACK for above mentioned patch?


Acked-by: Ben Pfaff 



Thanks! I'll apply build matrix update along with 2 other required backports
as soon as Travis will finish checking them.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/3] Combine conjunctions with identical matches into one flow.

2019-10-28 Thread Numan Siddique
On Mon, Oct 28, 2019, 9:54 PM Numan Siddique  wrote:

>
>
> On Mon, Oct 28, 2019, 9:29 PM Mark Michelson  wrote:
>
>> On 10/28/19 11:33 AM, Numan Siddique wrote:
>> > On Sat, Oct 26, 2019 at 2:37 AM Mark Michelson 
>> wrote:
>> >>
>> >> As stated in previous commits, conjunctive matches have an issue where
>> >> it is possible to install multiple flows that have identical matches.
>> >> This results in ambiguity, and can lead to features (such as ACLs) not
>> >> functioning properly.
>> >>
>> >> This change fixes the problem by combining conjunctions with identical
>> >> matches into a single flow. As an example, in the past we may have had
>> >> something like:
>> >>
>> >> nw_dst=10.0.0.1 actions=conjunction(2,1/2)
>> >> nw_dst=10.0.0.1 actions=conjunction(3,1/2)
>> >>
>> >> This commit changes this into
>> >>
>> >> nw_dst=10.0.0.1 actions=conjunction(2,1/2),conjunction(3,1/2)
>> >>
>> >> This way, there is only a single flow with the proscribed match, and
>> >> there is no ambiguity.
>> >>
>> >> Signed-off-by: Mark Michelson 
>>
>
Acked-by: Numan Siddique 

>> ---
>> >>   controller/lflow.c  |  5 ++--
>> >>   controller/ofctrl.c | 73
>> +
>> >>   controller/ofctrl.h |  6 +
>> >>   tests/ovn.at| 17 +
>> >>   4 files changed, 79 insertions(+), 22 deletions(-)
>> >>
>> >> diff --git a/controller/lflow.c b/controller/lflow.c
>> >> index e3ed20cd4..34b7c36a6 100644
>> >> --- a/controller/lflow.c
>> >> +++ b/controller/lflow.c
>> >> @@ -736,8 +736,9 @@ consider_logical_flow(
>> >>   dst->clause = src->clause;
>> >>   dst->n_clauses = src->n_clauses;
>> >>   }
>> >> -ofctrl_add_flow(flow_table, ptable, lflow->priority, 0,
>> >match,
>> >> -, >header_.uuid);
>> >> +
>> >> +ofctrl_add_or_append_flow(flow_table, ptable,
>> lflow->priority, 0,
>> >> +  >match, ,
>> >header_.uuid);
>> >>   ofpbuf_uninit();
>> >>   }
>> >>   }
>> >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>> >> index 3131baff0..afb0036f1 100644
>> >> --- a/controller/ofctrl.c
>> >> +++ b/controller/ofctrl.c
>> >> @@ -69,6 +69,11 @@ struct ovn_flow {
>> >>   uint64_t cookie;
>> >>   };
>> >>
>> >> +static struct ovn_flow *ovn_flow_alloc(uint8_t table_id, uint16_t
>> priority,
>> >> +   uint64_t cookie,
>> >> +   const struct match *match,
>> >> +   const struct ofpbuf *actions,
>> >> +   const struct uuid *sb_uuid);
>> >>   static uint32_t ovn_flow_match_hash(const struct ovn_flow *);
>> >>   static struct ovn_flow *ovn_flow_lookup(struct hmap *flow_table,
>> >>   const struct ovn_flow
>> *target,
>> >> @@ -657,16 +662,8 @@ ofctrl_check_and_add_flow(struct
>> ovn_desired_flow_table *flow_table,
>> >> const struct uuid *sb_uuid,
>> >> bool log_duplicate_flow)
>> >>   {
>> >> -struct ovn_flow *f = xmalloc(sizeof *f);
>> >> -f->table_id = table_id;
>> >> -f->priority = priority;
>> >> -minimatch_init(>match, match);
>> >> -f->ofpacts = xmemdup(actions->data, actions->size);
>> >> -f->ofpacts_len = actions->size;
>> >> -f->sb_uuid = *sb_uuid;
>> >> -f->match_hmap_node.hash = ovn_flow_match_hash(f);
>> >> -f->uuid_hindex_node.hash = uuid_hash(>sb_uuid);
>> >> -f->cookie = cookie;
>> >> +struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie,
>> match,
>> >> +actions, sb_uuid);
>> >>
>> >>   ovn_flow_log(f, "ofctrl_add_flow");
>> >>
>> >> @@ -721,9 +718,65 @@ ofctrl_add_flow(struct ovn_desired_flow_table
>> *desired_flows,
>> >>   ofctrl_check_and_add_flow(desired_flows, table_id, priority,
>> cookie,
>> >> match, actions, sb_uuid, true);
>> >>   }
>> >> +
>> >> +void
>> >> +ofctrl_add_or_append_flow(struct ovn_desired_flow_table
>> *desired_flows,
>> >> +  uint8_t table_id, uint16_t priority,
>> uint64_t cookie,
>> >> +  const struct match *match,
>> >> +  const struct ofpbuf *actions,
>> >> +  const struct uuid *sb_uuid)
>> >> +{
>> >> +struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie,
>> match,
>> >> +actions, sb_uuid);
>> >> +
>> >> +ovn_flow_log(f, "ofctrl_add_or_append_flow");
>> >> +
>> >> +struct ovn_flow *existing;
>> >> +existing = ovn_flow_lookup(_flows->match_flow_table, f,
>> false);
>> >> +if (existing) {
>> >> +/* There's already a flow with this particular match. Append
>> the
>> >> + * action to that flow rather than adding a new flow
>> >> + 

Re: [ovs-dev] [PATCH 1/3] Revert conjunctive match removal patches.

2019-10-28 Thread Numan Siddique
On Sat, Oct 26, 2019, 2:38 AM Mark Michelson  wrote:

> This partially reverts commits 6f914327a55aaab11507db0911b08d695e17ce2a
> and 298701dbc99645700be41680a43d049cb061847a. This restores the behavior
> of making conjunctive matches, and it restores the tests to their state.
>
> The one thing it does not revert is the addition of the
> force_cross_product variable in expr.c. This variable will be used in a
> later commit in this series. Instead of removing it, we just set it
> "false" to restore previous behavior.
>
> With this commit, the conjunctive match misbehavior described in commit
> 298701dbc99645700be41680a43d049cb061847a is restored. However, this will
> be fixed in an upcoming commit in this series.
>
> Signed-off-by: Mark Michelson 
>

Acked-by: Numan Siddique 

---
>  TODO.rst |   10 -
>  lib/expr.c   |9 +-
>  tests/ovn.at | 1250
> +++---
>  3 files changed, 61 insertions(+), 1208 deletions(-)
>
> diff --git a/TODO.rst b/TODO.rst
> index ed55ea236..943d9bf81 100644
> --- a/TODO.rst
> +++ b/TODO.rst
> @@ -145,13 +145,3 @@ OVN To-do List
>* Support FTP ALGs.
>
>* Support reject action.
> -
> -* Conjunction: Conjunction is disabled in OVN. This needs to be revisisted
> -  to enable conjunction again after addressing the issues related to it.
> -  Like, if there are multiple ACLs with overlapping Conjunction matches,
> -  conjunction flows are not added properly.
> -  Eg. match(ip4.src == {IP1, IP2, IP3} && ip4.dst == {IP4, IP5, IP6} &&
> -  tcp.dst >= 800 && tcp.dst <= 900) actions=drop
> -
> -  match(ip4.src == {IP1, IP2, IP3} && ip4.dst == {IP4, IP5, IP6} &&
> -  tcp.dst >= 1000 && tcp.dst <= 2000) actions=allow
> diff --git a/lib/expr.c b/lib/expr.c
> index 9b9b6bcca..71de61543 100644
> --- a/lib/expr.c
> +++ b/lib/expr.c
> @@ -42,13 +42,8 @@ VLOG_DEFINE_THIS_MODULE(expr);
>   *
>   * match 2 - ip4.src == {IP1, IP2} && tcp.dst >=700 && tcp.src <=800
>   * action - allow.
> - *
> - * To handle this issue temporarily force crossproduct so that conjunction
> - * flows are not generated.
> - *
> - * Remove this once fixed.
> - * */
> -static bool force_crossproduct = true;
> + */
> +static bool force_crossproduct = false;
>
>  static struct expr *parse_and_annotate(const char *s,
> const struct shash *symtab,
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 22b272a60..641a646fc 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -589,22 +589,6 @@ ip,reg14=0x6
>  ipv6,reg14=0x5
>  ipv6,reg14=0x6
>  ])
> -AT_CHECK([expr_to_flow 'inport == {"eth0", "eth1", "eth2"} && ip4 && tcp
> && tcp.dst == {500, 501}'], [0], [dnl
> -tcp,reg14=0x5,tp_dst=500
> -tcp,reg14=0x5,tp_dst=501
> -tcp,reg14=0x6,tp_dst=500
> -tcp,reg14=0x6,tp_dst=501
> -])
> -AT_CHECK([expr_to_flow 'outport == {"eth0", "eth1", "eth2"} && ip4 && tcp
> && tcp.src == {400, 401} && tcp.dst == {500, 501}'], [0], [dnl
> -tcp,reg15=0x5,tp_src=400,tp_dst=500
> -tcp,reg15=0x5,tp_src=400,tp_dst=501
> -tcp,reg15=0x5,tp_src=401,tp_dst=500
> -tcp,reg15=0x5,tp_src=401,tp_dst=501
> -tcp,reg15=0x6,tp_src=400,tp_dst=500
> -tcp,reg15=0x6,tp_src=400,tp_dst=501
> -tcp,reg15=0x6,tp_src=401,tp_dst=500
> -tcp,reg15=0x6,tp_src=401,tp_dst=501
> -])
>  AT_CHECK([expr_to_flow 'inport == "eth0" && inport == "eth1"'], [0], [dnl
>  (no flows)
>  ])
> @@ -709,14 +693,6 @@ reg15=0x11
>  reg15=0x12
>  reg15=0x13
>  ])
> -AT_CHECK([expr_to_flow 'outport == @pg1 && ip4.src == {10.0.0.4,
> 10.0.0.5}'], [0], [dnl
> -ip,reg15=0x11,nw_src=10.0.0.4
> -ip,reg15=0x11,nw_src=10.0.0.5
> -ip,reg15=0x12,nw_src=10.0.0.4
> -ip,reg15=0x12,nw_src=10.0.0.5
> -ip,reg15=0x13,nw_src=10.0.0.4
> -ip,reg15=0x13,nw_src=10.0.0.5
> -])
>  AT_CHECK([expr_to_flow 'outport == {@pg_empty}'], [0], [dnl
>  (no flows)
>  ])
> @@ -725,27 +701,22 @@ reg15=0x11
>  ])
>  AT_CLEANUP
>
> -AT_SETUP([ovn -- converting expressions to flows -- no conjunction])
> -AT_KEYWORDS([no conjunction])
> +AT_SETUP([ovn -- converting expressions to flows -- conjunction])
> +AT_KEYWORDS([conjunction])
>  expr_to_flow () {
>  echo "$1" | ovstest test-ovn expr-to-flows | sort
>  }
>
> -# conjunction is disabled in OVN until some of the issues
> -# related to conjunction flows are fixed.
> -# expr-to-flows should not generate any conjunction flows.
>  lflow="ip4.src == {10.0.0.1, 10.0.0.2, 10.0.0.3} && \
>  ip4.dst == {20.0.0.1, 20.0.0.2, 20.0.0.3}"
>  AT_CHECK([expr_to_flow "$lflow"], [0], [dnl
> -ip,nw_src=10.0.0.1,nw_dst=20.0.0.1
> -ip,nw_src=10.0.0.1,nw_dst=20.0.0.2
> -ip,nw_src=10.0.0.1,nw_dst=20.0.0.3
> -ip,nw_src=10.0.0.2,nw_dst=20.0.0.1
> -ip,nw_src=10.0.0.2,nw_dst=20.0.0.2
> -ip,nw_src=10.0.0.2,nw_dst=20.0.0.3
> -ip,nw_src=10.0.0.3,nw_dst=20.0.0.1
> -ip,nw_src=10.0.0.3,nw_dst=20.0.0.2
> -ip,nw_src=10.0.0.3,nw_dst=20.0.0.3
> +conj_id=1,ip
> +ip,nw_dst=20.0.0.1: conjunction(1, 0/2)
> +ip,nw_dst=20.0.0.2: conjunction(1, 0/2)
> +ip,nw_dst=20.0.0.3: conjunction(1, 0/2)
> +ip,nw_src=10.0.0.1: 

Re: [ovs-dev] Travis build failures due to base image change to Xenial.

2019-10-28 Thread Ben Pfaff
On Mon, Oct 28, 2019 at 06:24:44PM +0100, Ilya Maximets wrote:
> On 28.10.2019 18:05, Ben Pfaff wrote:
> > On Mon, Oct 28, 2019 at 03:04:56PM +0100, Ilya Maximets wrote:
> > > On 09.08.2019 12:23, Ilya Maximets wrote:
> > > > On 01.08.2019 20:48, Aaron Conole wrote:
> > > > > Ilya Maximets  writes:
> > > > > > However, there is an additional issue with branch-2.5:
> > > > > > 
> > > > > > branch-2.5 has kernel 2.6.32 in the .travis.yml, but gcc >= 5 is not
> > > > > > able to build this kernel. This kernel reached its EOL few years ago
> > > > > > already and will never be fixed. So, there are few options for this 
> > > > > > issue:
> > > > > > 
> > > > > > 1. Simply remove 2.6.32 kernel from the build matrix.
> > > > > >  (I have a simple patch for this.)
> > > > 
> > > > For branch-2.5 I'm waiting for more comments/Ack on the patch:
> > > > https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361422.html
> > > > 
> > > > Backports to this branch will go along with applying above patch.
> > > 
> > > Hi Ben,
> > > 
> > > This discussion happened while you were out of office, but I'd like to
> > > hear your opinion on this issue with branch-2.5 that still exists.
> > 
> > My initial thought is that we don't have a clear obligation to support
> > an end-of-lifed upstream kernel.  I haven't read the whole thread to see
> > whether there's a reason to do so anyway, but if not then removing
> > 2.6.32 from the build matrix seems reasonable.
> 
> OK. So, can I treat this as an ACK for above mentioned patch?

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovndb-servers.ocf: Change from 'openvswitch' to 'ovn' in default vars.

2019-10-28 Thread Ben Pfaff
On Mon, Oct 28, 2019 at 10:25:34AM -0700, Ben Pfaff wrote:
> On Mon, Oct 28, 2019 at 08:04:15PM +0530, num...@ovn.org wrote:
> > From: Numan Siddique 
> > 
> > CC: Aliasgar Ginwala 
> > Signed-off-by: Numan Siddique 
> 
> Should we introduce something into the distro packaging, at the same
> time, to migrate from the old to the new locations automatically?

Or, alternatively, should the script tolerate finding these in the old
location?

Should we add something to the ovn upgrade guide talking about the
migration?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovndb-servers.ocf: Change from 'openvswitch' to 'ovn' in default vars.

2019-10-28 Thread Ben Pfaff
On Mon, Oct 28, 2019 at 08:04:15PM +0530, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> CC: Aliasgar Ginwala 
> Signed-off-by: Numan Siddique 

Should we introduce something into the distro packaging, at the same
time, to migrate from the old to the new locations automatically?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Travis build failures due to base image change to Xenial.

2019-10-28 Thread Ilya Maximets

On 28.10.2019 18:05, Ben Pfaff wrote:

On Mon, Oct 28, 2019 at 03:04:56PM +0100, Ilya Maximets wrote:

On 09.08.2019 12:23, Ilya Maximets wrote:

On 01.08.2019 20:48, Aaron Conole wrote:

Ilya Maximets  writes:

However, there is an additional issue with branch-2.5:

branch-2.5 has kernel 2.6.32 in the .travis.yml, but gcc >= 5 is not
able to build this kernel. This kernel reached its EOL few years ago
already and will never be fixed. So, there are few options for this issue:

1. Simply remove 2.6.32 kernel from the build matrix.
 (I have a simple patch for this.)


For branch-2.5 I'm waiting for more comments/Ack on the patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361422.html

Backports to this branch will go along with applying above patch.


Hi Ben,

This discussion happened while you were out of office, but I'd like to
hear your opinion on this issue with branch-2.5 that still exists.


My initial thought is that we don't have a clear obligation to support
an end-of-lifed upstream kernel.  I haven't read the whole thread to see
whether there's a reason to do so anyway, but if not then removing
2.6.32 from the build matrix seems reasonable.


OK. So, can I treat this as an ACK for above mentioned patch?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Travis build failures due to base image change to Xenial.

2019-10-28 Thread Ben Pfaff
On Mon, Oct 28, 2019 at 03:04:56PM +0100, Ilya Maximets wrote:
> On 09.08.2019 12:23, Ilya Maximets wrote:
> > On 01.08.2019 20:48, Aaron Conole wrote:
> > > Ilya Maximets  writes:
> > > > However, there is an additional issue with branch-2.5:
> > > > 
> > > > branch-2.5 has kernel 2.6.32 in the .travis.yml, but gcc >= 5 is not
> > > > able to build this kernel. This kernel reached its EOL few years ago
> > > > already and will never be fixed. So, there are few options for this 
> > > > issue:
> > > > 
> > > > 1. Simply remove 2.6.32 kernel from the build matrix.
> > > > (I have a simple patch for this.)
> > 
> > For branch-2.5 I'm waiting for more comments/Ack on the patch:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361422.html
> > 
> > Backports to this branch will go along with applying above patch.
> 
> Hi Ben,
> 
> This discussion happened while you were out of office, but I'd like to
> hear your opinion on this issue with branch-2.5 that still exists.

My initial thought is that we don't have a clear obligation to support
an end-of-lifed upstream kernel.  I haven't read the whole thread to see
whether there's a reason to do so anyway, but if not then removing
2.6.32 from the build matrix seems reasonable.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovndb-servers.ocf: Change from 'openvswitch' to 'ovn' in default vars.

2019-10-28 Thread Ginwala, Aliasgar via dev
Thanks Numan.

Acked-by: Aliasgar Ginwala mailto:num...@ovn.org>>

From: num...@ovn.org 
Sent: Monday, October 28, 2019 7:34 AM
To: d...@openvswitch.org 
Cc: Numan Siddique ; Ginwala, Aliasgar 
Subject: [PATCH ovn] ovndb-servers.ocf: Change from 'openvswitch' to 'ovn' in 
default vars.

From: Numan Siddique 

CC: Aliasgar Ginwala 
Signed-off-by: Numan Siddique 
---
 utilities/ovndb-servers.ocf | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/utilities/ovndb-servers.ocf b/utilities/ovndb-servers.ocf
index cd4742668..5fe7849ab 100755
--- a/utilities/ovndb-servers.ocf
+++ b/utilities/ovndb-servers.ocf
@@ -2,7 +2,7 @@

 : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
 . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
-: ${OVN_CTL_DEFAULT="/usr/share/openvswitch/scripts/ovn-ctl"}
+: ${OVN_CTL_DEFAULT="/usr/share/ovn/scripts/ovn-ctl"}
 : ${NB_MASTER_PORT_DEFAULT="6641"}
 : ${NB_MASTER_PROTO_DEFAULT="tcp"}
 : ${SB_MASTER_PORT_DEFAULT="6642"}
@@ -10,12 +10,12 @@
 : ${MANAGE_NORTHD_DEFAULT="no"}
 : ${INACTIVE_PROBE_DEFAULT="5000"}
 : ${LISTEN_ON_MASTER_IP_ONLY_DEFAULT="yes"}
-: ${NB_SSL_KEY_DEFAULT="/etc/openvswitch/ovnnb-privkey.pem"}
-: ${NB_SSL_CERT_DEFAULT="/etc/openvswitch/ovnnb-cert.pem"}
-: ${NB_SSL_CACERT_DEFAULT="/etc/openvswitch/cacert.pem"}
-: ${SB_SSL_KEY_DEFAULT="/etc/openvswitch/ovnsb-privkey.pem"}
-: ${SB_SSL_CERT_DEFAULT="/etc/openvswitch/ovnsb-cert.pem"}
-: ${SB_SSL_CACERT_DEFAULT="/etc/openvswitch/cacert.pem"}
+: ${NB_SSL_KEY_DEFAULT="/etc/ovn/ovnnb-privkey.pem"}
+: ${NB_SSL_CERT_DEFAULT="/etc/ovn/ovnnb-cert.pem"}
+: ${NB_SSL_CACERT_DEFAULT="/etc/ovn/cacert.pem"}
+: ${SB_SSL_KEY_DEFAULT="/etc/ovn/ovnsb-privkey.pem"}
+: ${SB_SSL_CERT_DEFAULT="/etc/ovn/ovnsb-cert.pem"}
+: ${SB_SSL_CACERT_DEFAULT="/etc/ovn/cacert.pem"}

 CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
 CRM_ATTR_REPL_INFO="${HA_SBIN_DIR}/crm_attribute --type crm_config --name 
OVN_REPL_INFO -s ovn_ovsdb_master_server"
--
2.21.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Fix time delta overflow in case of race for meter lock.

2019-10-28 Thread Ilya Maximets

On 25.10.2019 17:55, William Tu wrote:

On Fri, Oct 25, 2019 at 4:44 AM Ilya Maximets  wrote:


There is a race window between getting the time and getting the meter
lock.  This could lead to situation where the thread with larger
current time (this thread called time_{um}sec() later than others)
will acquire meter lock first and update meter->used to the large
value.  Next threads will try to calculate time delta by subtracting
the large meter->used from their lower time getting the negative value
which will be converted to a big unsigned delta.

Fix that by assuming that all these threads received packets in the
same time in this case, i.e. dropping negative delta to 0.

CC: Jarno Rajahalme 
Fixes: 4b27db644a8c ("dpif-netdev: Simple DROP meter implementation.")
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363126.html
Signed-off-by: Ilya Maximets 
---


LGTM.
Thanks for the fix
Acked-by: William Tu 



Thanks, William! Applied to master and backported.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Do not mix recirculation depth into RSS hash itself.

2019-10-28 Thread Ilya Maximets

On 24.10.2019 15:36, Jan Scheurich wrote:

Even simpler solution to the problem.
Acked-by: Jan Scheurich 

BR, Jan


-Original Message-
From: Ilya Maximets 
Sent: Thursday, 24 October, 2019 14:32
To: ovs-dev@openvswitch.org
Cc: Ian Stokes ; Kevin Traynor ;
Jan Scheurich ; ychen103...@163.com; Ilya
Maximets 
Subject: [PATCH] dpif-netdev: Do not mix recirculation depth into RSS hash
itself.

Mixing of RSS hash with recirculation depth is useful for flow lookup because
same packet after recirculation should match with different datapath rule.
Setting of the mixed value back to the packet is completely unnecessary
because recirculation depth is different on each recirculation, i.e. we will 
have
different packet hash for flow lookup anyway.

This should fix the issue that packets from the same flow could be directed to
different buckets based on a dp_hash or different ports of a balanced bonding
in case they were recirculated different number of times (e.g. due to conntrack
rules).
With this change, the original RSS hash will remain the same making it possible
to calculate equal dp_hash values for such packets.

Reported-at: https://protect2.fireeye.com/v1/url?k=0a51a6c3-56db840c-
0a51e658-0cc47ad93ea4-b7f1e9be8f7bbef8=1=c9f55798-de3c-45f4-afeb-
9a87d3d594ca=https%3A%2F%2Fmail.openvswitch.org%2Fpipermail%2Fovs-
dev%2F2019-September%2F363127.html
Fixes: 048963aa8507 ("dpif-netdev: Reset RSS hash when recirculating.")
Signed-off-by: Ilya Maximets 
---
  lib/dpif-netdev.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 4546b55e8..c09b8fd95
100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -6288,7 +6288,6 @@ dpif_netdev_packet_get_rss_hash(struct dp_packet
*packet,
  recirc_depth = *recirc_depth_get_unsafe();
  if (OVS_UNLIKELY(recirc_depth)) {
  hash = hash_finish(hash, recirc_depth);
-dp_packet_set_rss_hash(packet, hash);
  }
  return hash;
  }
--
2.17.1




Thanks, Jan! Applied to master and backported.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/3] Limit the number of generated conjunctions.

2019-10-28 Thread Dumitru Ceara
On Mon, Oct 28, 2019 at 5:29 PM Dumitru Ceara  wrote:
>
> On Mon, Oct 28, 2019 at 5:07 PM Mark Michelson  wrote:
> >
> > On 10/28/19 10:37 AM, Dumitru Ceara wrote:
> > > On Mon, Oct 28, 2019 at 1:46 PM Mark Michelson  
> > > wrote:
> > >>
> > >> On 10/28/19 7:46 AM, Dumitru Ceara wrote:
> > >>> On Fri, Oct 25, 2019 at 11:07 PM Mark Michelson  
> > >>> wrote:
> > 
> >  There is a maximum number of resubmits that can occur during packet
> >  processing. Deliberately creating a flow table that might cross this
> >  threshold is irresponsible.
> > 
> >  This commit causes the ofctrl code to track the maximum width and depth
> >  of conjunctions in the desired flow table. By multiplying them, we have
> >  a possible worst case for number of resubmits required. If this number
> >  exceeds a quarter of the maximum resubmits allowed, then we fall back 
> >  to
> >  forcing crossproducts.
> > 
> >  The resulting flow table can end up being a mixture of conjunction and
> >  non-conjunction flows if the limit is exceeded.
> > 
> >  Signed-off-by: Mark Michelson 
> > >>>
> > >>> Hi Mark,
> > >>>
> > >>> I've been testing the series on a setup with:
> > >>> - 100 logical routers connected to a logical gateway router
> > >>> - 100 logical switches (each connected to a logical router)
> > >>> - 200 VIFs (2 per logical switch)
> > >>> - 2 NAT rules on each router
> > >>>
> > >>> I bound all VIFs to OVS internal interfaces on the machine when
> > >>> ovn-northd is running.
> > >>>
> > >>> If I start ovn-controller on the same node, without your series I see
> > >>> ovn-controller taking more than 100s for a single flow computation
> > >>> run.
> > >>>
> > >>> If I apply your series, the maximum duration of a flow computation run
> > >>> in the same scenario drops to ~4s.
> > >>>
> > >>> This is really nice, however, I see some issues with flow removal (see 
> > >>> below).
> > >>
> > >> Thanks for the test, Dumitru.
> > >>
> > >>>
> >  ---
> > controller/lflow.c  | 11 +
> > controller/ofctrl.c | 69 
> >  +
> > controller/ofctrl.h |  6 +
> > include/ovn/expr.h  |  2 ++
> > lib/expr.c  |  6 +
> > 5 files changed, 94 insertions(+)
> > 
> >  diff --git a/controller/lflow.c b/controller/lflow.c
> >  index 34b7c36a6..318066465 100644
> >  --- a/controller/lflow.c
> >  +++ b/controller/lflow.c
> >  @@ -37,6 +37,13 @@
> > VLOG_DEFINE_THIS_MODULE(lflow);
> > 
> > COVERAGE_DEFINE(lflow_run);
> >  +
> >  +/* Limit maximum conjunctions to a quarter of the max
> >  + * resubmits. Unfortunatley, max resubmits is not publicly
> >  + * defined in a header file, so if max resubmits is changed
> >  + * from 4096, we have to make sure to update this as well
> >  + */
> >  +#define MAX_CONJUNCTIONS (4096 / 4)
> > 
> > /* Symbol table. */
> > 
> >  @@ -602,6 +609,10 @@ consider_logical_flow(
> > struct expr *prereqs;
> > char *error;
> > 
> >  +uint32_t conj_worst_case;
> >  +conj_worst_case = flow_table->max_conj_width * 
> >  flow_table->max_conj_depth;
> >  +expr_set_force_crossproduct(conj_worst_case >= MAX_CONJUNCTIONS);
> >  +
> > error = ovnacts_parse_string(lflow->actions, , , 
> >  );
> > if (error) {
> > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 
> >  1);
> >  diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >  index afb0036f1..2b2fde3c9 100644
> >  --- a/controller/ofctrl.c
> >  +++ b/controller/ofctrl.c
> >  @@ -686,6 +686,64 @@ ofctrl_check_and_add_flow(struct 
> >  ovn_desired_flow_table *flow_table,
> >   f->uuid_hindex_node.hash);
> > }
> > 
> >  +static void
> >  +get_conjunction_dimensions(const struct ovn_flow *f, uint32_t 
> >  *conj_width_p,
> >  +   uint32_t *conj_depth_p)
> >  +{
> >  +struct ofpact *ofpact;
> >  +uint32_t conj_width = 0;
> >  +uint32_t conj_depth = 0;
> >  +OFPACT_FOR_EACH (ofpact, f->ofpacts, f->ofpacts_len) {
> >  +if (ofpact->type == OFPACT_CONJUNCTION) {
> >  +struct ofpact_conjunction *conj = 
> >  ofpact_get_CONJUNCTION(ofpact);
> >  +if (conj->n_clauses > conj_depth) {
> >  +conj_depth = conj->n_clauses;
> >  +}
> >  +conj_width++;
> >  +}
> >  +}
> >  +
> >  +*conj_width_p = conj_width;
> >  +*conj_depth_p = conj_depth;
> >  +}
> >  +
> >  +static void
> >  +adjust_conjunction_count(struct ovn_desired_flow_table *desired_flows,
> >  + const struct ovn_flow *f, bool add)
> > 

Re: [ovs-dev] [PATCH 3/3] Limit the number of generated conjunctions.

2019-10-28 Thread Dumitru Ceara
On Mon, Oct 28, 2019 at 5:07 PM Mark Michelson  wrote:
>
> On 10/28/19 10:37 AM, Dumitru Ceara wrote:
> > On Mon, Oct 28, 2019 at 1:46 PM Mark Michelson  wrote:
> >>
> >> On 10/28/19 7:46 AM, Dumitru Ceara wrote:
> >>> On Fri, Oct 25, 2019 at 11:07 PM Mark Michelson  
> >>> wrote:
> 
>  There is a maximum number of resubmits that can occur during packet
>  processing. Deliberately creating a flow table that might cross this
>  threshold is irresponsible.
> 
>  This commit causes the ofctrl code to track the maximum width and depth
>  of conjunctions in the desired flow table. By multiplying them, we have
>  a possible worst case for number of resubmits required. If this number
>  exceeds a quarter of the maximum resubmits allowed, then we fall back to
>  forcing crossproducts.
> 
>  The resulting flow table can end up being a mixture of conjunction and
>  non-conjunction flows if the limit is exceeded.
> 
>  Signed-off-by: Mark Michelson 
> >>>
> >>> Hi Mark,
> >>>
> >>> I've been testing the series on a setup with:
> >>> - 100 logical routers connected to a logical gateway router
> >>> - 100 logical switches (each connected to a logical router)
> >>> - 200 VIFs (2 per logical switch)
> >>> - 2 NAT rules on each router
> >>>
> >>> I bound all VIFs to OVS internal interfaces on the machine when
> >>> ovn-northd is running.
> >>>
> >>> If I start ovn-controller on the same node, without your series I see
> >>> ovn-controller taking more than 100s for a single flow computation
> >>> run.
> >>>
> >>> If I apply your series, the maximum duration of a flow computation run
> >>> in the same scenario drops to ~4s.
> >>>
> >>> This is really nice, however, I see some issues with flow removal (see 
> >>> below).
> >>
> >> Thanks for the test, Dumitru.
> >>
> >>>
>  ---
> controller/lflow.c  | 11 +
> controller/ofctrl.c | 69 
>  +
> controller/ofctrl.h |  6 +
> include/ovn/expr.h  |  2 ++
> lib/expr.c  |  6 +
> 5 files changed, 94 insertions(+)
> 
>  diff --git a/controller/lflow.c b/controller/lflow.c
>  index 34b7c36a6..318066465 100644
>  --- a/controller/lflow.c
>  +++ b/controller/lflow.c
>  @@ -37,6 +37,13 @@
> VLOG_DEFINE_THIS_MODULE(lflow);
> 
> COVERAGE_DEFINE(lflow_run);
>  +
>  +/* Limit maximum conjunctions to a quarter of the max
>  + * resubmits. Unfortunatley, max resubmits is not publicly
>  + * defined in a header file, so if max resubmits is changed
>  + * from 4096, we have to make sure to update this as well
>  + */
>  +#define MAX_CONJUNCTIONS (4096 / 4)
> 
> /* Symbol table. */
> 
>  @@ -602,6 +609,10 @@ consider_logical_flow(
> struct expr *prereqs;
> char *error;
> 
>  +uint32_t conj_worst_case;
>  +conj_worst_case = flow_table->max_conj_width * 
>  flow_table->max_conj_depth;
>  +expr_set_force_crossproduct(conj_worst_case >= MAX_CONJUNCTIONS);
>  +
> error = ovnacts_parse_string(lflow->actions, , , 
>  );
> if (error) {
> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>  diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>  index afb0036f1..2b2fde3c9 100644
>  --- a/controller/ofctrl.c
>  +++ b/controller/ofctrl.c
>  @@ -686,6 +686,64 @@ ofctrl_check_and_add_flow(struct 
>  ovn_desired_flow_table *flow_table,
>   f->uuid_hindex_node.hash);
> }
> 
>  +static void
>  +get_conjunction_dimensions(const struct ovn_flow *f, uint32_t 
>  *conj_width_p,
>  +   uint32_t *conj_depth_p)
>  +{
>  +struct ofpact *ofpact;
>  +uint32_t conj_width = 0;
>  +uint32_t conj_depth = 0;
>  +OFPACT_FOR_EACH (ofpact, f->ofpacts, f->ofpacts_len) {
>  +if (ofpact->type == OFPACT_CONJUNCTION) {
>  +struct ofpact_conjunction *conj = 
>  ofpact_get_CONJUNCTION(ofpact);
>  +if (conj->n_clauses > conj_depth) {
>  +conj_depth = conj->n_clauses;
>  +}
>  +conj_width++;
>  +}
>  +}
>  +
>  +*conj_width_p = conj_width;
>  +*conj_depth_p = conj_depth;
>  +}
>  +
>  +static void
>  +adjust_conjunction_count(struct ovn_desired_flow_table *desired_flows,
>  + const struct ovn_flow *f, bool add)
>  +{
>  +uint32_t conj_width;
>  +uint32_t conj_depth;
>  +
>  +get_conjunction_dimensions(f, _width, _depth);
>  +
>  +if (add) {
>  +if (conj_width > desired_flows->max_conj_width) {
>  +desired_flows->max_conj_width = conj_width;
>  +}
>  +   

Re: [ovs-dev] [PATCH 2/3] Combine conjunctions with identical matches into one flow.

2019-10-28 Thread Numan Siddique
On Mon, Oct 28, 2019, 9:29 PM Mark Michelson  wrote:

> On 10/28/19 11:33 AM, Numan Siddique wrote:
> > On Sat, Oct 26, 2019 at 2:37 AM Mark Michelson 
> wrote:
> >>
> >> As stated in previous commits, conjunctive matches have an issue where
> >> it is possible to install multiple flows that have identical matches.
> >> This results in ambiguity, and can lead to features (such as ACLs) not
> >> functioning properly.
> >>
> >> This change fixes the problem by combining conjunctions with identical
> >> matches into a single flow. As an example, in the past we may have had
> >> something like:
> >>
> >> nw_dst=10.0.0.1 actions=conjunction(2,1/2)
> >> nw_dst=10.0.0.1 actions=conjunction(3,1/2)
> >>
> >> This commit changes this into
> >>
> >> nw_dst=10.0.0.1 actions=conjunction(2,1/2),conjunction(3,1/2)
> >>
> >> This way, there is only a single flow with the proscribed match, and
> >> there is no ambiguity.
> >>
> >> Signed-off-by: Mark Michelson 
> >> ---
> >>   controller/lflow.c  |  5 ++--
> >>   controller/ofctrl.c | 73
> +
> >>   controller/ofctrl.h |  6 +
> >>   tests/ovn.at| 17 +
> >>   4 files changed, 79 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/controller/lflow.c b/controller/lflow.c
> >> index e3ed20cd4..34b7c36a6 100644
> >> --- a/controller/lflow.c
> >> +++ b/controller/lflow.c
> >> @@ -736,8 +736,9 @@ consider_logical_flow(
> >>   dst->clause = src->clause;
> >>   dst->n_clauses = src->n_clauses;
> >>   }
> >> -ofctrl_add_flow(flow_table, ptable, lflow->priority, 0,
> >match,
> >> -, >header_.uuid);
> >> +
> >> +ofctrl_add_or_append_flow(flow_table, ptable,
> lflow->priority, 0,
> >> +  >match, ,
> >header_.uuid);
> >>   ofpbuf_uninit();
> >>   }
> >>   }
> >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >> index 3131baff0..afb0036f1 100644
> >> --- a/controller/ofctrl.c
> >> +++ b/controller/ofctrl.c
> >> @@ -69,6 +69,11 @@ struct ovn_flow {
> >>   uint64_t cookie;
> >>   };
> >>
> >> +static struct ovn_flow *ovn_flow_alloc(uint8_t table_id, uint16_t
> priority,
> >> +   uint64_t cookie,
> >> +   const struct match *match,
> >> +   const struct ofpbuf *actions,
> >> +   const struct uuid *sb_uuid);
> >>   static uint32_t ovn_flow_match_hash(const struct ovn_flow *);
> >>   static struct ovn_flow *ovn_flow_lookup(struct hmap *flow_table,
> >>   const struct ovn_flow *target,
> >> @@ -657,16 +662,8 @@ ofctrl_check_and_add_flow(struct
> ovn_desired_flow_table *flow_table,
> >> const struct uuid *sb_uuid,
> >> bool log_duplicate_flow)
> >>   {
> >> -struct ovn_flow *f = xmalloc(sizeof *f);
> >> -f->table_id = table_id;
> >> -f->priority = priority;
> >> -minimatch_init(>match, match);
> >> -f->ofpacts = xmemdup(actions->data, actions->size);
> >> -f->ofpacts_len = actions->size;
> >> -f->sb_uuid = *sb_uuid;
> >> -f->match_hmap_node.hash = ovn_flow_match_hash(f);
> >> -f->uuid_hindex_node.hash = uuid_hash(>sb_uuid);
> >> -f->cookie = cookie;
> >> +struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie,
> match,
> >> +actions, sb_uuid);
> >>
> >>   ovn_flow_log(f, "ofctrl_add_flow");
> >>
> >> @@ -721,9 +718,65 @@ ofctrl_add_flow(struct ovn_desired_flow_table
> *desired_flows,
> >>   ofctrl_check_and_add_flow(desired_flows, table_id, priority,
> cookie,
> >> match, actions, sb_uuid, true);
> >>   }
> >> +
> >> +void
> >> +ofctrl_add_or_append_flow(struct ovn_desired_flow_table *desired_flows,
> >> +  uint8_t table_id, uint16_t priority,
> uint64_t cookie,
> >> +  const struct match *match,
> >> +  const struct ofpbuf *actions,
> >> +  const struct uuid *sb_uuid)
> >> +{
> >> +struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie,
> match,
> >> +actions, sb_uuid);
> >> +
> >> +ovn_flow_log(f, "ofctrl_add_or_append_flow");
> >> +
> >> +struct ovn_flow *existing;
> >> +existing = ovn_flow_lookup(_flows->match_flow_table, f,
> false);
> >> +if (existing) {
> >> +/* There's already a flow with this particular match. Append
> the
> >> + * action to that flow rather than adding a new flow
> >> + */
> >> +uint64_t compound_stub[64 / 8];
> >> +struct ofpbuf compound;
> >> +ofpbuf_use_stub(, compound_stub,
> sizeof(compound_stub));
> >> +ofpbuf_put(, existing->ofpacts,
> 

Re: [ovs-dev] [PATCH 3/3] Limit the number of generated conjunctions.

2019-10-28 Thread Mark Michelson

On 10/28/19 10:37 AM, Dumitru Ceara wrote:

On Mon, Oct 28, 2019 at 1:46 PM Mark Michelson  wrote:


On 10/28/19 7:46 AM, Dumitru Ceara wrote:

On Fri, Oct 25, 2019 at 11:07 PM Mark Michelson  wrote:


There is a maximum number of resubmits that can occur during packet
processing. Deliberately creating a flow table that might cross this
threshold is irresponsible.

This commit causes the ofctrl code to track the maximum width and depth
of conjunctions in the desired flow table. By multiplying them, we have
a possible worst case for number of resubmits required. If this number
exceeds a quarter of the maximum resubmits allowed, then we fall back to
forcing crossproducts.

The resulting flow table can end up being a mixture of conjunction and
non-conjunction flows if the limit is exceeded.

Signed-off-by: Mark Michelson 


Hi Mark,

I've been testing the series on a setup with:
- 100 logical routers connected to a logical gateway router
- 100 logical switches (each connected to a logical router)
- 200 VIFs (2 per logical switch)
- 2 NAT rules on each router

I bound all VIFs to OVS internal interfaces on the machine when
ovn-northd is running.

If I start ovn-controller on the same node, without your series I see
ovn-controller taking more than 100s for a single flow computation
run.

If I apply your series, the maximum duration of a flow computation run
in the same scenario drops to ~4s.

This is really nice, however, I see some issues with flow removal (see below).


Thanks for the test, Dumitru.




---
   controller/lflow.c  | 11 +
   controller/ofctrl.c | 69 
+
   controller/ofctrl.h |  6 +
   include/ovn/expr.h  |  2 ++
   lib/expr.c  |  6 +
   5 files changed, 94 insertions(+)

diff --git a/controller/lflow.c b/controller/lflow.c
index 34b7c36a6..318066465 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -37,6 +37,13 @@
   VLOG_DEFINE_THIS_MODULE(lflow);

   COVERAGE_DEFINE(lflow_run);
+
+/* Limit maximum conjunctions to a quarter of the max
+ * resubmits. Unfortunatley, max resubmits is not publicly
+ * defined in a header file, so if max resubmits is changed
+ * from 4096, we have to make sure to update this as well
+ */
+#define MAX_CONJUNCTIONS (4096 / 4)

   /* Symbol table. */

@@ -602,6 +609,10 @@ consider_logical_flow(
   struct expr *prereqs;
   char *error;

+uint32_t conj_worst_case;
+conj_worst_case = flow_table->max_conj_width * flow_table->max_conj_depth;
+expr_set_force_crossproduct(conj_worst_case >= MAX_CONJUNCTIONS);
+
   error = ovnacts_parse_string(lflow->actions, , , );
   if (error) {
   static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index afb0036f1..2b2fde3c9 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -686,6 +686,64 @@ ofctrl_check_and_add_flow(struct ovn_desired_flow_table 
*flow_table,
 f->uuid_hindex_node.hash);
   }

+static void
+get_conjunction_dimensions(const struct ovn_flow *f, uint32_t *conj_width_p,
+   uint32_t *conj_depth_p)
+{
+struct ofpact *ofpact;
+uint32_t conj_width = 0;
+uint32_t conj_depth = 0;
+OFPACT_FOR_EACH (ofpact, f->ofpacts, f->ofpacts_len) {
+if (ofpact->type == OFPACT_CONJUNCTION) {
+struct ofpact_conjunction *conj = ofpact_get_CONJUNCTION(ofpact);
+if (conj->n_clauses > conj_depth) {
+conj_depth = conj->n_clauses;
+}
+conj_width++;
+}
+}
+
+*conj_width_p = conj_width;
+*conj_depth_p = conj_depth;
+}
+
+static void
+adjust_conjunction_count(struct ovn_desired_flow_table *desired_flows,
+ const struct ovn_flow *f, bool add)
+{
+uint32_t conj_width;
+uint32_t conj_depth;
+
+get_conjunction_dimensions(f, _width, _depth);
+
+if (add) {
+if (conj_width > desired_flows->max_conj_width) {
+desired_flows->max_conj_width = conj_width;
+}
+if (conj_depth > desired_flows->max_conj_depth) {
+desired_flows->max_conj_depth = conj_depth;
+}
+} else if (desired_flows->max_conj_width <= conj_width ||
+   desired_flows->max_conj_depth <= conj_depth) {
+/* Removing either the widest or deepest conjunction.
+ * Walk the table and recalculate maximums
+ */
+struct ovn_flow *iter;
+desired_flows->max_conj_width = 0;
+desired_flows->max_conj_depth = 0;
+HMAP_FOR_EACH (iter, match_hmap_node,
+   _flows->match_flow_table) {
+get_conjunction_dimensions(iter, _width, _depth);
+if (conj_width > desired_flows->max_conj_width) {
+desired_flows->max_conj_width = conj_width;
+}
+if (conj_depth > desired_flows->max_conj_depth) {
+desired_flows->max_conj_depth = 

Re: [ovs-dev] [PATCH 2/3] Combine conjunctions with identical matches into one flow.

2019-10-28 Thread Mark Michelson

On 10/28/19 11:33 AM, Numan Siddique wrote:

On Sat, Oct 26, 2019 at 2:37 AM Mark Michelson  wrote:


As stated in previous commits, conjunctive matches have an issue where
it is possible to install multiple flows that have identical matches.
This results in ambiguity, and can lead to features (such as ACLs) not
functioning properly.

This change fixes the problem by combining conjunctions with identical
matches into a single flow. As an example, in the past we may have had
something like:

nw_dst=10.0.0.1 actions=conjunction(2,1/2)
nw_dst=10.0.0.1 actions=conjunction(3,1/2)

This commit changes this into

nw_dst=10.0.0.1 actions=conjunction(2,1/2),conjunction(3,1/2)

This way, there is only a single flow with the proscribed match, and
there is no ambiguity.

Signed-off-by: Mark Michelson 
---
  controller/lflow.c  |  5 ++--
  controller/ofctrl.c | 73 +
  controller/ofctrl.h |  6 +
  tests/ovn.at| 17 +
  4 files changed, 79 insertions(+), 22 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index e3ed20cd4..34b7c36a6 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -736,8 +736,9 @@ consider_logical_flow(
  dst->clause = src->clause;
  dst->n_clauses = src->n_clauses;
  }
-ofctrl_add_flow(flow_table, ptable, lflow->priority, 0, >match,
-, >header_.uuid);
+
+ofctrl_add_or_append_flow(flow_table, ptable, lflow->priority, 0,
+  >match, , >header_.uuid);
  ofpbuf_uninit();
  }
  }
diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 3131baff0..afb0036f1 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -69,6 +69,11 @@ struct ovn_flow {
  uint64_t cookie;
  };

+static struct ovn_flow *ovn_flow_alloc(uint8_t table_id, uint16_t priority,
+   uint64_t cookie,
+   const struct match *match,
+   const struct ofpbuf *actions,
+   const struct uuid *sb_uuid);
  static uint32_t ovn_flow_match_hash(const struct ovn_flow *);
  static struct ovn_flow *ovn_flow_lookup(struct hmap *flow_table,
  const struct ovn_flow *target,
@@ -657,16 +662,8 @@ ofctrl_check_and_add_flow(struct ovn_desired_flow_table 
*flow_table,
const struct uuid *sb_uuid,
bool log_duplicate_flow)
  {
-struct ovn_flow *f = xmalloc(sizeof *f);
-f->table_id = table_id;
-f->priority = priority;
-minimatch_init(>match, match);
-f->ofpacts = xmemdup(actions->data, actions->size);
-f->ofpacts_len = actions->size;
-f->sb_uuid = *sb_uuid;
-f->match_hmap_node.hash = ovn_flow_match_hash(f);
-f->uuid_hindex_node.hash = uuid_hash(>sb_uuid);
-f->cookie = cookie;
+struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie, match,
+actions, sb_uuid);

  ovn_flow_log(f, "ofctrl_add_flow");

@@ -721,9 +718,65 @@ ofctrl_add_flow(struct ovn_desired_flow_table 
*desired_flows,
  ofctrl_check_and_add_flow(desired_flows, table_id, priority, cookie,
match, actions, sb_uuid, true);
  }
+
+void
+ofctrl_add_or_append_flow(struct ovn_desired_flow_table *desired_flows,
+  uint8_t table_id, uint16_t priority, uint64_t cookie,
+  const struct match *match,
+  const struct ofpbuf *actions,
+  const struct uuid *sb_uuid)
+{
+struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie, match,
+actions, sb_uuid);
+
+ovn_flow_log(f, "ofctrl_add_or_append_flow");
+
+struct ovn_flow *existing;
+existing = ovn_flow_lookup(_flows->match_flow_table, f, false);
+if (existing) {
+/* There's already a flow with this particular match. Append the
+ * action to that flow rather than adding a new flow
+ */
+uint64_t compound_stub[64 / 8];
+struct ofpbuf compound;
+ofpbuf_use_stub(, compound_stub, sizeof(compound_stub));
+ofpbuf_put(, existing->ofpacts, existing->ofpacts_len);
+ofpbuf_put(, f->ofpacts, f->ofpacts_len);


Instead of making use of a stub and copying the existing and new
actions, can't we just
copy the new actions to "existing->ofpacts" using ofpbuf_put() ?


You can't use ofpbuf_put() directly since existing->ofpacts is of type 
ofpact, not ofpbuf. And I don't think you can cast existing->ofpacts to 
an ofpbuf since existing->ofpacts was created from the 'data' member of 
an ofpbuf; we don't have the metadata, such as the method by which the 
data was allocated.


So maybe it's possible to create a new ofpbuf but have it use 

Re: [ovs-dev] [PATCH 2/3] Combine conjunctions with identical matches into one flow.

2019-10-28 Thread Numan Siddique
On Sat, Oct 26, 2019 at 2:37 AM Mark Michelson  wrote:
>
> As stated in previous commits, conjunctive matches have an issue where
> it is possible to install multiple flows that have identical matches.
> This results in ambiguity, and can lead to features (such as ACLs) not
> functioning properly.
>
> This change fixes the problem by combining conjunctions with identical
> matches into a single flow. As an example, in the past we may have had
> something like:
>
> nw_dst=10.0.0.1 actions=conjunction(2,1/2)
> nw_dst=10.0.0.1 actions=conjunction(3,1/2)
>
> This commit changes this into
>
> nw_dst=10.0.0.1 actions=conjunction(2,1/2),conjunction(3,1/2)
>
> This way, there is only a single flow with the proscribed match, and
> there is no ambiguity.
>
> Signed-off-by: Mark Michelson 
> ---
>  controller/lflow.c  |  5 ++--
>  controller/ofctrl.c | 73 
> +
>  controller/ofctrl.h |  6 +
>  tests/ovn.at| 17 +
>  4 files changed, 79 insertions(+), 22 deletions(-)
>
> diff --git a/controller/lflow.c b/controller/lflow.c
> index e3ed20cd4..34b7c36a6 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -736,8 +736,9 @@ consider_logical_flow(
>  dst->clause = src->clause;
>  dst->n_clauses = src->n_clauses;
>  }
> -ofctrl_add_flow(flow_table, ptable, lflow->priority, 0, 
> >match,
> -, >header_.uuid);
> +
> +ofctrl_add_or_append_flow(flow_table, ptable, lflow->priority, 0,
> +  >match, , 
> >header_.uuid);
>  ofpbuf_uninit();
>  }
>  }
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index 3131baff0..afb0036f1 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -69,6 +69,11 @@ struct ovn_flow {
>  uint64_t cookie;
>  };
>
> +static struct ovn_flow *ovn_flow_alloc(uint8_t table_id, uint16_t priority,
> +   uint64_t cookie,
> +   const struct match *match,
> +   const struct ofpbuf *actions,
> +   const struct uuid *sb_uuid);
>  static uint32_t ovn_flow_match_hash(const struct ovn_flow *);
>  static struct ovn_flow *ovn_flow_lookup(struct hmap *flow_table,
>  const struct ovn_flow *target,
> @@ -657,16 +662,8 @@ ofctrl_check_and_add_flow(struct ovn_desired_flow_table 
> *flow_table,
>const struct uuid *sb_uuid,
>bool log_duplicate_flow)
>  {
> -struct ovn_flow *f = xmalloc(sizeof *f);
> -f->table_id = table_id;
> -f->priority = priority;
> -minimatch_init(>match, match);
> -f->ofpacts = xmemdup(actions->data, actions->size);
> -f->ofpacts_len = actions->size;
> -f->sb_uuid = *sb_uuid;
> -f->match_hmap_node.hash = ovn_flow_match_hash(f);
> -f->uuid_hindex_node.hash = uuid_hash(>sb_uuid);
> -f->cookie = cookie;
> +struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie, match,
> +actions, sb_uuid);
>
>  ovn_flow_log(f, "ofctrl_add_flow");
>
> @@ -721,9 +718,65 @@ ofctrl_add_flow(struct ovn_desired_flow_table 
> *desired_flows,
>  ofctrl_check_and_add_flow(desired_flows, table_id, priority, cookie,
>match, actions, sb_uuid, true);
>  }
> +
> +void
> +ofctrl_add_or_append_flow(struct ovn_desired_flow_table *desired_flows,
> +  uint8_t table_id, uint16_t priority, uint64_t 
> cookie,
> +  const struct match *match,
> +  const struct ofpbuf *actions,
> +  const struct uuid *sb_uuid)
> +{
> +struct ovn_flow *f = ovn_flow_alloc(table_id, priority, cookie, match,
> +actions, sb_uuid);
> +
> +ovn_flow_log(f, "ofctrl_add_or_append_flow");
> +
> +struct ovn_flow *existing;
> +existing = ovn_flow_lookup(_flows->match_flow_table, f, false);
> +if (existing) {
> +/* There's already a flow with this particular match. Append the
> + * action to that flow rather than adding a new flow
> + */
> +uint64_t compound_stub[64 / 8];
> +struct ofpbuf compound;
> +ofpbuf_use_stub(, compound_stub, sizeof(compound_stub));
> +ofpbuf_put(, existing->ofpacts, existing->ofpacts_len);
> +ofpbuf_put(, f->ofpacts, f->ofpacts_len);

Instead of making use of a stub and copying the existing and new
actions, can't we just
copy the new actions to "existing->ofpacts" using ofpbuf_put() ?

ofpbuf_put() will take care of reallocating the memory if required.

Other than that, this and the 1st patch of this series, looks good to me.

Thanks
Numan

> +
> +free(existing->ofpacts);
> +existing->ofpacts = 

[ovs-dev] [PATCH v4 ovn 2/2] Add DNSSL support to OVN

2019-10-28 Thread Lorenzo Bianconi
Introduce the possibility to specify a DNSSL option to Router
Advertisement packets. DNS Search list can be specified using
'dnssl' tag in the ipv6_ra_configs column of logical router
port table

Signed-off-by: Lorenzo Bianconi 
---
 controller/pinctrl.c | 51 
 lib/ovn-l7.h | 11 ++
 northd/ovn-northd.c  |  4 
 ovn-nb.xml   |  5 +
 tests/ovn.at | 30 +-
 5 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f105ebb5c..0bd1ef3ef 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -2194,6 +2194,7 @@ struct ipv6_ra_config {
 struct lport_addresses prefixes;
 struct in6_addr rdnss;
 bool has_rdnss;
+struct ds dnssl;
 };
 
 struct ipv6_ra_state {
@@ -2215,6 +2216,7 @@ ipv6_ra_config_delete(struct ipv6_ra_config *config)
 {
 if (config) {
 destroy_lport_addresses(>prefixes);
+ds_destroy(>dnssl);
 free(config);
 }
 }
@@ -2253,6 +2255,7 @@ ipv6_ra_update_config(const struct sbrec_port_binding *pb)
 nd_ra_min_interval_default(config->max_interval));
 config->mtu = smap_get_int(>options, "ipv6_ra_mtu", ND_MTU_DEFAULT);
 config->la_flags = IPV6_ND_RA_OPT_PREFIX_ON_LINK;
+ds_init(>dnssl);
 
 const char *address_mode = smap_get(>options, "ipv6_ra_address_mode");
 if (!address_mode) {
@@ -2298,6 +2301,11 @@ ipv6_ra_update_config(const struct sbrec_port_binding 
*pb)
 }
 config->has_rdnss = !!rdnss;
 
+const char *dnssl = smap_get(>options, "ipv6_ra_dnssl");
+if (dnssl) {
+ds_put_buffer(>dnssl, dnssl, strlen(dnssl));
+}
+
 return config;
 
 fail:
@@ -2354,6 +2362,45 @@ packet_put_ra_rdnss_opt(struct dp_packet *b, uint8_t num,
   prev_l4_size + len * 8));
 }
 
+static void
+packet_put_ra_dnssl_opt(struct dp_packet *b, ovs_be32 lifetime,
+char *dnssl_list)
+{
+char *t0, *r0 = dnssl_list, dnssl[255] = {};
+size_t prev_l4_size = dp_packet_l4_size(b);
+size_t size = sizeof(struct ovs_nd_dnssl);
+struct ip6_hdr *nh = dp_packet_l3(b);
+int i = 0;
+
+while ((t0 = strtok_r(r0, ",", ))) {
+char *t1, *r1 = t0;
+
+size += strlen(t0) + 2;
+while ((t1 = strtok_r(r1, ".", ))) {
+dnssl[i++] = strlen(t1);
+memcpy([i], t1, strlen(t1));
+i += strlen(t1);
+}
+dnssl[i++] = 0;
+}
+size = ROUND_UP(size, 8);
+nh->ip6_plen = htons(prev_l4_size + size);
+
+struct ovs_nd_dnssl *nd_dnssl = dp_packet_put_uninit(b, sizeof *nd_dnssl);
+nd_dnssl->type = ND_OPT_DNSSL;
+nd_dnssl->len = size / 8;
+nd_dnssl->reserved = 0;
+put_16aligned_be32(_dnssl->lifetime, lifetime);
+
+dp_packet_put(b, dnssl, size - sizeof *nd_dnssl);
+
+struct ovs_ra_msg *ra = dp_packet_l4(b);
+ra->icmph.icmp6_cksum = 0;
+uint32_t icmp_csum = packet_csum_pseudoheader6(dp_packet_l3(b));
+ra->icmph.icmp6_cksum = csum_finish(csum_continue(icmp_csum, ra,
+  prev_l4_size + size));
+}
+
 /* Called with in the pinctrl_handler thread context. */
 static long long int
 ipv6_ra_send(struct rconn *swconn, struct ipv6_ra_state *ra)
@@ -2382,6 +2429,10 @@ ipv6_ra_send(struct rconn *swconn, struct ipv6_ra_state 
*ra)
 packet_put_ra_rdnss_opt(, 1, htonl(0x),
 >config->rdnss);
 }
+if (ra->config->dnssl.length) {
+packet_put_ra_dnssl_opt(, htonl(0x),
+ra->config->dnssl.string);
+}
 
 uint64_t ofpacts_stub[4096 / 8];
 struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub);
diff --git a/lib/ovn-l7.h b/lib/ovn-l7.h
index 12cee6329..5fc370bf5 100644
--- a/lib/ovn-l7.h
+++ b/lib/ovn-l7.h
@@ -231,6 +231,17 @@ struct nd_rdnss_opt {
 };
 BUILD_ASSERT_DECL(ND_RDNSS_OPT_LEN == sizeof(struct nd_rdnss_opt));
 
+/* DNSSL option RFC 6106 */
+#define ND_OPT_DNSSL31
+#define ND_DNSSL_OPT_LEN8
+struct ovs_nd_dnssl {
+u_int8_t type;  /* ND_OPT_DNSSL */
+u_int8_t len;   /* >= 2 */
+ovs_be16 reserved;
+ovs_16aligned_be32 lifetime;
+};
+BUILD_ASSERT_DECL(ND_DNSSL_OPT_LEN == sizeof(struct ovs_nd_dnssl));
+
 #define DHCPV6_DUID_LL  3
 #define DHCPV6_HW_TYPE_ETH  1
 
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index d1de36e08..dbb279052 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -6489,6 +6489,10 @@ copy_ra_to_sb(struct ovn_port *op, const char 
*address_mode)
 if (rdnss) {
 smap_add(, "ipv6_ra_rdnss", rdnss);
 }
+const char *dnssl = smap_get(>nbrp->ipv6_ra_configs, "dnssl");
+if (dnssl) {
+smap_add(, "ipv6_ra_dnssl", dnssl);
+}
 
 smap_add(, "ipv6_ra_src_eth", op->lrp_networks.ea_s);
 
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 

[ovs-dev] [PATCH v4 ovn 1/2] Add RDNSS support to OVN

2019-10-28 Thread Lorenzo Bianconi
Introduce the possibility to specify a RDNSS option to Router
Advertisement packets. DNS IPv6 address can be specified using
'rdnss' tag in the ipv6_ra_configs column of logical router
port table

Acked-by: Mark Michelson 
Signed-off-by: Lorenzo Bianconi 
---
 controller/pinctrl.c | 39 +++
 lib/ovn-l7.h | 11 +++
 northd/ovn-northd.c  |  5 +
 ovn-nb.xml   |  5 +
 tests/ovn.at | 27 +++
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index d826da186..f105ebb5c 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -2192,6 +2192,8 @@ struct ipv6_ra_config {
 uint8_t mo_flags; /* Managed/Other flags for RAs */
 uint8_t la_flags; /* On-link/autonomous flags for address prefixes */
 struct lport_addresses prefixes;
+struct in6_addr rdnss;
+bool has_rdnss;
 };
 
 struct ipv6_ra_state {
@@ -2289,6 +2291,12 @@ ipv6_ra_update_config(const struct sbrec_port_binding 
*pb)
 VLOG_WARN("Invalid IP source %s", ip_addr);
 goto fail;
 }
+const char *rdnss = smap_get(>options, "ipv6_ra_rdnss");
+if (rdnss && !ipv6_parse(rdnss, >rdnss)) {
+VLOG_WARN("Invalid RDNSS source %s", rdnss);
+goto fail;
+}
+config->has_rdnss = !!rdnss;
 
 return config;
 
@@ -2319,6 +2327,33 @@ put_load(uint64_t value, enum mf_field_id dst, int ofs, 
int n_bits,
 bitwise_one(ofpact_set_field_mask(sf), sf->field->n_bytes, ofs, n_bits);
 }
 
+static void
+packet_put_ra_rdnss_opt(struct dp_packet *b, uint8_t num,
+ovs_be32 lifetime, const struct in6_addr *dns)
+{
+size_t prev_l4_size = dp_packet_l4_size(b);
+struct ip6_hdr *nh = dp_packet_l3(b);
+size_t len = 2 * num + 1;
+
+nh->ip6_plen = htons(prev_l4_size + len * 8);
+
+struct nd_rdnss_opt *nd_rdnss = dp_packet_put_uninit(b, sizeof *nd_rdnss);
+nd_rdnss->type = ND_OPT_RDNSS;
+nd_rdnss->len = len;
+nd_rdnss->reserved = 0;
+put_16aligned_be32(_rdnss->lifetime, lifetime);
+
+for (int i = 0; i < num; i++) {
+dp_packet_put(b, [i], sizeof(ovs_be32[4]));
+}
+
+struct ovs_ra_msg *ra = dp_packet_l4(b);
+ra->icmph.icmp6_cksum = 0;
+uint32_t icmp_csum = packet_csum_pseudoheader6(dp_packet_l3(b));
+ra->icmph.icmp6_cksum = csum_finish(csum_continue(icmp_csum, ra,
+  prev_l4_size + len * 8));
+}
+
 /* Called with in the pinctrl_handler thread context. */
 static long long int
 ipv6_ra_send(struct rconn *swconn, struct ipv6_ra_state *ra)
@@ -2343,6 +2378,10 @@ ipv6_ra_send(struct rconn *swconn, struct ipv6_ra_state 
*ra)
 ra->config->la_flags, htonl(IPV6_ND_RA_OPT_PREFIX_VALID_LIFETIME),
 htonl(IPV6_ND_RA_OPT_PREFIX_PREFERRED_LIFETIME), addr);
 }
+if (ra->config->has_rdnss) {
+packet_put_ra_rdnss_opt(, 1, htonl(0x),
+>config->rdnss);
+}
 
 uint64_t ofpacts_stub[4096 / 8];
 struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub);
diff --git a/lib/ovn-l7.h b/lib/ovn-l7.h
index c93def450..12cee6329 100644
--- a/lib/ovn-l7.h
+++ b/lib/ovn-l7.h
@@ -220,6 +220,17 @@ struct dhcpv6_opt_ia_na {
 ovs_be32 t2;
 });
 
+/* RDNSS option RFC 6106 */
+#define ND_RDNSS_OPT_LEN8
+#define ND_OPT_RDNSS25
+struct nd_rdnss_opt {
+uint8_t type; /* ND_OPT_RDNSS. */
+uint8_t len;  /* >= 3. */
+ovs_be16 reserved;/* Always 0. */
+ovs_16aligned_be32 lifetime;
+};
+BUILD_ASSERT_DECL(ND_RDNSS_OPT_LEN == sizeof(struct nd_rdnss_opt));
+
 #define DHCPV6_DUID_LL  3
 #define DHCPV6_HW_TYPE_ETH  1
 
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index ea8ad7c2d..d1de36e08 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -6485,6 +6485,11 @@ copy_ra_to_sb(struct ovn_port *op, const char 
*address_mode)
 smap_add(, "ipv6_ra_prefixes", ds_cstr());
 ds_destroy();
 
+const char *rdnss = smap_get(>nbrp->ipv6_ra_configs, "rdnss");
+if (rdnss) {
+smap_add(, "ipv6_ra_rdnss", rdnss);
+}
+
 smap_add(, "ipv6_ra_src_eth", op->lrp_networks.ea_s);
 
 sbrec_port_binding_set_options(op->sb, );
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 1504f8fca..2faf9390b 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -1885,6 +1885,11 @@
 is one-third of ,
 i.e. 200 seconds if that key is unset.
   
+
+  
+IPv6 address of RDNSS server announced in RA packets. At the moment
+OVN supports just one RDNSS server.
+  
 
 
 
diff --git a/tests/ovn.at b/tests/ovn.at
index 22b272a60..3e7692895 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -12537,14 +12537,15 @@ construct_expected_ra() {
 
 local mtu=$1
 local ra_mo=$2
-local ra_prefix_la=$3
+local rdnss=$3
+local ra_prefix_la=$4
 
 local slla=0101${src_mac}
 local 

Re: [ovs-dev] [PATCH 3/3] Limit the number of generated conjunctions.

2019-10-28 Thread Dumitru Ceara
On Mon, Oct 28, 2019 at 1:46 PM Mark Michelson  wrote:
>
> On 10/28/19 7:46 AM, Dumitru Ceara wrote:
> > On Fri, Oct 25, 2019 at 11:07 PM Mark Michelson  wrote:
> >>
> >> There is a maximum number of resubmits that can occur during packet
> >> processing. Deliberately creating a flow table that might cross this
> >> threshold is irresponsible.
> >>
> >> This commit causes the ofctrl code to track the maximum width and depth
> >> of conjunctions in the desired flow table. By multiplying them, we have
> >> a possible worst case for number of resubmits required. If this number
> >> exceeds a quarter of the maximum resubmits allowed, then we fall back to
> >> forcing crossproducts.
> >>
> >> The resulting flow table can end up being a mixture of conjunction and
> >> non-conjunction flows if the limit is exceeded.
> >>
> >> Signed-off-by: Mark Michelson 
> >
> > Hi Mark,
> >
> > I've been testing the series on a setup with:
> > - 100 logical routers connected to a logical gateway router
> > - 100 logical switches (each connected to a logical router)
> > - 200 VIFs (2 per logical switch)
> > - 2 NAT rules on each router
> >
> > I bound all VIFs to OVS internal interfaces on the machine when
> > ovn-northd is running.
> >
> > If I start ovn-controller on the same node, without your series I see
> > ovn-controller taking more than 100s for a single flow computation
> > run.
> >
> > If I apply your series, the maximum duration of a flow computation run
> > in the same scenario drops to ~4s.
> >
> > This is really nice, however, I see some issues with flow removal (see 
> > below).
>
> Thanks for the test, Dumitru.
>
> >
> >> ---
> >>   controller/lflow.c  | 11 +
> >>   controller/ofctrl.c | 69 
> >> +
> >>   controller/ofctrl.h |  6 +
> >>   include/ovn/expr.h  |  2 ++
> >>   lib/expr.c  |  6 +
> >>   5 files changed, 94 insertions(+)
> >>
> >> diff --git a/controller/lflow.c b/controller/lflow.c
> >> index 34b7c36a6..318066465 100644
> >> --- a/controller/lflow.c
> >> +++ b/controller/lflow.c
> >> @@ -37,6 +37,13 @@
> >>   VLOG_DEFINE_THIS_MODULE(lflow);
> >>
> >>   COVERAGE_DEFINE(lflow_run);
> >> +
> >> +/* Limit maximum conjunctions to a quarter of the max
> >> + * resubmits. Unfortunatley, max resubmits is not publicly
> >> + * defined in a header file, so if max resubmits is changed
> >> + * from 4096, we have to make sure to update this as well
> >> + */
> >> +#define MAX_CONJUNCTIONS (4096 / 4)
> >>
> >>   /* Symbol table. */
> >>
> >> @@ -602,6 +609,10 @@ consider_logical_flow(
> >>   struct expr *prereqs;
> >>   char *error;
> >>
> >> +uint32_t conj_worst_case;
> >> +conj_worst_case = flow_table->max_conj_width * 
> >> flow_table->max_conj_depth;
> >> +expr_set_force_crossproduct(conj_worst_case >= MAX_CONJUNCTIONS);
> >> +
> >>   error = ovnacts_parse_string(lflow->actions, , , 
> >> );
> >>   if (error) {
> >>   static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >> index afb0036f1..2b2fde3c9 100644
> >> --- a/controller/ofctrl.c
> >> +++ b/controller/ofctrl.c
> >> @@ -686,6 +686,64 @@ ofctrl_check_and_add_flow(struct 
> >> ovn_desired_flow_table *flow_table,
> >> f->uuid_hindex_node.hash);
> >>   }
> >>
> >> +static void
> >> +get_conjunction_dimensions(const struct ovn_flow *f, uint32_t 
> >> *conj_width_p,
> >> +   uint32_t *conj_depth_p)
> >> +{
> >> +struct ofpact *ofpact;
> >> +uint32_t conj_width = 0;
> >> +uint32_t conj_depth = 0;
> >> +OFPACT_FOR_EACH (ofpact, f->ofpacts, f->ofpacts_len) {
> >> +if (ofpact->type == OFPACT_CONJUNCTION) {
> >> +struct ofpact_conjunction *conj = 
> >> ofpact_get_CONJUNCTION(ofpact);
> >> +if (conj->n_clauses > conj_depth) {
> >> +conj_depth = conj->n_clauses;
> >> +}
> >> +conj_width++;
> >> +}
> >> +}
> >> +
> >> +*conj_width_p = conj_width;
> >> +*conj_depth_p = conj_depth;
> >> +}
> >> +
> >> +static void
> >> +adjust_conjunction_count(struct ovn_desired_flow_table *desired_flows,
> >> + const struct ovn_flow *f, bool add)
> >> +{
> >> +uint32_t conj_width;
> >> +uint32_t conj_depth;
> >> +
> >> +get_conjunction_dimensions(f, _width, _depth);
> >> +
> >> +if (add) {
> >> +if (conj_width > desired_flows->max_conj_width) {
> >> +desired_flows->max_conj_width = conj_width;
> >> +}
> >> +if (conj_depth > desired_flows->max_conj_depth) {
> >> +desired_flows->max_conj_depth = conj_depth;
> >> +}
> >> +} else if (desired_flows->max_conj_width <= conj_width ||
> >> +   desired_flows->max_conj_depth <= conj_depth) {
> >> +/* Removing either the widest or deepest conjunction.
> >> + * Walk the table and recalculate 

[ovs-dev] [PATCH v4 ovn 0/2] add RDNSS/DNSSL support to OVN

2019-10-28 Thread Lorenzo Bianconi
Changes since v3:
- rely on dp_packet_put to fill rdnss/dnssl addresses in RA packets and
  remove unused fields

Changes since v2:
- move option definition in ovn-l7.h
- use ovs_16aligned_be32 for option lifetime field

Changes since v1:
- fix sparse warnings
- rebase DNSSL patch on top of RDNSS one

Lorenzo Bianconi (2):
  Add RDNSS support to OVN
  Add DNSSL support to OVN

 controller/pinctrl.c | 90 
 lib/ovn-l7.h | 22 +++
 northd/ovn-northd.c  |  9 +
 ovn-nb.xml   | 10 +
 tests/ovn.at | 39 ++-
 5 files changed, 161 insertions(+), 9 deletions(-)

-- 
2.21.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovndb-servers.ocf: Change from 'openvswitch' to 'ovn' in default vars.

2019-10-28 Thread numans
From: Numan Siddique 

CC: Aliasgar Ginwala 
Signed-off-by: Numan Siddique 
---
 utilities/ovndb-servers.ocf | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/utilities/ovndb-servers.ocf b/utilities/ovndb-servers.ocf
index cd4742668..5fe7849ab 100755
--- a/utilities/ovndb-servers.ocf
+++ b/utilities/ovndb-servers.ocf
@@ -2,7 +2,7 @@
 
 : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
 . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
-: ${OVN_CTL_DEFAULT="/usr/share/openvswitch/scripts/ovn-ctl"}
+: ${OVN_CTL_DEFAULT="/usr/share/ovn/scripts/ovn-ctl"}
 : ${NB_MASTER_PORT_DEFAULT="6641"}
 : ${NB_MASTER_PROTO_DEFAULT="tcp"}
 : ${SB_MASTER_PORT_DEFAULT="6642"}
@@ -10,12 +10,12 @@
 : ${MANAGE_NORTHD_DEFAULT="no"}
 : ${INACTIVE_PROBE_DEFAULT="5000"}
 : ${LISTEN_ON_MASTER_IP_ONLY_DEFAULT="yes"}
-: ${NB_SSL_KEY_DEFAULT="/etc/openvswitch/ovnnb-privkey.pem"}
-: ${NB_SSL_CERT_DEFAULT="/etc/openvswitch/ovnnb-cert.pem"}
-: ${NB_SSL_CACERT_DEFAULT="/etc/openvswitch/cacert.pem"}
-: ${SB_SSL_KEY_DEFAULT="/etc/openvswitch/ovnsb-privkey.pem"}
-: ${SB_SSL_CERT_DEFAULT="/etc/openvswitch/ovnsb-cert.pem"}
-: ${SB_SSL_CACERT_DEFAULT="/etc/openvswitch/cacert.pem"}
+: ${NB_SSL_KEY_DEFAULT="/etc/ovn/ovnnb-privkey.pem"}
+: ${NB_SSL_CERT_DEFAULT="/etc/ovn/ovnnb-cert.pem"}
+: ${NB_SSL_CACERT_DEFAULT="/etc/ovn/cacert.pem"}
+: ${SB_SSL_KEY_DEFAULT="/etc/ovn/ovnsb-privkey.pem"}
+: ${SB_SSL_CERT_DEFAULT="/etc/ovn/ovnsb-cert.pem"}
+: ${SB_SSL_CACERT_DEFAULT="/etc/ovn/cacert.pem"}
 
 CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
 CRM_ATTR_REPL_INFO="${HA_SBIN_DIR}/crm_attribute --type crm_config --name 
OVN_REPL_INFO -s ovn_ovsdb_master_server"
-- 
2.21.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 00/19] OVN Interconnection

2019-10-28 Thread Numan Siddique
On Mon, Oct 21, 2019 at 6:23 AM Han Zhou  wrote:
>
> The series supports interconnecting multiple OVN deployments (e.g.  located at
> multiple data centers) through logical routers connected with tansit logical
> switches with overlay tunnels, managed through OVN control plane.  See the
> ovn-architecture.rst document updates for more details, and find the
> instructions in Documentation/tutorials/ovn-interconnection.rst.
>
> The first 6 patches are not directly related to the feature, but are fixes for
> existing problems but required by later patches.
>

For the first 5 patches in the series - Acked-by: Numan Siddique


I think these 5 patches can be merged so that the v2 of this series
will have lesser patches.

Regarding the patch 6, which updates the NEWS, does it makes sense to
delete the contents
of this file and  start fresh capturing only OVN related news ? Or
it's still better to preserve
the contents ?

I haven't yet looked into patch 7 onwards. I will come back to them soon.

Thanks
Numan


> Han Zhou (19):
>   ovn-northd.c: Fix datapath tunnel key allocation.
>   testsuite: Use ovn-macros instead of ofproto-macros.
>   sandbox: Remove the unused ovnnb.db from sandbox.
>   ovn-ctl: Change the order of ovn-lib and ovs-lib.
>   ovn-lib.in: Fix ovn-ctl status_xxx.
>   news: Fix the out-of-date information afte split from ovs.
>   ovn-architecture: Add documentation for OVN interconnection feature.
>   ovn-inb: Interconnection northbound DB schema and CLI.
>   ovn-isb: Interconnection southbound DB schema and CLI.
>   ovn-ic: Interconnection controller with AZ registeration.
>   ovn-northd.c: Refactor allocate_tnlid.
>   ovn-ic: Transit switch controller.
>   ovn-sb: Add columns is_interconn and is_remote to Chassis.
>   ovn-ic: Interconnection gateway controller.
>   ovn-ic: Interconnection port controller.
>   ovn.at: e2e test for OVN interconnection.
>   ovn-ctl: Refactor to reduce redundant code.
>   ovn-ctl: Support commands for interconnection.
>   tutorial: Add tutorial for OVN Interconnection.
>
>  .gitignore  |6 +
>  Documentation/automake.mk   |1 +
>  Documentation/tutorials/index.rst   |1 +
>  Documentation/tutorials/ovn-interconnection.rst |  181 
>  Makefile.am |1 +
>  NEWS|   46 +-
>  TODO.rst|   10 +
>  automake.mk |   75 ++
>  controller/binding.c|6 +-
>  controller/chassis.c|   14 +
>  debian/ovn-common.install   |2 +
>  debian/ovn-common.manpages  |4 +
>  ic/.gitignore   |2 +
>  ic/automake.mk  |   10 +
>  ic/ovn-ic.8.xml |  111 +++
>  ic/ovn-ic.c | 1042 
> +++
>  lib/.gitignore  |6 +
>  lib/automake.mk |   32 +-
>  lib/ovn-inb-idl.ann |9 +
>  lib/ovn-isb-idl.ann |9 +
>  lib/ovn-util.c  |   92 ++
>  lib/ovn-util.h  |   15 +
>  northd/ovn-northd.c |  110 +--
>  ovn-architecture.7.xml  |  107 ++-
>  ovn-inb.ovsschema   |   75 ++
>  ovn-inb.xml |  371 
>  ovn-isb.ovsschema   |  129 +++
>  ovn-isb.xml |  582 +
>  ovn-nb.ovsschema|5 +-
>  ovn-nb.xml  |   28 +-
>  ovn-sb.ovsschema|8 +-
>  ovn-sb.xml  |   24 +
>  tests/automake.mk   |8 +-
>  tests/ofproto-macros.at |  177 
>  tests/ovn-ic.at |  192 +
>  tests/ovn-inbctl.at |   65 ++
>  tests/ovn-isbctl.at |  112 +++
>  tests/ovn-macros.at |  161 +++-
>  tests/ovn.at|  149 
>  tests/testsuite.at  |4 +
>  tutorial/ovs-sandbox|   81 +-
>  utilities/.gitignore|4 +
>  utilities/automake.mk   |   16 +
>  utilities/ovn-ctl   |  425 -
>  utilities/ovn-ctl.8.xml |   91 ++
>  utilities/ovn-inbctl.8.xml  |  174 
>  utilities/ovn-inbctl.c  |  948 +
>  

Re: [ovs-dev] Travis build failures due to base image change to Xenial.

2019-10-28 Thread Ilya Maximets

On 09.08.2019 12:23, Ilya Maximets wrote:

On 01.08.2019 20:48, Aaron Conole wrote:

Ilya Maximets  writes:


Hi, everyone.

I'm trying to fix TravisCI build failures on older branches. Recently they
started to change default images from Trusty to Xenial. Not all the repositories
affected so far, but main openvswitch github repo already builds with Ubuntu
Xenial as a base image.

This caused few issues on older OVS branches due to more recent compilers:
   https://travis-ci.org/openvswitch/ovs/builds/565990648

There are patches that needs to be backported:

* Followiong patch needed for all branches from 2.9 to 2.5:
a7021b08b 2018-07-09 | configure: Disable -Wnull-pointer-arithmetic Clang 
warning. [Ben Pfaff]

* Branches 2.7 - 2.5 additionally needs:
1e78e3085 2017-01-26 | libX.pc: use the correct output directory [Aaron Conole]


Ack to that backport (if it matters).


OK. I went ahead and backported/pushed above patches to branches 2.9 - 2.6.
Travis CI should be green now on these branches.




Above makes branches 2.9 - 2.6 to build correctly.
I prepared backports for these branches and could push them.



However, there is an additional issue with branch-2.5:

branch-2.5 has kernel 2.6.32 in the .travis.yml, but gcc >= 5 is not
able to build this kernel. This kernel reached its EOL few years ago
already and will never be fixed. So, there are few options for this issue:

1. Simply remove 2.6.32 kernel from the build matrix.
(I have a simple patch for this.)


For branch-2.5 I'm waiting for more comments/Ack on the patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361422.html

Backports to this branch will go along with applying above patch.


Hi Ben,

This discussion happened while you were out of office, but I'd like to
hear your opinion on this issue with branch-2.5 that still exists.

Best regards, Ilya Maximets.





2. Officially drop support of this (or maybe some other too) kernel on 
branch-2.5.
Note: Starting from branch-2.6 OVS officially supports only kernels >= 3.10.
This will, probably, require documentation updates on all newer branches.

3. Drop support of branch-2.5, i.e. stop backporting patches and preparing
stable releases.
This might be not that easy as ovs-2.5 claimed as an LTS release right now.
Does anybody use it?


I don't have an opinion on what the right thing to do is.  However, we
don't have any users from Red Hat for OvS 2.5 that I'm aware.  We have
some users of OvS 2.6, but most have already migrated to 2.11 AFAIK.


4. Add 'dist: trusty' to .travis.yml on branch-2.5 so Travis will keep using
Ubuntu Trusty image with old compilers that able to build kernel 2.6.32.

Would like to hear some opinions.

Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovs v3 0/2] Introduce dpdkvdpa netdev

2019-10-28 Thread Ilya Maximets

On 17.10.2019 13:16, Noa Ezra wrote:

There are two approaches to communicate with a guest, using virtIO or
SR-IOV.
SR-IOV allows working with port representor which is attached to the
OVS and a matching VF is given with pass-through to the VM.
HW rules can process packets from up-link and direct them to the VF
without going through SW (OVS) and therefore SR-IOV gives the best
performance.
However, SR-IOV architecture requires that the guest will use a driver
which is specific to the underlying HW. Specific HW driver has two main
drawbacks:
1. Breaks virtualization in some sense (VM aware of the HW), can also
limit the type of images supported.
2. Less natural support for live migration.

Using virtIO interface solves both problems, but reduces performance and
causes losing of some functionality, for example, for some HW offload,
working directly with virtIO cannot be supported.
In order to solve this conflict, we created a new netdev type-dpdkvdpa.
The new netdev is basically similar to a regular dpdk netdev, but it
has some additional functionality for transferring packets from virtIO
guest (VM) to a VF and vice versa. With this solution we can benefit
both SR-IOV and virtIO.
vDPA netdev is designed to support both SW and HW use-cases.
HW mode will be used to configure vDPA capable devices. The support
for this mode is on progress in the dpdk community.
SW acceleration is used to leverage SR-IOV offloads to virtIO guests
by relaying packets between VF and virtio devices and as a pre-step for
supporting vDPA in HW mode.

Running example:
1. Configure OVS bridge and ports:
ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options: \
 dpdk-devargs=
ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
 options:vdpa-socket-path= \
 options:vdpa-accelerator-devargs= \
 options:dpdk-devargs=,representor=[id]
2. Run a virtIO guest (VM) in server mode that creates the socket of
the vDPA port.
3. Send traffic.

Noa Ezra (2):
   netdev-dpdk-vdpa: Introduce dpdkvdpa netdev
   netdev-dpdk: Add dpdkvdpa port

  Documentation/automake.mk   |   1 +
  Documentation/topics/dpdk/index.rst |   1 +
  Documentation/topics/dpdk/vdpa.rst  |  90 +
  NEWS|   1 +
  lib/automake.mk |   4 +-
  lib/netdev-dpdk-vdpa.c  | 750 
  lib/netdev-dpdk-vdpa.h  |  54 +++
  lib/netdev-dpdk.c   | 162 
  vswitchd/vswitch.xml|  25 ++
  9 files changed, 1087 insertions(+), 1 deletion(-)
  create mode 100644 Documentation/topics/dpdk/vdpa.rst
  create mode 100755 lib/netdev-dpdk-vdpa.c
  create mode 100644 lib/netdev-dpdk-vdpa.h



Hi, everyone.

So, I have a few questions (mostly to Roni?):

1. What happened to idea of implementing this as a DPDK vdev?

2. What was the results of "direct output optimization" patch [1] testing?
   At this point I also have to mention that OVS is going to have TSO support
   in one of the next releases (at least we have some progress on that, thanks
   to idea of using extbuf).  This way in combine with patch [1] there should
   be no any benefits from having separate netdev for forwarding purposes.
   Even without this patch, havin TSO alone should provide good performance
   benefits making it considerable to not have any extra netdev.

   [1] https://patchwork.ozlabs.org/patch/1105880/


3. Regarding implementation, I expected less code duplication. It's unclear
   why you're not reusing hotplug capabilities of exiting dpdk netdev.
   I expected that this new netdev will call netdev_open() like a 3 times
   with a few tweaks for TSO enabling or stripping/adding some options.

4. This code seem doesn't make sense in sw mode without full HW offloading.
   i.e. it doesn't make sense to use this netdev while VF representor is
   attached to userspace datapath.  This is just because, according to the
   solution architecture, all the traffic will be copied firstly from the
   VM to VF (parsed on the fly to enable TSO), and will immediately appear
   on VF representor to be handled by OVS (same parsing with partial offload
   + actions execution).  I'm completely unsure if it any faster than just
   sending packet from VM to VF withoout bypassing OVS processing.  Maybe
   even slower and like 3 times heavier for PCI bandwidth.

5. "HW mode will be used to configure vDPA capable devices. The support
for this mode is on progress in the dpdk community."
   AFAIU, host side of vDPA support is done already for some time in DPDK.
   API is prepared and there is a vhost-vdpa example that allows you to
   check the functionality.  As I understand, the missing part in dpdk right
   now is a DPDK virtio driver for guest, but this is not a limitation for
   not implementing vDPA support from the host side.  Am I missing something?


Re: [ovs-dev] [PATCH v2 2/2 ovn] OVN: Use ipv4.src and ipv4.dst actions for NAT rules

2019-10-28 Thread Numan Siddique
On Sat, Oct 5, 2019 at 1:45 AM Ankur Sharma  wrote:
>
> For dnat_and_snat rules which are meant to be stateless
> instead of using ct_snat/dnat OVN actions, we will use
> ipv4.src/ipv4.dst.
>
> This actions will do 1:1 mapping to inner ip to external ip,
> while recalculating the checksums.
>
> Signed-off-by: Ankur Sharma 


Hi Ankur,

There is a checkpatch warning - WARNING: Line is 80 characters long
(recommended limit is 79).
Please address it.

Please see below for few comments.

Thanks
Numan

> ---
>  northd/ovn-northd.8.xml | 34 +++
>  northd/ovn-northd.c | 86 
> +++--
>  tests/ovn-northd.at | 50 
>  3 files changed, 154 insertions(+), 16 deletions(-)
>
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 0f4f1c1..7d5d102 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -1718,7 +1718,9 @@ icmp6 {
>to change the source IP address of a packet from A to
>B, a priority-90 flow matches ip 
>ip4.dst == B with an action
> -  ct_snat; .
> +  ct_snat; . If the NAT rule is of type dnat_and_snat
> +  and has is_stateless=true in the options, then the 
> action
> +  would be replace_dst_ip(B).

I think you need to update the documentation as you removed the action
- replace_dst_ip in v2.


>  
>
>  
> @@ -1738,7 +1740,10 @@ icmp6 {
>B, a priority-100 flow matches ip 
>ip4.dst == B  inport == GW,
>where GW is the logical router gateway port, with an
> -  action ct_snat;.
> +  action ct_snat;. If the NAT rule is of type
> +  dnat_and_snat and has is_stateless=true in the
> +  options, then the action would be replace_dst_ip

Same here and other places too.

> +  (B).
>  
>
>  
> @@ -1858,7 +1863,10 @@ icmp6 {
>  Gateway router is configured to force SNAT any DNATed packet,
>  the above action will be replaced by
>  flags.force_snat_for_dnat = 1; flags.loopback = 1;
> -ct_dnat(B);.
> +ct_dnat(B);. If the NAT rule is of type
> +dnat_and_snat and has is_stateless=true in the
> +options, then the action would be replace_dst_ip
> +(B).
>
>
>
> @@ -1890,7 +1898,10 @@ icmp6 {
>B, a priority-100 flow matches ip 
>ip4.dst == B  inport == GW,
>where GW is the logical router gateway port, with an
> -  action ct_dnat(B);.
> +  action ct_dnat(B);. If the NAT rule is of
> +  type dnat_and_snat and has is_stateless=true in the
> +  options, then the action would be replace_dst_ip
> +  (B).
>  
>
>  
> @@ -2553,7 +2564,10 @@ nd_ns {
>matches ip  ip4.src == B
> outport == GW, where GW
>is the logical router gateway port, with an action
> -  ct_dnat;.
> +  ct_dnat;. If the NAT rule is of type
> +  dnat_and_snat and has is_stateless=true in the
> +  options, then the action would be replace_src_ip
> +  (B).
>  
>
>  
> @@ -2611,7 +2625,10 @@ nd_ns {
>ip  ip4.src == A with an action
>ct_snat(B);.  The priority of the flow
>is calculated based on the mask of A, with matches
> -  having larger masks getting higher priorities.
> +  having larger masks getting higher priorities. If the NAT rule is
> +  of type dnat_and_snat and has is_stateless=true in the
> +  options, then the action would be replace_src_ip
> +  (B).
>  
>  
>A priority-0 logical flow with match 1 has actions
> @@ -2634,7 +2651,10 @@ nd_ns {
>logical router gateway port, with an action
>ct_snat(B);.  The priority of the flow
>is calculated based on the mask of A, with matches
> -  having larger masks getting higher priorities.
> +  having larger masks getting higher priorities. If the NAT rule
> +  is of type dnat_and_snat and has is_stateless=true
> +  in the options, then the action would be replace_src_ip
> +  (B).
>  
>
>  
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index f393ceb..4036392 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -6314,6 +6314,18 @@ copy_ra_to_sb(struct ovn_port *op, const char 
> *address_mode)
>  smap_destroy();
>  }
>
> +static inline bool
> +lrouter_nat_is_stateless(const struct nbrec_nat *nat)
> +{
> +const char *is_stateless = smap_get(>options, "is_stateless");
> +
> +if (is_stateless && !strcmp(is_stateless, "true")) {
> +return true;
> +}
> +
> +return false;
> +}
> +
>  static void
>  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>  struct hmap *lflows, struct shash 

Re: [ovs-dev] [PATCH v2 1/2 ovn] OVN: ADD nbctl cli to mark a dnat_and_snat rule as stateless

2019-10-28 Thread Numan Siddique
On Thu, Oct 17, 2019 at 10:50 PM Ankur Sharma 
wrote:

> Hi,
>
> Gentle reminder for reviewing this series.
>
> Regards,
> Ankur
> 
> From: Ankur Sharma 
> Sent: Friday, October 4, 2019 1:13 PM
> To: ovs-dev@openvswitch.org 
> Cc: Ankur Sharma 
> Subject: [PATCH v2 1/2 ovn] OVN: ADD nbctl cli to mark a dnat_and_snat
> rule as stateless
>
> Adding ovn-nbctl to mark a dnat_and_snat rule as stateless.
> This configuration will added to "options" column of NAT table.
>
> Signed-off-by: Ankur Sharma 
> ---
>  ovn-nb.ovsschema  |  6 --
>  ovn-nb.xml|  5 +
>  tests/ovn-nbctl.at| 29 +
>  utilities/ovn-nbctl.8.xml | 12 +++-
>  utilities/ovn-nbctl.c | 30 +-
>  5 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema
> index 2c87cbb..084305b 100644
> --- a/ovn-nb.ovsschema
> +++ b/ovn-nb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>  "name": "OVN_Northbound",
> -"version": "5.16.0",
> -"cksum": "923459061 23095",
> +"version": "5.17.0",
> +"cksum": "1128988054 23237",
>  "tables": {
>  "NB_Global": {
>  "columns": {
> @@ -345,6 +345,8 @@
>   "snat",
>
> "dnat_and_snat"
> ]]}}},
> +"options": {"type": {"key": "string", "value": "string",
> + "min": 0, "max": "unlimited"}},
>  "external_ids": {
>  "type": {"key": "string", "value": "string",
>   "min": 0, "max": "unlimited"}}},
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index b41b579..a1ebe05 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -2254,6 +2254,11 @@
>
>  
>
> +
> +  Indicates if a dnat_and_snat rule should lead to connection
> +  tracking state or not.
> +
> +
>

Hi Ankur,

I would suggest to use the key as "stateless" instead of "is_stateless".

Thanks
Numan

 
>
>  See External IDs at the beginning of this document.
> diff --git a/tests/ovn-nbctl.at b/tests/ovn-nbctl.at
> index 01091dd..4ebc1bb 100644
> --- a/tests/ovn-nbctl.at
> +++ b/tests/ovn-nbctl.at
> @@ -516,6 +516,31 @@ dnat_and_snat30.0.0.2   192.168.1.3
>  snat 30.0.0.1   192.168.1.0/24
>  ])
>
> +AT_CHECK([ovn-nbctl --bare --columns=options list nat | grep
> is_stateless=true| wc -l], [0],
> +[0
> +])
> +AT_CHECK([ovn-nbctl --stateless lr-nat-add lr0 dnat_and_snat 40.0.0.2
> 192.168.1.4])
> +AT_CHECK([ovn-nbctl --bare --columns=options list nat | grep
> is_stateless=true| wc -l], [0],
> +[1
> +])
> +AT_CHECK([ovn-nbctl --stateless lr-nat-add lr0 dnat 40.0.0.2
> 192.168.1.4], [1], [],
> +[ovn-nbctl: is_stateless is not applicable to dnat or snat types
> +])
> +AT_CHECK([ovn-nbctl --stateless lr-nat-add lr0 snat 40.0.0.2
> 192.168.1.4], [1], [],
> +[ovn-nbctl: is_stateless is not applicable to dnat or snat types
> +])
> +AT_CHECK([ovn-nbctl lr-nat-add lr0 snat 40.0.0.2 192.168.1.5], [1], [],
> +[ovn-nbctl: 40.0.0.2, 192.168.1.5: External ip cannot be shared across
> stateless and stateful NATs
> +])
> +AT_CHECK([ovn-nbctl lr-nat-add lr0 dnat 40.0.0.2 192.168.1.5], [1], [],
> +[ovn-nbctl: 40.0.0.2, 192.168.1.5: External ip cannot be shared across
> stateless and stateful NATs
> +])
>
+
> +AT_CHECK([ovn-nbctl lr-nat-add lr0 snat 40.0.0.3 192.168.1.6])
> +AT_CHECK([ovn-nbctl --stateless lr-nat-add lr0 dnat_and_snat 40.0.0.3
> 192.168.1.7], [1], [],
> +[ovn-nbctl: 40.0.0.3, 192.168.1.7: External ip cannot be shared across
> stateless and stateful NATs
> +])
> +
>  dnl Deletes the NATs
>  AT_CHECK([ovn-nbctl lr-nat-del lr0 dnat_and_snat 30.0.0.3], [1], [],
>  [ovn-nbctl: no matching NAT with the type (dnat_and_snat) and external_ip
> (30.0.0.3)
> @@ -533,14 +558,18 @@ AT_CHECK([ovn-nbctl lr-nat-list lr0], [0], [dnl
>  TYPE EXTERNAL_IPLOGICAL_IPEXTERNAL_MAC
>  LOGICAL_PORT
>  dnat 30.0.0.1   192.168.1.2
>  dnat_and_snat30.0.0.2   192.168.1.3
> +dnat_and_snat40.0.0.2   192.168.1.4
>  snat 30.0.0.1   192.168.1.0/24
> +snat 40.0.0.3   192.168.1.6
>  ])
>
>  AT_CHECK([ovn-nbctl lr-nat-del lr0 dnat])
>  AT_CHECK([ovn-nbctl lr-nat-list lr0], [0], [dnl
>  TYPE EXTERNAL_IPLOGICAL_IPEXTERNAL_MAC
>  LOGICAL_PORT
>  dnat_and_snat30.0.0.2   192.168.1.3
> +dnat_and_snat40.0.0.2   192.168.1.4
>  snat 30.0.0.1   192.168.1.0/24
> +snat 40.0.0.3   192.168.1.6
>  ])
>
>  AT_CHECK([ovn-nbctl lr-nat-del lr0])
> diff --git a/utilities/ovn-nbctl.8.xml b/utilities/ovn-nbctl.8.xml
> index fd75c0e..2161d8c 100644
> --- a/utilities/ovn-nbctl.8.xml
> +++ b/utilities/ovn-nbctl.8.xml
> @@ 

Re: [ovs-dev] [PATCH V2] Add offload packets statistics

2019-10-28 Thread zhaozhanxu
Thanks for your reply. I already modified some of them, and I think the others 
need to discuss.
V2===>V3:
1. Make the n_offload_packets to be a subset of n_packets, and n_offload_bytes 
to be a subset of n_bytes.
2. Add a new structure 'dpif_flow_detailed_stats' to store the offload 
statistics, so all the functions you mentioned don't need to modify.


I don't think that function 'parse_tc_flower_to_match' should display new 
fields, all the commands called function 'parse_tc_flower_to_match' are display 
the datapath flows.
The datapath flow would be either non-offloaded or offloaded, so that the 
statistics are offloaded or non-offloaded depends on datapath flows.


The patch link: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/363942.html


At 2019-10-25 08:53:35, "Ben Pfaff"  wrote:
>On Tue, Oct 15, 2019 at 02:56:23PM +0800, zhaozhanxu wrote:
>> Add argument '-m' for command ovs-appctl bridge/dump-flows
>> to display the offloaded packets statistics.
>
>Thanks for the updated patch.
>
>Are n_offload_packets a subset of n_packets, or in addition to them?  If
>they are a subset, then n_offload_packets should always be less than or
>equal to n_packets; if they are in addition, then there is no
>relationship between the two.  Either way, this should be explained in
>comments on struct dpif_flow_stats.
>
>I guess that "subset" makes more sense--after all, offloaded packets are
>still packets--but then we need to update a few places, like
>rule_dpif_credit_stats__().  If we take the "additional" choice, then we
>need to change other places: I think most places that n_packets is
>referenced, we need to write n_packets + n_offload_packets instead.
>
>Please also update the documentation for bridge/dump-flows in
>ovs-vswitchd(8).
>
>get_dpif_flow_stats(), in lib/dpif-netdev.c, should initialize '*stats'.
>I don't think it initializes the new members, but it should.
>
>dpif_netdev_flow_put() adds together dpif_flow_stats.  Currently I think
>those dpif_flow_stats always have zero in n_offload_*, but it might be a
>good idea to add them together anyway.
>
>Actually dpif_netdev_flow_del() does the same thing.  Probably that
>means it would be a good idea to factor out the code that adds these
>things together into a new helper function since there's already
>duplicate code and this adds more of it.
>
>dpif_netlink_flow_get_stats() needs to initialize the new fields.
>
>dpif_flow_stats_extract() needs to initialize the new fields.
>
>I imagine that dpif_flow_stats_format() should display the new
>fields--perhaps only if they are nonzero?
>
>Should parse_tc_flower_to_match() set n_offload_* instead of packets and
>bytes?  (Or both of them to the same values, if n_offload_* is a subset
>of the regular values.)
>
>Ditto for netdev_tc_flow_del().
>
>upcall_xlate() should initialize n_offload_*.
>
>Ditto for revalidate_ukey() and push_dp_ops().
>
>Thanks,
>
>Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [dpdk-latest PATCH] netdev-dpdk: add custom vhost statistics to count IRQs

2019-10-28 Thread Ilya Maximets

On 28.10.2019 11:14, Eelco Chaudron wrote:



On 25 Oct 2019, at 17:06, Ilya Maximets wrote:


On 25.10.2019 15:51, Eelco Chaudron wrote:





 +static
+void vhost_guest_notified(int vid)
+{
+    struct netdev_dpdk *dev;
+
+    ovs_mutex_lock(_mutex);
+    LIST_FOR_EACH (dev, list_node, _list) {
+    if (netdev_dpdk_get_vid(dev) == vid) {
+    ovs_mutex_lock(>mutex);
+    rte_spinlock_lock(>stats_lock);
+    dev->vhost_irqs++;
+    rte_spinlock_unlock(>stats_lock);
+    ovs_mutex_unlock(>mutex);
+    break;
+    }
+    }
+    ovs_mutex_unlock(_mutex);


So, for every single eventfd_write() we're taking the global mutex,
traversing list of all the devices, taking one more mutex and finally
taking the spinlock just to increment a single counter.
I think, it's too much.


Yes you are right this might be a bit of overkill…


Wouldn't it significantly affect interrupt based virtio drivers in
terms of performance?  How frequently 2 vhost-user ports will lock
each other on the global device mutex?  How frequently 2 PMD threads
will lock each other on rx/tx to different vrings of the same vhost
port?


I used iperf3 and did not show any negative effects (maybe I should add more 
than 4 physical queues, 1 had one VM queue), but also the results show large 
deviations.


From my experience, I'd say that iperf via kernel virtio driver is not able
to saturate a PMD thread.  They are likely relaxed and have time to wait
on the mutex.  Also, You, probably, have only couple of ports, so the critical
section is not large enough to produce frequent interlocking.

I'm just pointing that to count number of eventfd syscalls we're probably
making 2 other syscalls and probably sleeping in them.


Yes I agree and send out a v2 using a simple coverage counter


For testing purposes, I'd suggest to add 10-20(100?) dummy pmd ports (e.g.
vhost without VMs) and assign them to different thread. Probably, more threads
in iperf for traffic generation, switching to small udp packets.
You could also replace lock() with trylock() for dpdk_mutex and count
contentions in a same way as it done for tx_lock in David's patch.


The try lock would have been a good thing to see if I hit this lock contention.
Was already testing with small UDP packets, but used iPerf as I needed a kernel 
like tool not a poll mode one based on DPDK.




I see that 'guest_notified' patch is already in dpdk master, but
wouldn't it be better to accumulate this statistics inside DPDK
and return it with some new API like rte_vhost_get_vring_xstats()
if required?  I don't see any usecase for this callback beside
counting some stats.


I agree, however, the vhost library has no internal counters (only the PMD 
implementation which we do not use), so hence they liked the callback.


One more possible issue is that this not-so-useful callback hijacked
the reserved space in the structure and we could suffer in the
future while adding new callbacks due to ABI breakage.
(In context of DPDK API/ABI stability discussions)




For the current patch I could only suggest to convert it to simple
coverage counter like it done in 'vhost_tx_contention' patch here:
https://patchwork.ozlabs.org/patch/1175849/


This is what I’ll do (and will send a patch soon). Although from a user 
perspective a per vhost user would make more sense as it could easily indicate 
which VM is configured wrongly…


Sure.  I think that we need to re-imagine concept and implementation of
pmd-perf-stats in a "coverage"-like way, as I suggested previously while
discussing initial patches for pmd-perf, to count stats like vhost qlen,
tx contentions and this one.  At least, this should be not so hard to do
on a per-PMD basis.  per-netdev might be more tricky, but I believe that
it is possible.


This might be something that should be added. For per-netdev we could use 
atomics assuming the counters are for exception use cases (or per CPU) and get 
ride of any locking.


I have lockless netdev stats in my ToDo list for a few months already since
all these patches about packet drops started to appear.

However, for vhost we could also need some lockless (e.g. rcu protected)
structure to obtain netdev by the vid.

Key point for stats like tx_contentions or random syscall counters is that
I don't want them to be part of a regular (even custom) network statistics,
because they are not network statistics.  So, this needs to be accounted
in a different way, e.g. in pmd-perf-stats or some similar netdev-perf-stats
implemented on the same re-worked pmd-perf infrastructure.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/3] Limit the number of generated conjunctions.

2019-10-28 Thread Mark Michelson

On 10/28/19 7:46 AM, Dumitru Ceara wrote:

On Fri, Oct 25, 2019 at 11:07 PM Mark Michelson  wrote:


There is a maximum number of resubmits that can occur during packet
processing. Deliberately creating a flow table that might cross this
threshold is irresponsible.

This commit causes the ofctrl code to track the maximum width and depth
of conjunctions in the desired flow table. By multiplying them, we have
a possible worst case for number of resubmits required. If this number
exceeds a quarter of the maximum resubmits allowed, then we fall back to
forcing crossproducts.

The resulting flow table can end up being a mixture of conjunction and
non-conjunction flows if the limit is exceeded.

Signed-off-by: Mark Michelson 


Hi Mark,

I've been testing the series on a setup with:
- 100 logical routers connected to a logical gateway router
- 100 logical switches (each connected to a logical router)
- 200 VIFs (2 per logical switch)
- 2 NAT rules on each router

I bound all VIFs to OVS internal interfaces on the machine when
ovn-northd is running.

If I start ovn-controller on the same node, without your series I see
ovn-controller taking more than 100s for a single flow computation
run.

If I apply your series, the maximum duration of a flow computation run
in the same scenario drops to ~4s.

This is really nice, however, I see some issues with flow removal (see below).


Thanks for the test, Dumitru.




---
  controller/lflow.c  | 11 +
  controller/ofctrl.c | 69 +
  controller/ofctrl.h |  6 +
  include/ovn/expr.h  |  2 ++
  lib/expr.c  |  6 +
  5 files changed, 94 insertions(+)

diff --git a/controller/lflow.c b/controller/lflow.c
index 34b7c36a6..318066465 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -37,6 +37,13 @@
  VLOG_DEFINE_THIS_MODULE(lflow);

  COVERAGE_DEFINE(lflow_run);
+
+/* Limit maximum conjunctions to a quarter of the max
+ * resubmits. Unfortunatley, max resubmits is not publicly
+ * defined in a header file, so if max resubmits is changed
+ * from 4096, we have to make sure to update this as well
+ */
+#define MAX_CONJUNCTIONS (4096 / 4)

  /* Symbol table. */

@@ -602,6 +609,10 @@ consider_logical_flow(
  struct expr *prereqs;
  char *error;

+uint32_t conj_worst_case;
+conj_worst_case = flow_table->max_conj_width * flow_table->max_conj_depth;
+expr_set_force_crossproduct(conj_worst_case >= MAX_CONJUNCTIONS);
+
  error = ovnacts_parse_string(lflow->actions, , , );
  if (error) {
  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index afb0036f1..2b2fde3c9 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -686,6 +686,64 @@ ofctrl_check_and_add_flow(struct ovn_desired_flow_table 
*flow_table,
f->uuid_hindex_node.hash);
  }

+static void
+get_conjunction_dimensions(const struct ovn_flow *f, uint32_t *conj_width_p,
+   uint32_t *conj_depth_p)
+{
+struct ofpact *ofpact;
+uint32_t conj_width = 0;
+uint32_t conj_depth = 0;
+OFPACT_FOR_EACH (ofpact, f->ofpacts, f->ofpacts_len) {
+if (ofpact->type == OFPACT_CONJUNCTION) {
+struct ofpact_conjunction *conj = ofpact_get_CONJUNCTION(ofpact);
+if (conj->n_clauses > conj_depth) {
+conj_depth = conj->n_clauses;
+}
+conj_width++;
+}
+}
+
+*conj_width_p = conj_width;
+*conj_depth_p = conj_depth;
+}
+
+static void
+adjust_conjunction_count(struct ovn_desired_flow_table *desired_flows,
+ const struct ovn_flow *f, bool add)
+{
+uint32_t conj_width;
+uint32_t conj_depth;
+
+get_conjunction_dimensions(f, _width, _depth);
+
+if (add) {
+if (conj_width > desired_flows->max_conj_width) {
+desired_flows->max_conj_width = conj_width;
+}
+if (conj_depth > desired_flows->max_conj_depth) {
+desired_flows->max_conj_depth = conj_depth;
+}
+} else if (desired_flows->max_conj_width <= conj_width ||
+   desired_flows->max_conj_depth <= conj_depth) {
+/* Removing either the widest or deepest conjunction.
+ * Walk the table and recalculate maximums
+ */
+struct ovn_flow *iter;
+desired_flows->max_conj_width = 0;
+desired_flows->max_conj_depth = 0;
+HMAP_FOR_EACH (iter, match_hmap_node,
+   _flows->match_flow_table) {
+get_conjunction_dimensions(iter, _width, _depth);
+if (conj_width > desired_flows->max_conj_width) {
+desired_flows->max_conj_width = conj_width;
+}
+if (conj_depth > desired_flows->max_conj_depth) {
+desired_flows->max_conj_depth = conj_depth;
+}
+}


I think this is quite expensive because we change single flow removal
complexity 

[ovs-dev] [PATCH V3] Add offload packets statistics

2019-10-28 Thread zhaozhanxu
Add argument '-m' or '--more' for command ovs-appctl bridge/dump-flows
to display the offloaded packets statistics.

The commands display as below:

orignal command:

ovs-appctl bridge/dump-flows br0

duration=574s, n_packets=1152, n_bytes=110768, priority=0,actions=NORMAL
table_id=254, duration=574s, n_packets=0, n_bytes=0, 
priority=2,recirc_id=0,actions=drop
table_id=254, duration=574s, n_packets=0, n_bytes=0, 
priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=574s, n_packets=0, n_bytes=0, 
priority=0,reg0=0x2,actions=drop
table_id=254, duration=574s, n_packets=0, n_bytes=0, 
priority=0,reg0=0x3,actions=drop

new command with argument '-m' or '--more'

Notice: 'n_offload_packets' are a subset of n_packets and 'n_offload_bytes' are
a subset of n_bytes.

ovs-appctl bridge/dump-flows -m br0

duration=576s, n_packets=1152, n_bytes=110768, n_offload_packets=1107, 
n_offload_bytes=107992, priority=0,actions=NORMAL
table_id=254, duration=576s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=2,recirc_id=0,actions=drop
table_id=254, duration=576s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=576s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=0,reg0=0x2,actions=drop
table_id=254, duration=576s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=0,reg0=0x3,actions=drop

ovs-appctl bridge/dump-flows --more br0

duration=582s, n_packets=1152, n_bytes=110768, n_offload_packets=1107, 
n_offload_bytes=107992, priority=0,actions=NORMAL
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=2,recirc_id=0,actions=drop
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=0,reg0=0x2,actions=drop
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, 
n_offload_bytes=0, priority=0,reg0=0x3,actions=drop

Signed-off-by: zhaozhanxu 
---
 NEWS   |  2 ++
 lib/dpif.h | 10 ++
 ofproto/bond.c |  7 ++--
 ofproto/ofproto-dpif-upcall.c  | 11 +++---
 ofproto/ofproto-dpif-xlate-cache.c |  8 ++---
 ofproto/ofproto-dpif-xlate-cache.h |  6 ++--
 ofproto/ofproto-dpif-xlate.c   |  6 ++--
 ofproto/ofproto-dpif.c | 29 ++--
 ofproto/ofproto-dpif.h |  4 +--
 ofproto/ofproto-provider.h | 11 --
 ofproto/ofproto.c  | 55 ++
 ofproto/ofproto.h  |  2 +-
 vswitchd/bridge.c  | 15 +---
 vswitchd/ovs-vswitchd.8.in |  3 +-
 14 files changed, 108 insertions(+), 61 deletions(-)

diff --git a/NEWS b/NEWS
index 330ab3832..14023fcf7 100644
--- a/NEWS
+++ b/NEWS
@@ -81,6 +81,8 @@ v2.12.0 - 03 Sep 2019
  * Add support for conntrack zone-based timeout policy.
- 'ovs-dpctl dump-flows' is no longer suitable for dumping offloaded flows.
  'ovs-appctl dpctl/dump-flows' should be used instead.
+   - Add new argument '-m | --more' for command 'ovs-appctl bridge/dump-flows',
+ so it can display offloaded packets statistics.
- Add L2 GRE tunnel over IPv6 support.
 
 v2.11.0 - 19 Feb 2019
diff --git a/lib/dpif.h b/lib/dpif.h
index 289d574a0..7d82a28be 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -496,6 +496,16 @@ struct dpif_flow_stats {
 uint16_t tcp_flags;
 };
 
+/* more statistics info for offloaded packets and bytes */
+struct dpif_flow_detailed_stats {
+uint64_t n_packets; /* n_offload_packets are a subset of n_packets */
+uint64_t n_bytes;   /* n_offload_bytes are a subset of n_bytes */
+uint64_t n_offload_packets;
+uint64_t n_offload_bytes;
+long long int used;
+uint16_t tcp_flags;
+};
+
 struct dpif_flow_attrs {
 bool offloaded; /* True if flow is offloaded to HW. */
 const char *dp_layer;   /* DP layer the flow is handled in. */
diff --git a/ofproto/bond.c b/ofproto/bond.c
index c5d5f2c03..03de1eec9 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -946,13 +946,12 @@ bond_recirculation_account(struct bond *bond)
 struct rule *rule = entry->pr_rule;
 
 if (rule) {
-uint64_t n_packets OVS_UNUSED;
+struct pkts_stats stats;
 long long int used OVS_UNUSED;
-uint64_t n_bytes;
 
 rule->ofproto->ofproto_class->rule_get_stats(
-rule, _packets, _bytes, );
-bond_entry_account(entry, n_bytes);
+rule, , );
+bond_entry_account(entry, stats.n_bytes);
 }
 }
 }
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 657aa7f79..5fdba5b65 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ 

Re: [ovs-dev] [PATCH 3/3] Limit the number of generated conjunctions.

2019-10-28 Thread Dumitru Ceara
On Fri, Oct 25, 2019 at 11:07 PM Mark Michelson  wrote:
>
> There is a maximum number of resubmits that can occur during packet
> processing. Deliberately creating a flow table that might cross this
> threshold is irresponsible.
>
> This commit causes the ofctrl code to track the maximum width and depth
> of conjunctions in the desired flow table. By multiplying them, we have
> a possible worst case for number of resubmits required. If this number
> exceeds a quarter of the maximum resubmits allowed, then we fall back to
> forcing crossproducts.
>
> The resulting flow table can end up being a mixture of conjunction and
> non-conjunction flows if the limit is exceeded.
>
> Signed-off-by: Mark Michelson 

Hi Mark,

I've been testing the series on a setup with:
- 100 logical routers connected to a logical gateway router
- 100 logical switches (each connected to a logical router)
- 200 VIFs (2 per logical switch)
- 2 NAT rules on each router

I bound all VIFs to OVS internal interfaces on the machine when
ovn-northd is running.

If I start ovn-controller on the same node, without your series I see
ovn-controller taking more than 100s for a single flow computation
run.

If I apply your series, the maximum duration of a flow computation run
in the same scenario drops to ~4s.

This is really nice, however, I see some issues with flow removal (see below).

> ---
>  controller/lflow.c  | 11 +
>  controller/ofctrl.c | 69 
> +
>  controller/ofctrl.h |  6 +
>  include/ovn/expr.h  |  2 ++
>  lib/expr.c  |  6 +
>  5 files changed, 94 insertions(+)
>
> diff --git a/controller/lflow.c b/controller/lflow.c
> index 34b7c36a6..318066465 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -37,6 +37,13 @@
>  VLOG_DEFINE_THIS_MODULE(lflow);
>
>  COVERAGE_DEFINE(lflow_run);
> +
> +/* Limit maximum conjunctions to a quarter of the max
> + * resubmits. Unfortunatley, max resubmits is not publicly
> + * defined in a header file, so if max resubmits is changed
> + * from 4096, we have to make sure to update this as well
> + */
> +#define MAX_CONJUNCTIONS (4096 / 4)
>
>  /* Symbol table. */
>
> @@ -602,6 +609,10 @@ consider_logical_flow(
>  struct expr *prereqs;
>  char *error;
>
> +uint32_t conj_worst_case;
> +conj_worst_case = flow_table->max_conj_width * 
> flow_table->max_conj_depth;
> +expr_set_force_crossproduct(conj_worst_case >= MAX_CONJUNCTIONS);
> +
>  error = ovnacts_parse_string(lflow->actions, , , );
>  if (error) {
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index afb0036f1..2b2fde3c9 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -686,6 +686,64 @@ ofctrl_check_and_add_flow(struct ovn_desired_flow_table 
> *flow_table,
>f->uuid_hindex_node.hash);
>  }
>
> +static void
> +get_conjunction_dimensions(const struct ovn_flow *f, uint32_t *conj_width_p,
> +   uint32_t *conj_depth_p)
> +{
> +struct ofpact *ofpact;
> +uint32_t conj_width = 0;
> +uint32_t conj_depth = 0;
> +OFPACT_FOR_EACH (ofpact, f->ofpacts, f->ofpacts_len) {
> +if (ofpact->type == OFPACT_CONJUNCTION) {
> +struct ofpact_conjunction *conj = ofpact_get_CONJUNCTION(ofpact);
> +if (conj->n_clauses > conj_depth) {
> +conj_depth = conj->n_clauses;
> +}
> +conj_width++;
> +}
> +}
> +
> +*conj_width_p = conj_width;
> +*conj_depth_p = conj_depth;
> +}
> +
> +static void
> +adjust_conjunction_count(struct ovn_desired_flow_table *desired_flows,
> + const struct ovn_flow *f, bool add)
> +{
> +uint32_t conj_width;
> +uint32_t conj_depth;
> +
> +get_conjunction_dimensions(f, _width, _depth);
> +
> +if (add) {
> +if (conj_width > desired_flows->max_conj_width) {
> +desired_flows->max_conj_width = conj_width;
> +}
> +if (conj_depth > desired_flows->max_conj_depth) {
> +desired_flows->max_conj_depth = conj_depth;
> +}
> +} else if (desired_flows->max_conj_width <= conj_width ||
> +   desired_flows->max_conj_depth <= conj_depth) {
> +/* Removing either the widest or deepest conjunction.
> + * Walk the table and recalculate maximums
> + */
> +struct ovn_flow *iter;
> +desired_flows->max_conj_width = 0;
> +desired_flows->max_conj_depth = 0;
> +HMAP_FOR_EACH (iter, match_hmap_node,
> +   _flows->match_flow_table) {
> +get_conjunction_dimensions(iter, _width, _depth);
> +if (conj_width > desired_flows->max_conj_width) {
> +desired_flows->max_conj_width = conj_width;
> +}
> +if (conj_depth > desired_flows->max_conj_depth) {
> +

Re: [ovs-dev] [PATCHv6] netdev-afxdp: Add need_wakeup supprt.

2019-10-28 Thread Eelco Chaudron



On 23 Oct 2019, at 23:06, William Tu wrote:


The patch adds support for using need_wakeup flag in AF_XDP rings.
A new option, use_need_wakeup, is added.  When this option is used,
it means that OVS has to explicitly wake up the kernel RX, using 
poll()

syscall and wake up TX, using sendto() syscall. This feature improves
the performance by avoiding unnecessary sendto syscalls for TX.
For RX, instead of kernel always busy-spinning on fille queue, OVS 
wakes

up the kernel RX processing when fill queue is replenished.

The need_wakeup feature is merged into Linux kernel bpf-next tee with 
commit
77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") 
and
OVS enables it by default, if libbpf supports it.  If users enable it 
but
runs in an older version of libbpf, then the need_wakeup feature has 
no effect,

and a warning message is logged.

For virtual interface, it's better set use_need_wakeup=false, since
the virtual device's AF_XDP xmit is synchronous: the sendto syscall
enters kernel and process the TX packet on tx queue directly.

On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
to physical port improves from 6.1Mpps to 7.3Mpps.

Suggested-by: Ilya Maximets 
Signed-off-by: William Tu 


Reviewed based on diff from previous version, also did quick compile 
test with and without need_wakeup supported kernel.

No actual performance tests ran, as my setup is in a messed up state :(

One small issue (guess can be fixed at apply time), the subject mentions 
“supprt”, it’s missing an o.



Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv5] netdev-linux: Detect numa node id.

2019-10-28 Thread Eelco Chaudron



On 23 Oct 2019, at 23:08, William Tu wrote:

> The patch detects the numa node id from the name of the netdev,
> by reading the '/sys/class/net//device/numa_node'.
> If not available, ex: virtual device, or any error happens,
> return numa id 0.  Currently only the afxdp netdev type uses it,
> other linux netdev types are disabled due to no use case.
>
> Signed-off-by: William Tu 

LGTM,

Eelco

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [dpdk-latest PATCH] netdev-dpdk: add custom vhost statistics to count IRQs

2019-10-28 Thread Eelco Chaudron



On 25 Oct 2019, at 17:06, Ilya Maximets wrote:


On 25.10.2019 15:51, Eelco Chaudron wrote:





 +static
+void vhost_guest_notified(int vid)
+{
+    struct netdev_dpdk *dev;
+
+    ovs_mutex_lock(_mutex);
+    LIST_FOR_EACH (dev, list_node, _list) {
+    if (netdev_dpdk_get_vid(dev) == vid) {
+    ovs_mutex_lock(>mutex);
+    rte_spinlock_lock(>stats_lock);
+    dev->vhost_irqs++;
+    rte_spinlock_unlock(>stats_lock);
+    ovs_mutex_unlock(>mutex);
+    break;
+    }
+    }
+    ovs_mutex_unlock(_mutex);


So, for every single eventfd_write() we're taking the global mutex,
traversing list of all the devices, taking one more mutex and 
finally

taking the spinlock just to increment a single counter.
I think, it's too much.


Yes you are right this might be a bit of overkill…


Wouldn't it significantly affect interrupt based virtio drivers in
terms of performance?  How frequently 2 vhost-user ports will lock
each other on the global device mutex?  How frequently 2 PMD 
threads

will lock each other on rx/tx to different vrings of the same vhost
port?


I used iperf3 and did not show any negative effects (maybe I should 
add more than 4 physical queues, 1 had one VM queue), but also the 
results show large deviations.


From my experience, I'd say that iperf via kernel virtio driver is not 
able
to saturate a PMD thread.  They are likely relaxed and have time to 
wait
on the mutex.  Also, You, probably, have only couple of ports, so the 
critical

section is not large enough to produce frequent interlocking.

I'm just pointing that to count number of eventfd syscalls we're 
probably

making 2 other syscalls and probably sleeping in them.


Yes I agree and send out a v2 using a simple coverage counter

For testing purposes, I'd suggest to add 10-20(100?) dummy pmd ports 
(e.g.
vhost without VMs) and assign them to different thread. Probably, more 
threads

in iperf for traffic generation, switching to small udp packets.
You could also replace lock() with trylock() for dpdk_mutex and count
contentions in a same way as it done for tx_lock in David's patch.


The try lock would have been a good thing to see if I hit this lock 
contention.
Was already testing with small UDP packets, but used iPerf as I needed a 
kernel like tool not a poll mode one based on DPDK.





I see that 'guest_notified' patch is already in dpdk master, but
wouldn't it be better to accumulate this statistics inside DPDK
and return it with some new API like rte_vhost_get_vring_xstats()
if required?  I don't see any usecase for this callback beside
counting some stats.


I agree, however, the vhost library has no internal counters (only 
the PMD implementation which we do not use), so hence they liked the 
callback.


One more possible issue is that this not-so-useful callback hijacked
the reserved space in the structure and we could suffer in the
future while adding new callbacks due to ABI breakage.
(In context of DPDK API/ABI stability discussions)




For the current patch I could only suggest to convert it to simple
coverage counter like it done in 'vhost_tx_contention' patch here:
https://patchwork.ozlabs.org/patch/1175849/


This is what I’ll do (and will send a patch soon). Although from a 
user perspective a per vhost user would make more sense as it could 
easily indicate which VM is configured wrongly…


Sure.  I think that we need to re-imagine concept and implementation 
of
pmd-perf-stats in a "coverage"-like way, as I suggested previously 
while
discussing initial patches for pmd-perf, to count stats like vhost 
qlen,
tx contentions and this one.  At least, this should be not so hard to 
do
on a per-PMD basis.  per-netdev might be more tricky, but I believe 
that

it is possible.


This might be something that should be added. For per-netdev we could 
use atomics assuming the counters are for exception use cases (or per 
CPU) and get ride of any locking.



For now we have architectural limitations that we have to live with.

Best regards, Ilya Maximets.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [dpdk-latest PATCH] sparse: Remove dpdk network headers copies.

2019-10-28 Thread David Marchand
Now that dpdk headers annotates the network header structures with
endianness [1], we can get rid of these copies.

1: https://git.dpdk.org/dpdk/commit/?id=7eca7f7fd09d

Signed-off-by: David Marchand 
---
 include/sparse/automake.mk |   6 -
 include/sparse/rte_esp.h   |  65 --
 include/sparse/rte_icmp.h  | 106 --
 include/sparse/rte_ip.h| 490 -
 include/sparse/rte_sctp.h  | 103 --
 include/sparse/rte_tcp.h   | 108 --
 include/sparse/rte_udp.h   | 103 --
 7 files changed, 981 deletions(-)
 delete mode 100644 include/sparse/rte_esp.h
 delete mode 100644 include/sparse/rte_icmp.h
 delete mode 100644 include/sparse/rte_ip.h
 delete mode 100644 include/sparse/rte_sctp.h
 delete mode 100644 include/sparse/rte_tcp.h
 delete mode 100644 include/sparse/rte_udp.h

diff --git a/include/sparse/automake.mk b/include/sparse/automake.mk
index 8f3e12d..073631e 100644
--- a/include/sparse/automake.mk
+++ b/include/sparse/automake.mk
@@ -1,11 +1,5 @@
 noinst_HEADERS += \
 include/sparse/rte_byteorder.h \
-include/sparse/rte_esp.h \
-include/sparse/rte_icmp.h \
-include/sparse/rte_ip.h \
-include/sparse/rte_sctp.h \
-include/sparse/rte_tcp.h \
-include/sparse/rte_udp.h \
 include/sparse/xmmintrin.h \
 include/sparse/arpa/inet.h \
 include/sparse/bits/floatn.h \
diff --git a/include/sparse/rte_esp.h b/include/sparse/rte_esp.h
deleted file mode 100644
index 7491074..000
--- a/include/sparse/rte_esp.h
+++ /dev/null
@@ -1,65 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef __CHECKER__
-#error "Use this header only with sparse.  It is not a correct implementation."
-#endif
-
-#ifndef _RTE_ESP_H_
-#define _RTE_ESP_H_
-
-/**
- * @file
- *
- * ESP-related defines
- */
-
-#include "openvswitch/types.h"
-#include 
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/**
- * ESP Header
- */
-struct rte_esp_hdr {
-   ovs_be32 spi;  /**< Security Parameters Index */
-   ovs_be32 seq;  /**< packet sequence number */
-} __attribute__((__packed__));
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* RTE_ESP_H_ */
diff --git a/include/sparse/rte_icmp.h b/include/sparse/rte_icmp.h
deleted file mode 100644
index f4844ed..000
--- a/include/sparse/rte_icmp.h
+++ /dev/null
@@ -1,106 +0,0 @@
-/*   BSD LICENSE
- *
- *   Copyright(c) 2013 6WIND.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of 6WIND S.A. nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT 

[ovs-dev] [PATCH] rhel: openvswitch-fedora.spec.in: Fix output redirect to null device

2019-10-28 Thread Roi Dayan
Add missing slash.

Fixes: 0447019df7c6 ("fedora-spec: added systemd post/postun/pre/preun 
sections")
Signed-off-by: Roi Dayan 
---
 rhel/openvswitch-fedora.spec.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 3a87c6d0c339..0f635210147c 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -340,7 +340,7 @@ fi
 %else
 # Package install, not upgrade
 if [ $1 -eq 1 ]; then
-/bin/systemctl daemon-reload >dev/null || :
+/bin/systemctl daemon-reload >/dev/null || :
 fi
 %endif
 
-- 
2.8.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] tc: Limit the max action number to 16

2019-10-28 Thread Roi Dayan



On 2019-10-28 10:15 AM, Simon Horman wrote:
> On Fri, Oct 25, 2019 at 12:19:22PM +0200, Simon Horman wrote:
>> On Thu, Oct 24, 2019 at 10:43:00AM +0200, Simon Horman wrote:
>>> On Mon, Oct 21, 2019 at 07:01:38AM +, Roi Dayan wrote:


 On 2019-10-18 1:00 PM, Simon Horman wrote:
> On Wed, Oct 16, 2019 at 11:53:52AM +, Roi Dayan wrote:
>>
>>
>> On 2019-10-16 2:40 PM, Simon Horman wrote:
>>> On Wed, Oct 16, 2019 at 11:37:14AM +0300, Roi Dayan wrote:
 From: Chris Mi 

 Currently, ovs supports to offload max TCA_ACT_MAX_PRIO(32) actions.
 But net sched api has a limit of 4K message size which is not enough
 for 32 actions when echo flag is set.

 After a lot of testing, we find that 16 actions is a reasonable number.
 So in this commit, we introduced a new define to limit the max actions.

 Fixes: 0c70132cd288("tc: Make the actions order consistent")
 Signed-off-by: Chris Mi 
 Reviewed-by: Roi Dayan 
>>>
>>> Hi Chris,
>>>
>>> I'm unclear on what problem is this patch solving.
>>
>> Hi Simon,
>>
>> I can help with the answer here.
>> When ovs send netlink msg to tc to add a filter we also add
>> the echo flag to get a reply data. this reply can be big as
>> it contains everything from the request and if it pass the 4K
>> size then the kernel will just not fill/send it but the return
>> status of the tc command is still 0 for success.
>>
>> In ovs we use that reply to get back the handle id on success but
>> for this case will fail.
>> To make sure this ack reply always exists for successful tc calls
>> we limit the number of actions here to avoid reaching the 4K size limit.
>
> Thanks,
>
> It sounds like it would be good to develop a mechanism where
> the information required can be retrieved in a less verbose manner,
> thus allowing a higher limit.
>
> But I do agree that this resolves a problem and I have applied it to
> master. Let me know if you think it is also appropriate for older 
> branches.
>
> Kind regards,
> Simon
>

 thanks Simon.
 this patch can be applied also to branches 2.10, 2.11, 2.12.
>>>
>>> Thanks,
>>>
>>> I have pushed a backport to branch-2.12.
>>> And I intend to do likewise for branch-2.11 if travis-ci passes.
>>
>> I have now pushed this patch to branch-2.11.
>>
>>>
>>> I am however holding back on branch-2.10 as it seems broken
>>> with respect to travis-ci. And I do not like to add patches
>>> to broken branches.
>>>
>>> I am trying to investigate the cause of that breakage.
>>> I would also welcome any help in this area.
>>
>> I have proposed a fix for this problem.
>>
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fpipermail%2Fovs-dev%2F2019-October%2F363868.htmldata=02%7C01%7Croid%40mellanox.com%7Caca4498333aa467cae6c08d75b7ef3d1%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637078473125610665sdata=p4oai3sC8F4FHfFrvfv7q0XAzk6%2BY%2Bs9cD736XgLvdg%3Dreserved=0
> 
> That fix was merged by Ben and now that branch-2.10 is travis-ci clean
> I have gone ahead and applied this patch there.
> 

thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] tc: Limit the max action number to 16

2019-10-28 Thread Simon Horman
On Fri, Oct 25, 2019 at 12:19:22PM +0200, Simon Horman wrote:
> On Thu, Oct 24, 2019 at 10:43:00AM +0200, Simon Horman wrote:
> > On Mon, Oct 21, 2019 at 07:01:38AM +, Roi Dayan wrote:
> > > 
> > > 
> > > On 2019-10-18 1:00 PM, Simon Horman wrote:
> > > > On Wed, Oct 16, 2019 at 11:53:52AM +, Roi Dayan wrote:
> > > >>
> > > >>
> > > >> On 2019-10-16 2:40 PM, Simon Horman wrote:
> > > >>> On Wed, Oct 16, 2019 at 11:37:14AM +0300, Roi Dayan wrote:
> > >  From: Chris Mi 
> > > 
> > >  Currently, ovs supports to offload max TCA_ACT_MAX_PRIO(32) actions.
> > >  But net sched api has a limit of 4K message size which is not enough
> > >  for 32 actions when echo flag is set.
> > > 
> > >  After a lot of testing, we find that 16 actions is a reasonable 
> > >  number.
> > >  So in this commit, we introduced a new define to limit the max 
> > >  actions.
> > > 
> > >  Fixes: 0c70132cd288("tc: Make the actions order consistent")
> > >  Signed-off-by: Chris Mi 
> > >  Reviewed-by: Roi Dayan 
> > > >>>
> > > >>> Hi Chris,
> > > >>>
> > > >>> I'm unclear on what problem is this patch solving.
> > > >>
> > > >> Hi Simon,
> > > >>
> > > >> I can help with the answer here.
> > > >> When ovs send netlink msg to tc to add a filter we also add
> > > >> the echo flag to get a reply data. this reply can be big as
> > > >> it contains everything from the request and if it pass the 4K
> > > >> size then the kernel will just not fill/send it but the return
> > > >> status of the tc command is still 0 for success.
> > > >>
> > > >> In ovs we use that reply to get back the handle id on success but
> > > >> for this case will fail.
> > > >> To make sure this ack reply always exists for successful tc calls
> > > >> we limit the number of actions here to avoid reaching the 4K size 
> > > >> limit.
> > > > 
> > > > Thanks,
> > > > 
> > > > It sounds like it would be good to develop a mechanism where
> > > > the information required can be retrieved in a less verbose manner,
> > > > thus allowing a higher limit.
> > > > 
> > > > But I do agree that this resolves a problem and I have applied it to
> > > > master. Let me know if you think it is also appropriate for older 
> > > > branches.
> > > > 
> > > > Kind regards,
> > > > Simon
> > > > 
> > > 
> > > thanks Simon.
> > > this patch can be applied also to branches 2.10, 2.11, 2.12.
> > 
> > Thanks,
> > 
> > I have pushed a backport to branch-2.12.
> > And I intend to do likewise for branch-2.11 if travis-ci passes.
> 
> I have now pushed this patch to branch-2.11.
> 
> > 
> > I am however holding back on branch-2.10 as it seems broken
> > with respect to travis-ci. And I do not like to add patches
> > to broken branches.
> > 
> > I am trying to investigate the cause of that breakage.
> > I would also welcome any help in this area.
> 
> I have proposed a fix for this problem.
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/363868.html

That fix was merged by Ben and now that branch-2.10 is travis-ci clean
I have gone ahead and applied this patch there.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-10-28 Thread Tonghao Zhang
On Thu, Oct 24, 2019 at 3:14 PM Pravin Shelar  wrote:
>
> On Tue, Oct 22, 2019 at 7:35 PM Tonghao Zhang  
> wrote:
> >
> > On Tue, Oct 22, 2019 at 2:58 PM Pravin Shelar  wrote:
> > >
> ...
>
> > > > >
> > > Sure, I can review it, Can you send the patch inlined in mail?
> > >
> > > Thanks.
> > diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> > index 5df5182..5b20793 100644
> > --- a/net/openvswitch/flow_table.c
> > +++ b/net/openvswitch/flow_table.c
> > @@ -257,10 +257,75 @@ static void flow_tbl_destroy_rcu_cb(struct rcu_head 
> > *rcu)
> > __table_instance_destroy(ti);
> >  }
> >
> > -static void table_instance_destroy(struct table_instance *ti,
> > -  struct table_instance *ufid_ti,
> > +static void tbl_mask_array_del_mask(struct flow_table *tbl,
> > +   struct sw_flow_mask *mask)
> > +{
> > +   struct mask_array *ma = ovsl_dereference(tbl->mask_array);
> > +   int i, ma_count = READ_ONCE(ma->count);
> > +
> > +   /* Remove the deleted mask pointers from the array */
> > +   for (i = 0; i < ma_count; i++) {
> > +   if (mask == ovsl_dereference(ma->masks[i]))
> > +   goto found;
> > +   }
> > +
> > +   BUG();
> > +   return;
> > +
> > +found:
> > +   WRITE_ONCE(ma->count, ma_count -1);
> > +
> > +   rcu_assign_pointer(ma->masks[i], ma->masks[ma_count -1]);
> > +   RCU_INIT_POINTER(ma->masks[ma_count -1], NULL);
> > +
> > +   kfree_rcu(mask, rcu);
> > +
> > +   /* Shrink the mask array if necessary. */
> > +   if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
> > +   ma_count <= (ma->max / 3))
> > +   tbl_mask_array_realloc(tbl, ma->max / 2);
> > +}
> > +
> > +/* Remove 'mask' from the mask list, if it is not needed any more. */
> > +static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask 
> > *mask)
> > +{
> > +   if (mask) {
> > +   /* ovs-lock is required to protect mask-refcount and
> > +* mask list.
> > +*/
> > +   ASSERT_OVSL();
> > +   BUG_ON(!mask->ref_count);
> > +   mask->ref_count--;
> > +
> > +   if (!mask->ref_count)
> > +   tbl_mask_array_del_mask(tbl, mask);
> > +   }
> > +}
> > +
> > +static void table_instance_remove(struct flow_table *table, struct
> > sw_flow *flow)
> > +{
> > +   struct table_instance *ti = ovsl_dereference(table->ti);
> > +   struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
> > +
> > +   BUG_ON(table->count == 0);
> > +   hlist_del_rcu(>flow_table.node[ti->node_ver]);
> > +   table->count--;
> > +   if (ovs_identifier_is_ufid(>id)) {
> > +   hlist_del_rcu(>ufid_table.node[ufid_ti->node_ver]);
> > +   table->ufid_count--;
> > +   }
> > +
> > +   /* RCU delete the mask. 'flow->mask' is not NULLed, as it should be
> > +* accessible as long as the RCU read lock is held.
> > +*/
> > +   flow_mask_remove(table, flow->mask);
> > +}
> > +
> > +static void table_instance_destroy(struct flow_table *table,
> >bool deferred)
> >  {
> > +   struct table_instance *ti = ovsl_dereference(table->ti);
> > +   struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
> > int i;
> >
> > if (!ti)
> > @@ -274,13 +339,9 @@ static void table_instance_destroy(struct
> > table_instance *ti,
> > struct sw_flow *flow;
> > struct hlist_head *head = >buckets[i];
> > struct hlist_node *n;
> > -   int ver = ti->node_ver;
> > -   int ufid_ver = ufid_ti->node_ver;
> >
> > -   hlist_for_each_entry_safe(flow, n, head, 
> > flow_table.node[ver]) {
> > -   hlist_del_rcu(>flow_table.node[ver]);
> > -   if (ovs_identifier_is_ufid(>id))
> > -   
> > hlist_del_rcu(>ufid_table.node[ufid_ver]);
> > +   hlist_for_each_entry_safe(flow, n, head,
> > flow_table.node[ti->node_ver]) {
> > +   table_instance_remove(table, flow);
> > ovs_flow_free(flow, deferred);
> > }
> > }
> > @@ -300,12 +361,9 @@ static void table_instance_destroy(struct
> > table_instance *ti,
> >   */
> >  void ovs_flow_tbl_destroy(struct flow_table *table)
> >  {
> > -   struct table_instance *ti = rcu_dereference_raw(table->ti);
> > -   struct table_instance *ufid_ti = 
> > rcu_dereference_raw(table->ufid_ti);
> > -
> > free_percpu(table->mask_cache);
> > kfree_rcu(rcu_dereference_raw(table->mask_array), rcu);
> > -   table_instance_destroy(ti, ufid_ti, false);
> > +   table_instance_destroy(table, false);
> >  }
> >
> >  struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
> > @@