Re: [ovs-dev] How to set QoS for VM egress traffic on tunnel mode

2017-11-22 Thread 王志克
Hi,

The topo can be same as below example.
http://docs.openvswitch.org/en/latest/howto/tunneling/?highlight=tunnel

I just wonder in such configuration how the egress shaping can be configured 
for different VM. Or the QoS does not work for tunnel case?

Appreciate help.

BR,
Wang Zhike

From: 王志克
Sent: Wednesday, November 22, 2017 5:51 PM
To: ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: How to set QoS for VM egress traffic on tunnel mode

Hi All,

I want to set QoS with guide from below link “egress traffic shaping”, but do 
not know how for tunnel mode.
http://docs.openvswitch.org/en/latest/faq/qos/

My scenario:

I have several VM ports, and several VxLan ports in br0, and there is one 
seprate eth0 port (not in br0), which is the underlay port of Vxlan port.
Currently I add rule to match VM traffic to certain Vxlan port, and all these 
vxlan ports would go out through eth0.

Now I want to enable egress traffic shaping, eg:
VM1 goes out with min_rate=10M, max_rate=100M.
VM2 goes out with min_rate=50M, max_rate=200M.

In the given example in http://docs.openvswitch.org/en/latest/faq/qos/ chapter 
“egress traffic shaping”, it directly uses the physical port. But in tunnel 
case, I can NOT specify the physical port directly.
So how to configrue egress traffic shaping in tunnel case? Appreciate some 
example configruation. Thanks.


Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Ley Sarbanes Oxley

2017-11-22 Thread Elevar la calidad de gobierno dentro de su empresa
Administración de riesgo y control para garantizar el cumplimiento de objetivos

Ley Sarbanes Oxley
01 de Diciembre - 9am a 6pm - MCE. Abdón Guzmán Santana 

Hoy en día las organizaciones requieren de cumplir con regulaciones más 
exigentes en relación al Gobierno Corporativo. Tal es el caso de empresas que 
cotizan en bolsas estadounidenses o en la Bolsa Mexicana de Valores. Por otra 
parte, una adecuada administración de riesgo y control es fundamental para 
garantizar el cumplimiento de los objetivos de las organizaciones. 

BENEFICIOS DE ASISTIR: 

• Conocerá los postulados y su entendimiento de la Ley Sarbanes Oxley.
• Reconocerá los riesgos potenciales que afecten el ciclo financiero, 
incluyendo el fraude.
• Documentará e implementará los controles mitigantes con respecto a un marco 
de control interno bajo los requerimientos de la Ley Sarbanes Oxley. 
• Sabrá cómo implementar un mecanismo de control efectivo para el cumplimiento 
de la Ley Sarbanes Oxley.
• Conocerá las sanciones aplicables por la autoridad que gestiona esta Ley. 

¿Requiere la información a la Brevedad? responda este email con la palabra: 
Oxley+ nombre - teléfono - correo.


centro telefónico:018002120744


 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC PATCH v2] dpif-netdev: Add port/queue tiebreaker to rxq_cycle_sort.

2017-11-22 Thread Kevin Traynor
rxq_cycle_sort is used to sort the rx queues by their measured number
of cycles. In the event that they are equal 0 could be returned.
However, it is observed that returning 0 results in a different sort
order on Windows/Linux. This is ok in practice but it causes a unit
test failure for
"1007: PMD - pmd-cpu-mask/distribution of rx queues" on Windows.

In order to have a consistent sort result, introduce a tiebreaker of
port/queue.

Fixes: 655856ef39b9 ("dpif-netdev: Change rxq_scheduling to use rxq processing 
cycles.")
Reported-by: Alin Gabriel Serdean 
Signed-off-by: Kevin Traynor 
---

v2: Inadvertently reversed the order for non-tiebreak cases in v1. Fix that.

 lib/dpif-netdev.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630..57451e9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3452,4 +3452,5 @@ rxq_cycle_sort(const void *a, const void *b)
 uint64_t total_qa, total_qb;
 unsigned i;
+int winner = 1;
 
 qa = *(struct dp_netdev_rxq **) a;
@@ -3464,8 +3465,16 @@ rxq_cycle_sort(const void *a, const void *b)
 dp_netdev_rxq_set_cycles(qb, RXQ_CYCLES_PROC_HIST, total_qb);
 
-if (total_qa >= total_qb) {
-return -1;
+if (total_qa > total_qb) {
+winner = -1;
+} else if (total_qa == total_qb) {
+/* Cycles are the same.  Tiebreak on port/queue id. */
+if (qb->port->port_no > qa->port->port_no) {
+winner = -1;
+} else if (qa->port->port_no == qb->port->port_no) {
+winner = netdev_rxq_get_queue_id(qa->rx)
+- netdev_rxq_get_queue_id(qb->rx);
+}
 }
-return 1;
+return winner;
 }
 
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] dpif-netdev: Calculate rxq cycles prior to rxq_cycle_sort calls.

2017-11-22 Thread Kevin Traynor
rxq_cycle_sort summed the latest cycles from each queue for sorting.
While each comparison was correct with the latest cycles, the cycles
could change between calls to rxq_cycle_sort. In order to use
consistent values through each call to rxq_cycle_sort, sum the cycles
prior to rxq_cycle_sort being called.

Requested-by: Ilya Maximets 
Signed-off-by: Kevin Traynor 
---

V3:
- Drop V2 1/2 as merged
- Removed rxq_cycle_sort return changes for a later patch
- Some minor variable rename and comment changes

 lib/dpif-netdev.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630..06a6e62 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3450,19 +3450,13 @@ rxq_cycle_sort(const void *a, const void *b)
 struct dp_netdev_rxq *qa;
 struct dp_netdev_rxq *qb;
-uint64_t total_qa, total_qb;
-unsigned i;
+uint64_t cycles_qa, cycles_qb;
 
 qa = *(struct dp_netdev_rxq **) a;
 qb = *(struct dp_netdev_rxq **) b;
 
-total_qa = total_qb = 0;
-for (i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
-total_qa += dp_netdev_rxq_get_intrvl_cycles(qa, i);
-total_qb += dp_netdev_rxq_get_intrvl_cycles(qb, i);
-}
-dp_netdev_rxq_set_cycles(qa, RXQ_CYCLES_PROC_HIST, total_qa);
-dp_netdev_rxq_set_cycles(qb, RXQ_CYCLES_PROC_HIST, total_qb);
+cycles_qa = dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_HIST);
+cycles_qb = dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_HIST);
 
-if (total_qa >= total_qb) {
+if (cycles_qa >= cycles_qb) {
 return -1;
 }
@@ -3512,4 +3506,6 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
 }
 } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
+uint64_t cycle_hist = 0;
+
 if (n_rxqs == 0) {
 rxqs = xmalloc(sizeof *rxqs);
@@ -3517,4 +3513,10 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
 rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
 }
+/* Sum the queue intervals and store the cycle history. */
+for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
+cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
+}
+dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST, cycle_hist);
+
 /* Store the queue. */
 rxqs[n_rxqs++] = q;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Add comment about variables naming convention.

2017-11-22 Thread Stokes, Ian


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Ilya Maximets
> Sent: Tuesday, November 7, 2017 1:17 PM
> To: ovs-dev@openvswitch.org
> Cc: Ilya Maximets ; Heetae Ahn
> 
> Subject: [ovs-dev] [PATCH 2/2] netdev-dpdk: Add comment about variables
> naming convention.
> 

Hi Ilya,

This looks good and will be useful in future as a reference.

Minor nit would be I'd like to see the commit message expanded upon, maybe 
mentioning which variables are documented etc. to make the commit message more 
robust. Will look to add this to the DPDK merge branch then.

Thanks
Ian

> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-dpdk.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 872b133..9423109
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -330,6 +330,23 @@ enum dpdk_hw_ol_features {
>  NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,  };
> 
> +/*
> + * In order to avoid confusion in variables names, following naming
> +convention
> + * should be used, if possible:
> + *
> + * 'struct netdev'  : 'netdev'
> + * 'struct netdev_dpdk' : 'dev'
> + * 'struct netdev_rxq'  : 'rxq'
> + * 'struct netdev_rxq_dpdk' : 'rx'
> + *
> + * Example:
> + * struct netdev *netdev = netdev_from_name(name);
> + * struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + *
> + *  Also, 'netdev' should be used instead of 'dev->up', where 'netdev'
> +was
> + *  already defined.
> + */
> +
>  struct netdev_dpdk {
>  PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
>  dpdk_port_t port_id;
> --
> 2.7.4
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] netdev-dpdk: Fix variables naming in set_admin_state function.

2017-11-22 Thread Stokes, Ian
> Function 'netdev_dpdk_set_admin_state()' was missed while fixing variables
> naming according to the following convention:
> 
> 'struct netdev':'netdev'
> 'struct netdev_dpdk':'dev'
> 'struct netdev_rxq':'rxq'
> 'struct netdev_rxq_dpdk':'rx'
> 

LGTM, will add this to the DPDK merge branch.

Thanks
Ian

> Fixes: d46285a2206f ("netdev-dpdk: Consistent variable naming.")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-dpdk.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index dc210cc..872b133
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2486,12 +2486,13 @@ netdev_dpdk_set_admin_state(struct unixctl_conn
> *conn, int argc,
> 
>  if (argc > 2) {
>  struct netdev *netdev = netdev_from_name(argv[1]);
> +
>  if (netdev && is_dpdk_class(netdev->netdev_class)) {
> -struct netdev_dpdk *dpdk_dev = netdev_dpdk_cast(netdev);
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> 
> -ovs_mutex_lock(&dpdk_dev->mutex);
> -netdev_dpdk_set_admin_state__(dpdk_dev, up);
> -ovs_mutex_unlock(&dpdk_dev->mutex);
> +ovs_mutex_lock(&dev->mutex);
> +netdev_dpdk_set_admin_state__(dev, up);
> +ovs_mutex_unlock(&dev->mutex);
> 
>  netdev_close(netdev);
>  } else {
> @@ -2500,13 +2501,13 @@ netdev_dpdk_set_admin_state(struct unixctl_conn
> *conn, int argc,
>  return;
>  }
>  } else {
> -struct netdev_dpdk *netdev;
> +struct netdev_dpdk *dev;
> 
>  ovs_mutex_lock(&dpdk_mutex);
> -LIST_FOR_EACH (netdev, list_node, &dpdk_list) {
> -ovs_mutex_lock(&netdev->mutex);
> -netdev_dpdk_set_admin_state__(netdev, up);
> -ovs_mutex_unlock(&netdev->mutex);
> +LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +ovs_mutex_lock(&dev->mutex);
> +netdev_dpdk_set_admin_state__(dev, up);
> +ovs_mutex_unlock(&dev->mutex);
>  }
>  ovs_mutex_unlock(&dpdk_mutex);
>  }
> --
> 2.7.4
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC PATCH] dpif-netdev: Add port/queue tiebreaker to rxq_cycle_sort.

2017-11-22 Thread Kevin Traynor
rxq_cycle_sort is used to sort the rx queues by their measured number
of cycles. In the event that they are equal 0 could be returned.
However, it is observed that returning 0 results in a different sort
order on Windows/Linux. This is ok in practice but it causes a unit
test failure for
"1007: PMD - pmd-cpu-mask/distribution of rx queues" on Windows.

In order to have a consistent sort result, introduce a tiebreaker of
port/queue.

Fixes: 655856ef39b9 ("dpif-netdev: Change rxq_scheduling to use rxq processing 
cycles.")
Reported-by: Alin Gabriel Serdean 
Signed-off-by: Kevin Traynor 
---
 lib/dpif-netdev.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630..5627462 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3452,4 +3452,5 @@ rxq_cycle_sort(const void *a, const void *b)
 uint64_t total_qa, total_qb;
 unsigned i;
+int winner = -1;
 
 qa = *(struct dp_netdev_rxq **) a;
@@ -3464,8 +3465,16 @@ rxq_cycle_sort(const void *a, const void *b)
 dp_netdev_rxq_set_cycles(qb, RXQ_CYCLES_PROC_HIST, total_qb);
 
-if (total_qa >= total_qb) {
-return -1;
+if (total_qa > total_qb) {
+winner = 1;
+} else if (total_qa == total_qb) {
+/* Cycles are the same.  Tiebreak on port/queue id. */
+if (qa->port->port_no > qb->port->port_no) {
+winner = 1;
+} else if (qa->port->port_no == qb->port->port_no) {
+winner = netdev_rxq_get_queue_id(qa->rx)
+- netdev_rxq_get_queue_id(qb->rx);
+}
 }
-return 1;
+return winner;
 }
 
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."

2017-11-22 Thread Bodireddy, Bhanuprakash
>This reverts commit a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941.
>
>Padding and aligning of dp_netdev_pmd_thread structure members is
>useless, broken in a several ways and only greatly degrades maintainability
>and extensibility of the structure.

The idea of my earlier patch was to mark the cache lines and reduce the holes 
while still maintaining the grouping of related members in this structure.
Also cache line marking is a good practice to make some one extra cautious when 
extending or editing important structures . 
Most importantly I was experimenting with prefetching on this structure and 
needed cache line markers for it. 

I see that you are on ARM (I don't have HW to test) and want to know if this 
commit has any negative affect and any numbers would be appreciated.
More comments inline.

>
>Issues:
>
>1. It's not working because all the instances of struct
>   dp_netdev_pmd_thread allocated only by usual malloc. All the
>   memory is not aligned to cachelines -> structure almost never
>   starts at aligned memory address. This means that any further
>   paddings and alignments inside the structure are completely
>   useless. Fo example:
>
>   Breakpoint 1, pmd_thread_main
>   (gdb) p pmd
>   $49 = (struct dp_netdev_pmd_thread *) 0x1b1af20
>   (gdb) p &pmd->cacheline1
>   $51 = (OVS_CACHE_LINE_MARKER *) 0x1b1af60
>   (gdb) p &pmd->cacheline0
>   $52 = (OVS_CACHE_LINE_MARKER *) 0x1b1af20
>   (gdb) p &pmd->flow_cache
>   $53 = (struct emc_cache *) 0x1b1afe0
>
>   All of the above addresses shifted from cacheline start by 32B.

If you see below all the addresses are 64 byte aligned.

(gdb) p pmd
$1 = (struct dp_netdev_pmd_thread *) 0x7fc1e9b1a040
(gdb) p &pmd->cacheline0
$2 = (OVS_CACHE_LINE_MARKER *) 0x7fc1e9b1a040
(gdb) p &pmd->cacheline1
$3 = (OVS_CACHE_LINE_MARKER *) 0x7fc1e9b1a080
(gdb) p &pmd->flow_cache
$4 = (struct emc_cache *) 0x7fc1e9b1a0c0
(gdb) p &pmd->flow_table
$5 = (struct cmap *) 0x7fc1e9fba100
(gdb) p &pmd->stats
$6 = (struct dp_netdev_pmd_stats *) 0x7fc1e9fba140
(gdb) p &pmd->port_mutex
$7 = (struct ovs_mutex *) 0x7fc1e9fba180
(gdb) p &pmd->poll_list
$8 = (struct hmap *) 0x7fc1e9fba1c0
(gdb) p &pmd->tnl_port_cache
$9 = (struct hmap *) 0x7fc1e9fba200
(gdb) p &pmd->stats_zero
$10 = (unsigned long long (*)[5]) 0x7fc1e9fba240

I tried using xzalloc_cacheline instead of default xzalloc() here.  I tried 
tens of times and always found that the address is
64 byte aligned and it should start at the beginning of cache line on X86. 
Not sure why the comment  " (The memory returned will not be at the start of  a 
cache line, though, so don't assume such alignment.)" says otherwise?

>
>   Can we fix it properly? NO.
>   OVS currently doesn't have appropriate API to allocate aligned
>   memory. The best candidate is 'xmalloc_cacheline()' but it
>   clearly states that "The memory returned will not be at the
>   start of a cache line, though, so don't assume such alignment".
>   And also, this function will never return aligned memory on
>   Windows or MacOS.
>
>2. CACHE_LINE_SIZE is not constant. Different architectures have
>   different cache line sizes, but the code assumes that
>   CACHE_LINE_SIZE is always equal to 64 bytes. All the structure
>   members are grouped by 64 bytes and padded to CACHE_LINE_SIZE.
>   This leads to a huge holes in a structures if CACHE_LINE_SIZE
>   differs from 64. This is opposite to portability. If I want
>   good performance of cmap I need to have CACHE_LINE_SIZE equal
>   to the real cache line size, but I will have huge holes in the
>   structures. If you'll take a look to struct rte_mbuf from DPDK
>   you'll see that it uses 2 defines: RTE_CACHE_LINE_SIZE and
>   RTE_CACHE_LINE_MIN_SIZE to avoid holes in mbuf structure.

I understand that ARM and few other processors (like OCTEON) have 128 bytes 
cache lines.
But  again curious of performance impact in your case with this new alignment.

>
>3. Sizes of system/libc defined types are not constant for all the
>   systems. For example, sizeof(pthread_mutex_t) == 48 on my
>   ARMv8 machine, but only 40 on x86. The difference could be
>   much bigger on Windows or MacOS systems. But the code assumes
>   that sizeof(struct ovs_mutex) is always 48 bytes. This may lead
>   to broken alignment/big holes in case of padding/wrong comments
>   about amount of free pad bytes.

This isn't an issue as you would have already mentioned and more about issue 
with the comment that reads the pad bytes.
In case of ARM it would be just 8 pad bytes instead of 16 on X86. 

union {
struct {
struct ovs_mutex port_mutex; /* 484998448 */
};/*  48 */
uint8_tpad13[64];/*  64 */
};   /

>
>

[ovs-dev] Google Cloud Users List

2017-11-22 Thread Amy Hodge
Hello there,



I would like to know if you are interested in acquiring Google Cloud Users List.



Information fields: Names, Title, Email, Phone, Company Name, Company URL, 
Company physical address, SIC Code, Industry, Company Size (Revenue and 
Employee).



Let me know if you are interested and I will get back to you with the counts 
and pricing.



Regards,
Amy Hodge:)

Marketing Executive



To opt out, please reply with Leave Out in the Subject Line.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] COME AND WORK

2017-11-22 Thread judybryant . mail95
Hausta militari cruises stella in Australia cogitationem cordis sui  
auget debitum ad novum basi cruise naves et incrementum in numerum  
customers Linearibus Australia.
In summa Recruitments erit 1,570 audit. Age minimum requisitis est  
XVIII annis, et supra.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH v3 1/8] netdev-dpdk: simplify mbuf sizing

2017-11-22 Thread Kavanagh, Mark B
+ Santosh

>From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
>On Behalf Of Mark Kavanagh
>Sent: Tuesday, November 21, 2017 6:29 PM
>To: d...@openvswitch.org; qiud...@chinac.com
>Subject: [ovs-dev] [RFC PATCH v3 1/8] netdev-dpdk: simplify mbuf sizing
>
>When calculating the mbuf data_room_size (i.e. the size of actual
>packet data that an mbuf can accomodate), it is possible to simply
>use the value calculated by dpdk_buf_size() as a parameter to
>rte_pktmbuf_pool_create(). This simplifies mbuf sizing considerably.
>This patch removes the related size conversions and macros, which
>are no longer needed.
>
>The benefits of this approach are threefold:
>- the mbuf sizing code is much simpler, and more readable.
>- mbuf size will always be cache-aligned [1], satisfying that
>  requirement of specific PMDs (vNIC thunderx, for example).
>- the maximum amount of data that each mbuf contains may now be
>  calculated as mbuf->buf_len - mbuf->data_off. This is important
>  in the case of multi-segment jumbo frames.
>
>[1] (this is true since mbuf size is now always a multiple
> of 1024, + 128B RTE_PKTMBUF_HEADROOM + 704B dp_packet).

Santosh, I've just spotted that the cacheline size for thunderx is defined as 
128B, as opposed to 64.

==> 
config/defconfig_arm64-thunderx-linuxapp-gcc:36:CONFIG_RTE_CACHE_LINE_SIZE=128

Apologies for the oversight - I'd be very interested to hear your thoughts on 
this patch.

Thanks in advance,
Mark


>
>Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
>Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
>Signed-off-by: Mark Kavanagh 
>---
> lib/netdev-dpdk.c | 22 --
> 1 file changed, 8 insertions(+), 14 deletions(-)
>
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>index 8906423..c5eb851 100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -81,12 +81,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
>20);
>  + (2 * VLAN_HEADER_LEN))
> #define MTU_TO_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
> #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
>-#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)\
>- - ETHER_HDR_LEN - ETHER_CRC_LEN)
>-#define MBUF_SIZE(mtu)  ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
>- + sizeof(struct dp_packet) \
>- + RTE_PKTMBUF_HEADROOM),   \
>- RTE_CACHE_LINE_SIZE)
> #define NETDEV_DPDK_MBUF_ALIGN  1024
> #define NETDEV_DPDK_MAX_PKT_LEN 9728
>
>@@ -447,7 +441,7 @@ is_dpdk_class(const struct netdev_class *class)
>  * behaviour, which reduces performance. To prevent this, use a buffer size
>  * that is closest to 'mtu', but which satisfies the aforementioned criteria.
>  */
>-static uint32_t
>+static uint16_t
> dpdk_buf_size(int mtu)
> {
> return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
>@@ -486,7 +480,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
>  *  - a new mempool was just created;
>  *  - a matching mempool already exists. */
> static struct rte_mempool *
>-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>+dpdk_mp_create(struct netdev_dpdk *dev, uint16_t frame_len)
> {
> char mp_name[RTE_MEMPOOL_NAMESIZE];
> const char *netdev_name = netdev_get_name(&dev->up);
>@@ -513,12 +507,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>  * longer than RTE_MEMPOOL_NAMESIZE. */
> int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
>"ovs%08x%02d%05d%07u",
>-   hash, socket_id, mtu, n_mbufs);
>+   hash, socket_id, frame_len, n_mbufs);
> if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> VLOG_DBG("snprintf returned %d. "
>  "Failed to generate a mempool name for \"%s\". "
>- "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
>- ret, netdev_name, hash, socket_id, mtu, n_mbufs);
>+ "Hash:0x%x, socket_id: %d, frame length:%d, mbufs:%u.",
>+ ret, netdev_name, hash, socket_id, frame_len, n_mbufs);
> break;
> }
>
>@@ -529,7 +523,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>
> mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
>  sizeof (struct dp_packet) - sizeof (struct rte_mbuf),
>- MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id);
>+ frame_len + RTE_PKTMBUF_HEADROOM, socket_id);
>
> if (mp) {
> VLOG_DBG("Allocated \"%s\" mempool with %u mbufs",
>@@ -582,11 +576,11 @@ static int
> netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
> OVS_REQUIRES(dev->mutex)
> {
>-uint32_t buf_size = dpdk_buf_size(dev->reque

[ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."

2017-11-22 Thread Ilya Maximets
This reverts commit a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941.

Padding and aligning of dp_netdev_pmd_thread structure members
is useless, broken in a several ways and only greatly degrades
maintainability and extensibility of the structure.

Issues:

1. It's not working because all the instances of struct
   dp_netdev_pmd_thread allocated only by usual malloc. All the
   memory is not aligned to cachelines -> structure almost never
   starts at aligned memory address. This means that any further
   paddings and alignments inside the structure are completely
   useless. Fo example:

   Breakpoint 1, pmd_thread_main
   (gdb) p pmd
   $49 = (struct dp_netdev_pmd_thread *) 0x1b1af20
   (gdb) p &pmd->cacheline1
   $51 = (OVS_CACHE_LINE_MARKER *) 0x1b1af60
   (gdb) p &pmd->cacheline0
   $52 = (OVS_CACHE_LINE_MARKER *) 0x1b1af20
   (gdb) p &pmd->flow_cache
   $53 = (struct emc_cache *) 0x1b1afe0

   All of the above addresses shifted from cacheline start by 32B.

   Can we fix it properly? NO.
   OVS currently doesn't have appropriate API to allocate aligned
   memory. The best candidate is 'xmalloc_cacheline()' but it
   clearly states that "The memory returned will not be at the
   start of a cache line, though, so don't assume such alignment".
   And also, this function will never return aligned memory on
   Windows or MacOS.

2. CACHE_LINE_SIZE is not constant. Different architectures have
   different cache line sizes, but the code assumes that
   CACHE_LINE_SIZE is always equal to 64 bytes. All the structure
   members are grouped by 64 bytes and padded to CACHE_LINE_SIZE.
   This leads to a huge holes in a structures if CACHE_LINE_SIZE
   differs from 64. This is opposite to portability. If I want
   good performance of cmap I need to have CACHE_LINE_SIZE equal
   to the real cache line size, but I will have huge holes in the
   structures. If you'll take a look to struct rte_mbuf from DPDK
   you'll see that it uses 2 defines: RTE_CACHE_LINE_SIZE and
   RTE_CACHE_LINE_MIN_SIZE to avoid holes in mbuf structure.

3. Sizes of system/libc defined types are not constant for all the
   systems. For example, sizeof(pthread_mutex_t) == 48 on my
   ARMv8 machine, but only 40 on x86. The difference could be
   much bigger on Windows or MacOS systems. But the code assumes
   that sizeof(struct ovs_mutex) is always 48 bytes. This may lead
   to broken alignment/big holes in case of padding/wrong comments
   about amount of free pad bytes.

4. Sizes of the many fileds in structure depends on defines like
   DP_N_STATS, PMD_N_CYCLES, EM_FLOW_HASH_ENTRIES and so on.
   Any change in these defines or any change in any structure
   contained by thread should lead to the not so simple
   refactoring of the whole dp_netdev_pmd_thread structure. This
   greatly reduces maintainability and complicates development of
   a new features.

5. There is no reason to align flow_cache member because it's
   too big and we usually access random entries by single thread
   only.

So, the padding/alignment only creates some visibility of performance
optimization but does nothing useful in reality. It only complicates
maintenance and adds huge holes for non-x86 architectures and non-Linux
systems. Performance improvement stated in a original commit message
should be random and not valuable. I see no performance difference.

Most of the above issues are also true for some other padded/aligned
structures like 'struct netdev_dpdk'. They will be treated separately.

CC: Bhanuprakash Bodireddy 
CC: Ben Pfaff 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 160 +++---
 1 file changed, 69 insertions(+), 91 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630..6784269 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -547,31 +547,18 @@ struct tx_port {
  * actions in either case.
  * */
 struct dp_netdev_pmd_thread {
-PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
-struct dp_netdev *dp;
-struct cmap_node node;  /* In 'dp->poll_threads'. */
-pthread_cond_t cond;/* For synchronizing pmd thread
-   reload. */
-);
-
-PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
-struct ovs_mutex cond_mutex;/* Mutex for condition variable. */
-pthread_t thread;
-unsigned core_id;   /* CPU core id of this pmd thread. */
-int numa_id;/* numa node id of this pmd thread. */
-);
+struct dp_netdev *dp;
+struct ovs_refcount ref_cnt;/* Every reference must be refcount'ed. */
+struct cmap_node node;  /* In 'dp->poll_threads'. */
+
+pthread_cond_t cond;/* 

Re: [ovs-dev] [RFC PATCH V2 1/2] netdev-dpdk: DPDK v17.11 upgrade

2017-11-22 Thread Stokes, Ian

> >From: Christian Ehrhardt [mailto:christian.ehrha...@canonical.com]
> >Sent: Monday, November 20, 2017 1:04 PM
> >To: Kavanagh, Mark B 
> >Cc:  ;
> >maxime.coque...@redhat.com
> >Subject: Re: [ovs-dev] [RFC PATCH V2 1/2] netdev-dpdk: DPDK v17.11
> >upgrade
> >
> >On Wed, Nov 8, 2017 at 1:00 PM, Mark Kavanagh
> >
> >wrote:
> >> This commit adds support for DPDK v17.11:
> >> - minor updates to accomodate DPDK API changes
> >> - update references to DPDK version in Documentation
> >> - update DPDK version in travis' linux-build script
> >>
> >> Signed-off-by: Mark Kavanagh 
> >
> >
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> >> 76e79be..ed5bf62 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -26,6 +26,7 @@
> >>  #include 
> >>  #include 
> >>
> >> +#include 
> >>  #include 
> >>  #include 
> >>  #include 
> >> @@ -150,8 +151,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
> >ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
> >>
> >>  #define DPDK_ETH_PORT_ID_INVALIDRTE_MAX_ETHPORTS
> >>
> >> -/* DPDK library uses uint8_t for port_id. */ -typedef uint8_t
> >> dpdk_port_t;
> >> +/* DPDK library uses uint16_t for port_id. */ typedef uint16_t
> >> +dpdk_port_t;
> >>
> >>  #define VHOST_ENQ_RETRY_NUM 8
> >>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ) @@
> >> -2582,7 +2583,7 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int
> >> argc
> >OVS_UNUSED,
> >>  {
> >>  int ret;
> >>  char *response;
> >> -uint8_t port_id;
> >> +dpdk_port_t port_id;
> >>  char devname[RTE_ETH_NAME_MAX_LEN];
> >>  struct netdev_dpdk *dev;

As an FYI the change for the DPDK port ID above has been added to master 
already from a previous patch submitted by Mark as part of the latest pull 
request for OVS DPDK with commit 7ee94cbac874910174bd5e160482a5aaeb64f4e5.

Thanks
Ian

> >
> >Hi Mark,
> >I today independently came to exactly the same changes to get it
> >building until I saw this mail.
> >We discussed the option to handle things depending on the DPDK version
> >built against via config time checks, but then as OVS "defines" a DPDK
> >version that it matches you might not want/need that complexity.
> 
> Hi Christian,
> 
> Thanks for the review/ack.
> 
> Can you provide some additional details regarding the config time checks
> that you mentioned?
> 
> Thanks in advance,
> Mark
> 
> 
> 
> >That said it implies I reviewed and tested (well built) them, so
> >bumping the thread and feel free to add:
> >Reviewed-by: Christian Ehrhardt 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] How to set QoS for VM egress traffic on tunnel mode

2017-11-22 Thread 王志克
Hi All,

I want to set QoS with guide from below link “egress traffic shaping”, but do 
not know how for tunnel mode.
http://docs.openvswitch.org/en/latest/faq/qos/

My scenario:

I have several VM ports, and several VxLan ports in br0, and there is one 
seprate eth0 port (not in br0), which is the underlay port of Vxlan port.
Currently I add rule to match VM traffic to certain Vxlan port, and all these 
vxlan ports would go out through eth0.

Now I want to enable egress traffic shaping, eg:
VM1 goes out with min_rate=10M, max_rate=100M.
VM2 goes out with min_rate=50M, max_rate=200M.

In the given example in http://docs.openvswitch.org/en/latest/faq/qos/ chapter 
“egress traffic shaping”, it directly uses the physical port. But in tunnel 
case, I can NOT specify the physical port directly.
So how to configrue egress traffic shaping in tunnel case? Appreciate some 
example configruation. Thanks.


Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH V2 1/2] netdev-dpdk: DPDK v17.11 upgrade

2017-11-22 Thread Kavanagh, Mark B
>From: Christian Ehrhardt [mailto:christian.ehrha...@canonical.com]
>Sent: Monday, November 20, 2017 1:04 PM
>To: Kavanagh, Mark B 
>Cc:  ; maxime.coque...@redhat.com
>Subject: Re: [ovs-dev] [RFC PATCH V2 1/2] netdev-dpdk: DPDK v17.11 upgrade
>
>On Wed, Nov 8, 2017 at 1:00 PM, Mark Kavanagh 
>wrote:
>> This commit adds support for DPDK v17.11:
>> - minor updates to accomodate DPDK API changes
>> - update references to DPDK version in Documentation
>> - update DPDK version in travis' linux-build script
>>
>> Signed-off-by: Mark Kavanagh 
>
>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 76e79be..ed5bf62 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -26,6 +26,7 @@
>>  #include 
>>  #include 
>>
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -150,8 +151,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
>ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
>>
>>  #define DPDK_ETH_PORT_ID_INVALIDRTE_MAX_ETHPORTS
>>
>> -/* DPDK library uses uint8_t for port_id. */
>> -typedef uint8_t dpdk_port_t;
>> +/* DPDK library uses uint16_t for port_id. */
>> +typedef uint16_t dpdk_port_t;
>>
>>  #define VHOST_ENQ_RETRY_NUM 8
>>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>> @@ -2582,7 +2583,7 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc
>OVS_UNUSED,
>>  {
>>  int ret;
>>  char *response;
>> -uint8_t port_id;
>> +dpdk_port_t port_id;
>>  char devname[RTE_ETH_NAME_MAX_LEN];
>>  struct netdev_dpdk *dev;
>
>Hi Mark,
>I today independently came to exactly the same changes to get it
>building until I saw this mail.
>We discussed the option to handle things depending on the DPDK version
>built against via config time checks, but then as OVS "defines" a DPDK
>version that it matches you might not want/need that complexity.

Hi Christian,

Thanks for the review/ack.

Can you provide some additional details regarding the config time checks that 
you mentioned?

Thanks in advance,
Mark



>That said it implies I reviewed and tested (well built) them, so
>bumping the thread and feel free to add:
>Reviewed-by: Christian Ehrhardt 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v1 2/4] conntrack: Refactor algs.

2017-11-22 Thread Darrell Ball


On 11/20/17, 7:17 AM, "ovs-dev-boun...@openvswitch.org on behalf of Aaron 
Conole"  wrote:

Hi Darrell,

Darrell Ball  writes:

> Upcoming requirements for new algs make it necessary to split out
> alg helper more cleanly.
>
> Signed-off-by: Darrell Ball 
> ---

Thanks for jumping on this so quickly.

I think this and 3/4 should be an independent series.  They don't have
much to do with 1/4 or 4/4, and logically could go in (and be developed)
independently.

That true and I was lazy


>  lib/conntrack.c | 73 
++---
>  1 file changed, 44 insertions(+), 29 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index b8d0e77..7fbcfba 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -67,6 +67,12 @@ enum ct_alg_mode {
>  CT_TFTP_MODE,
>  };
>  
> +enum ct_alg_ctl_type {
> +CT_ALG_CTL_NONE,
> +CT_ALG_CTL_FTP,
> +CT_ALG_CTL_TFTP,
> +};
> +

Any reason to hardcode these?  I think we could have a struct like:

These don’t represent specific L4 port numbers but rather services.

struct ct_alg_helper {
const char *name;
bool (*process_dp_pkt)(struct conn *ct,
   const struct conn *expectation,
   const struct dp_packet *pkt,
   bool nat);
... /* not sure of what else would be needed, yet */;
};


I use an array of function pointers in V2.




Then we can register statically with some table.  I think it will
simplify the way conntrack helpers are added, as well as simplify the
code in the 'framework'/'core' area.

>  static bool conn_key_extract(struct conntrack *, struct dp_packet *,
>   ovs_be16 dl_type, struct conn_lookup_ctx *,
>   uint16_t zone);
> @@ -432,33 +438,47 @@ get_ip_proto(const struct dp_packet *pkt)
>  }
>  
>  static bool
> -is_ftp_ctl(const struct dp_packet *pkt)
> +is_ftp_ctl(const enum ct_alg_ctl_type ct_alg_ctl)
> +{
> +return ct_alg_ctl == CT_ALG_CTL_FTP;
> +}
> +
> +static enum ct_alg_ctl_type
> +get_alg_ctl_type(const struct dp_packet *pkt)
>  {
>  uint8_t ip_proto = get_ip_proto(pkt);
> +struct udp_header *uh = dp_packet_l4(pkt);
>  struct tcp_header *th = dp_packet_l4(pkt);
>  
> -/* CT_IPPORT_FTP is used because IPPORT_FTP in not defined in OSX,
> - * at least in in.h. Since this value will never change, just remove
> +/* CT_IPPORT_FTP/TFTP is used because IPPORT_FTP/TFTP in not defined
> + * in OSX, at least in in.h. Since these values will never change, 
remove
>   * the external dependency. */
> -#define CT_IPPORT_FTP 21
> +enum { CT_IPPORT_FTP = 21 };
> +enum { CT_IPPORT_TFTP = 69 };
>  
> -return (ip_proto == IPPROTO_TCP &&
> -(th->tcp_src == htons(CT_IPPORT_FTP) ||
> - th->tcp_dst == htons(CT_IPPORT_FTP)));
> +if (ip_proto == IPPROTO_UDP && uh->udp_dst == htons(CT_IPPORT_TFTP)) 
{
> +return CT_ALG_CTL_TFTP;
> +} else if (ip_proto == IPPROTO_TCP &&
> +   (th->tcp_src == htons(CT_IPPORT_FTP) ||
> +th->tcp_dst == htons(CT_IPPORT_FTP))) {
> +return CT_ALG_CTL_FTP;
> +}
> +return CT_ALG_CTL_NONE;
>  }
>  
> -static bool
> -is_tftp_ctl(const struct dp_packet *pkt)
> +static void
> +handle_alg_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
> +   struct dp_packet *pkt, enum ct_alg_ctl_type ct_alg_ctl,
> +   const struct conn *conn, long long now, bool nat,
> +   const struct conn *conn_for_expectation)
>  {
> - uint8_t ip_proto = get_ip_proto(pkt);
> - struct udp_header *uh = dp_packet_l4(pkt);
> -
> -/* CT_IPPORT_TFTP is used because IPPORT_TFTP in not defined in OSX,
> - * at least in in.h. Since this value will never change, remove
> - * the external dependency. */
> -#define CT_IPPORT_TFTP 69
> -return (ip_proto == IPPROTO_UDP &&
> -uh->udp_dst == htons(CT_IPPORT_TFTP));
> +/* ALG control packet handling with expectation creation. */
> +if (OVS_UNLIKELY((ct_alg_ctl == CT_ALG_CTL_FTP) && conn)) {
> +handle_ftp_ctl(ct, ctx, pkt, conn_for_expectation,
> +   now, CT_FTP_CTL_INTEREST, nat);
> +} else if (OVS_UNLIKELY((ct_alg_ctl == CT_ALG_CTL_TFTP) && conn)) {
> +handle_tftp_ctl(ct, conn_for_expectation, now);
> +}
>  }
>  
>  static void
> @@ -1110,10 +1130,11 @@ process_one(struct conntrack *ct, struc

[ovs-dev] [patch v2 3/3] conntrack: Disable algs by default.

2017-11-22 Thread Darrell Ball
Presently, alg processing is enabled by default to better exercise code.
This is similar to kernels before 4.7 as well.  The recommended default
behavior in the newer kernels is to only process algs if a helper is
supplied in a conntrack rule.  The behavior is changed to match the
later kernels.

Signed-off-by: Darrell Ball 
---
 lib/conntrack.c | 35 +--
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1364d05..fed9f61 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -819,6 +819,26 @@ conn_clean(struct conntrack *ct, struct conn *conn,
 }
 }
 
+static bool
+ct_verify_helper(const char *helper, enum ct_alg_ctl_type ct_alg_ctl)
+{
+if (ct_alg_ctl == CT_ALG_CTL_NONE) {
+return true;
+} else if (helper) {
+if ((ct_alg_ctl == CT_ALG_CTL_FTP) &&
+ !strncmp(helper, "ftp", strlen("ftp"))) {
+return true;
+} else if ((ct_alg_ctl == CT_ALG_CTL_TFTP) &&
+   !strncmp(helper, "tftp", strlen("tftp"))) {
+return true;
+} else {
+return false;
+}
+} else {
+return false;
+}
+}
+
 /* This function is called with the bucket lock held. */
 static struct conn *
 conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
@@ -826,7 +846,8 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
const struct nat_action_info_t *nat_action_info,
struct conn *conn_for_un_nat_copy,
const char *helper,
-   const struct alg_exp_node *alg_exp)
+   const struct alg_exp_node *alg_exp,
+   enum ct_alg_ctl_type ct_alg_ctl)
 {
 unsigned bucket = hash_to_bucket(ctx->hash);
 struct conn *nc = NULL;
@@ -850,14 +871,15 @@ conn_not_found(struct conntrack *ct, struct dp_packet 
*pkt,
 return nc;
 }
 
+if (!ct_verify_helper(helper, ct_alg_ctl)) {
+return nc;
+}
+
 nc = new_conn(&ct->buckets[bucket], pkt, &ctx->key, now);
 ctx->conn = nc;
 nc->rev_key = nc->key;
 conn_key_reverse(&nc->rev_key);
-
-if (helper) {
-nc->alg = xstrdup(helper);
-}
+nc->alg = nullable_xstrdup(helper);
 
 if (alg_exp) {
 nc->alg_related = true;
@@ -1222,7 +1244,8 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 ct_rwlock_unlock(&ct->resources_lock);
 
 conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
-  &conn_for_un_nat_copy, helper, alg_exp);
+  &conn_for_un_nat_copy, helper, alg_exp,
+  ct_alg_ctl);
 }
 
 write_ct_md(pkt, zone, conn, &ctx->key, alg_exp);
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [patch v2 2/3] conntrack: Allow specified alg port numbers.

2017-11-22 Thread Darrell Ball
Algs can use variable control port numbers for servers.
The main use case is a kind of feeble security measure; the
thinking being by some is that it obscures the alg traffic.
It is really not very effective, but the kernel has this
capability. This patch mimics it.

Signed-off-by: Darrell Ball 
---
 lib/conntrack.c| 41 -
 lib/conntrack.h|  8 
 lib/dpif-netdev.c  |  4 ++--
 tests/test-conntrack.c |  6 +++---
 4 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e397941..1364d05 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -462,23 +462,37 @@ is_ftp_ctl(const enum ct_alg_ctl_type ct_alg_ctl)
 }
 
 static enum ct_alg_ctl_type
-get_alg_ctl_type(const struct dp_packet *pkt)
+get_alg_ctl_type(const struct dp_packet *pkt, ovs_be16 tp_src, ovs_be16 tp_dst,
+ const char *helper)
 {
-uint8_t ip_proto = get_ip_proto(pkt);
-struct udp_header *uh = dp_packet_l4(pkt);
-struct tcp_header *th = dp_packet_l4(pkt);
-
 /* CT_IPPORT_FTP/TFTP is used because IPPORT_FTP/TFTP in not defined
  * in OSX, at least in in.h. Since these values will never change, remove
  * the external dependency. */
 enum { CT_IPPORT_FTP = 21 };
 enum { CT_IPPORT_TFTP = 69 };
+uint8_t ip_proto = get_ip_proto(pkt);
+struct udp_header *uh = dp_packet_l4(pkt);
+struct tcp_header *th = dp_packet_l4(pkt);
+ovs_be16 ftp_src_port = htons(CT_IPPORT_FTP);
+ovs_be16 ftp_dst_port = htons(CT_IPPORT_FTP);
+ovs_be16 tftp_dst_port = htons(CT_IPPORT_TFTP);
+
+if (OVS_UNLIKELY(tp_dst)) {
+if (helper && !strncmp(helper, "ftp", strlen("ftp"))) {
+ftp_dst_port = tp_dst;
+} else if (helper && !strncmp(helper, "tftp", strlen("tftp"))) {
+tftp_dst_port = tp_dst;
+}
+} else if (OVS_UNLIKELY(tp_src)) {
+if (helper && !strncmp(helper, "ftp", strlen("ftp"))) {
+ftp_src_port = tp_src;
+}
+}
 
-if (ip_proto == IPPROTO_UDP && uh->udp_dst == htons(CT_IPPORT_TFTP)) {
+if (ip_proto == IPPROTO_UDP && uh->udp_dst == tftp_dst_port) {
 return CT_ALG_CTL_TFTP;
 } else if (ip_proto == IPPROTO_TCP &&
-   (th->tcp_src == htons(CT_IPPORT_FTP) ||
-th->tcp_dst == htons(CT_IPPORT_FTP))) {
+   (th->tcp_src == ftp_src_port || th->tcp_dst == ftp_dst_port)) {
 return CT_ALG_CTL_FTP;
 }
 return CT_ALG_CTL_NONE;
@@ -1107,7 +1121,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 bool force, bool commit, long long now, const uint32_t *setmark,
 const struct ovs_key_ct_labels *setlabel,
 const struct nat_action_info_t *nat_action_info,
-const char *helper)
+ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper)
 {
 struct conn *conn;
 unsigned bucket = hash_to_bucket(ctx->hash);
@@ -1154,7 +1168,8 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 struct conn conn_for_un_nat_copy;
 conn_for_un_nat_copy.conn_type = CT_CONN_TYPE_DEFAULT;
 
-enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt);
+enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt, tp_src, tp_dst,
+   helper);
 
 if (OVS_LIKELY(conn)) {
 if (is_ftp_ctl(ct_alg_ctl)) {
@@ -1248,7 +1263,7 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
   ovs_be16 dl_type, bool force, bool commit, uint16_t zone,
   const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
-  const char *helper,
+  ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper,
   const struct nat_action_info_t *nat_action_info,
   long long now)
 {
@@ -1262,8 +1277,8 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
 write_ct_md(packet, zone, NULL, NULL, NULL);
 continue;
 }
-process_one(ct, packet, &ctx, zone, force, commit,
-now, setmark, setlabel, nat_action_info, helper);
+process_one(ct, packet, &ctx, zone, force, commit, now, setmark,
+setlabel, nat_action_info, tp_src, tp_dst, helper);
 }
 
 return 0;
diff --git a/lib/conntrack.h b/lib/conntrack.h
index fbeef1c..990f6c2 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -90,11 +90,11 @@ struct nat_action_info_t {
 void conntrack_init(struct conntrack *);
 void conntrack_destroy(struct conntrack *);
 
-int conntrack_execute(struct conntrack *, struct dp_packet_batch *,
-  ovs_be16 dl_type, bool force, bool commit,
-  uint16_t zone, const uint32_t *setmark,
+int conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
+  ovs_be16 dl_type, bool fo

[ovs-dev] [patch v2 1/3] conntrack: Refactor algs.

2017-11-22 Thread Darrell Ball
Upcoming requirements for new algs make it desirable to split out
alg helper more cleanly.

Signed-off-by: Darrell Ball 
---
 lib/conntrack.c | 108 
 1 file changed, 69 insertions(+), 39 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index f5a3aa9..e397941 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -67,6 +67,12 @@ enum ct_alg_mode {
 CT_TFTP_MODE,
 };
 
+enum ct_alg_ctl_type {
+CT_ALG_CTL_NONE,
+CT_ALG_CTL_FTP,
+CT_ALG_CTL_TFTP,
+};
+
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
  ovs_be16 dl_type, struct conn_lookup_ctx *,
  uint16_t zone);
@@ -142,6 +148,13 @@ static enum ftp_ctl_pkt
 detect_ftp_ctl_type(const struct conn_lookup_ctx *ctx,
 struct dp_packet *pkt);
 
+static struct ct_l4_proto *l4_protos[] = {
+[IPPROTO_TCP] = &ct_proto_tcp,
+[IPPROTO_UDP] = &ct_proto_other,
+[IPPROTO_ICMP] = &ct_proto_icmp4,
+[IPPROTO_ICMPV6] = &ct_proto_icmp6,
+};
+
 static void
 handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
struct dp_packet *pkt,
@@ -150,14 +163,23 @@ handle_ftp_ctl(struct conntrack *ct, const struct 
conn_lookup_ctx *ctx,
 
 static void
 handle_tftp_ctl(struct conntrack *ct,
+const struct conn_lookup_ctx *ctx OVS_UNUSED,
+struct dp_packet *pkt OVS_UNUSED,
 const struct conn *conn_for_expectation,
-long long now);
-
-static struct ct_l4_proto *l4_protos[] = {
-[IPPROTO_TCP] = &ct_proto_tcp,
-[IPPROTO_UDP] = &ct_proto_other,
-[IPPROTO_ICMP] = &ct_proto_icmp4,
-[IPPROTO_ICMPV6] = &ct_proto_icmp6,
+long long now, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED,
+bool nat OVS_UNUSED);
+
+typedef void (*alg_helper)(struct conntrack *ct,
+   const struct conn_lookup_ctx *ctx,
+   struct dp_packet *pkt,
+   const struct conn *conn_for_expectation,
+   long long now, enum ftp_ctl_pkt ftp_ctl,
+   bool nat);
+
+static alg_helper alg_helpers[] = {
+[CT_ALG_CTL_NONE] = NULL,
+[CT_ALG_CTL_FTP] = handle_ftp_ctl,
+[CT_ALG_CTL_TFTP] = handle_tftp_ctl,
 };
 
 long long ct_timeout_val[] = {
@@ -434,35 +456,45 @@ get_ip_proto(const struct dp_packet *pkt)
 }
 
 static bool
-is_ftp_ctl(const struct dp_packet *pkt)
+is_ftp_ctl(const enum ct_alg_ctl_type ct_alg_ctl)
 {
-uint8_t ip_proto = get_ip_proto(pkt);
-struct tcp_header *th = dp_packet_l4(pkt);
-
-/* CT_IPPORT_FTP is used because IPPORT_FTP in not defined in OSX,
- * at least in in.h. Since this value will never change, just remove
- * the external dependency. */
-#define CT_IPPORT_FTP 21
-
-return (ip_proto == IPPROTO_TCP &&
-(th->tcp_src == htons(CT_IPPORT_FTP) ||
- th->tcp_dst == htons(CT_IPPORT_FTP)));
-
+return ct_alg_ctl == CT_ALG_CTL_FTP;
 }
 
-static bool
-is_tftp_ctl(const struct dp_packet *pkt)
+static enum ct_alg_ctl_type
+get_alg_ctl_type(const struct dp_packet *pkt)
 {
 uint8_t ip_proto = get_ip_proto(pkt);
 struct udp_header *uh = dp_packet_l4(pkt);
+struct tcp_header *th = dp_packet_l4(pkt);
 
-/* CT_IPPORT_TFTP is used because IPPORT_TFTP in not defined in OSX,
- * at least in in.h. Since this value will never change, remove
+/* CT_IPPORT_FTP/TFTP is used because IPPORT_FTP/TFTP in not defined
+ * in OSX, at least in in.h. Since these values will never change, remove
  * the external dependency. */
-#define CT_IPPORT_TFTP 69
-return (ip_proto == IPPROTO_UDP &&
-uh->udp_dst == htons(CT_IPPORT_TFTP));
+enum { CT_IPPORT_FTP = 21 };
+enum { CT_IPPORT_TFTP = 69 };
+
+if (ip_proto == IPPROTO_UDP && uh->udp_dst == htons(CT_IPPORT_TFTP)) {
+return CT_ALG_CTL_TFTP;
+} else if (ip_proto == IPPROTO_TCP &&
+   (th->tcp_src == htons(CT_IPPORT_FTP) ||
+th->tcp_dst == htons(CT_IPPORT_FTP))) {
+return CT_ALG_CTL_FTP;
+}
+return CT_ALG_CTL_NONE;
+}
 
+static void
+handle_alg_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
+   struct dp_packet *pkt, enum ct_alg_ctl_type ct_alg_ctl,
+   const struct conn *conn, long long now, bool nat,
+   const struct conn *conn_for_expectation)
+{
+/* ALG control packet handling with expectation creation. */
+if (OVS_UNLIKELY(alg_helpers[ct_alg_ctl] && conn)) {
+alg_helpers[ct_alg_ctl](ct, ctx, pkt, conn_for_expectation, now,
+CT_FTP_CTL_INTEREST, nat);
+}
 }
 
 static void
@@ -1121,10 +1153,11 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 bool create_new_conn = false;
 struct conn conn_for_un_nat_copy;
 conn_for_un_nat_copy.conn_type = CT_CONN_TYPE_DEFAULT;
-bool

[ovs-dev] [patch v2 0/3] conntrack: Alg improvements.

2017-11-22 Thread Darrell Ball
Some refactoring of alg support is done.
Also algs are disabled by default unless an alg specifier
is supplied; this allows for enhanced security and matches
later kernels.
Another change to allow for non-standard alg conntrol
port specification.
 

Darrell Ball (3):
  conntrack: Refactor algs.
  conntrack: Allow specified alg port numbers.
  conntrack: Disable algs by default.

 lib/conntrack.c| 168 ++---
 lib/conntrack.h|   8 +--
 lib/dpif-netdev.c  |   4 +-
 tests/test-conntrack.c |   6 +-
 4 files changed, 127 insertions(+), 59 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev