Re: [ovs-dev] [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.

2019-11-11 Thread Nitin Katiyar via dev
Hi Guys,
Can you please review the patch? It has been on the mailing list for a long time.

Regards,
Nitin

> -Original Message-
> From: Nitin Katiyar
> Sent: Friday, October 04, 2019 11:06 AM
> To: 'ovs-dev@openvswitch.org' 
> Cc: Anju Thomas ; 'Ilya Maximets'
> ; Ben Pfaff 
> Subject: RE: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval
> in PMD main loop.
> 
> Hi,
> Can you please review the following patch and provide feedback?
> 
> Regards,
> Nitin
> 
> > -Original Message-
> > From: Nitin Katiyar
> > Sent: Wednesday, August 28, 2019 2:00 PM
> > To: ovs-dev@openvswitch.org
> > Cc: Anju Thomas ; Ilya Maximets
> > 
> > Subject: RE: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed
> > interval in PMD main loop.
> >
> > Hi,
> > Please provide your feedback.
> >
> > Regards,
> > Nitin
> >
> > > -Original Message-
> > > From: Nitin Katiyar
> > > Sent: Thursday, August 22, 2019 10:24 PM
> > > To: ovs-dev@openvswitch.org
> > > Cc: Nitin Katiyar ; Anju Thomas
> > > 
> > > Subject: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed
> > > interval in PMD main loop.
> > >
> > > Each PMD updates the global sequence number for RCU synchronization
> > > purposes with the other OVS threads. This is done at every 1025th
> > > iteration of the PMD main loop.
> > >
> > > If the PMD thread is responsible for polling a large number of queues
> > > that are carrying traffic, it spends a lot of time processing
> > > packets, and this results in a significant delay in performing the
> > > housekeeping activities.
> > >
> > > If the OVS main thread is waiting to synchronize with the PMD
> > > threads and those threads delay performing the housekeeping
> > > activities for more than 3 seconds, then LACP processing is impacted
> > > and it leads to LACP flaps. Similarly, other control protocols
> > > run by the OVS main thread are impacted.
> > >
> > > For example, a PMD thread polling 200 ports/queues with an average of
> > > 1600 processing cycles per packet and a batch size of 32 may take
> > > 10,240,000 (200 * 1600 * 32) cycles per iteration. On a system with a
> > > 2.0 GHz CPU this means more than 5 ms per iteration, so completing
> > > 1024 iterations would take more than 5 seconds.
> > >
> > > This gets worse when some PMD threads are less loaded. It reduces the
> > > possibility of the heavily loaded PMD getting the mutex lock in
> > > ovsrcu_try_quiesce(), and its next attempt to quiesce would only come
> > > after another 1024 iterations.
> > >
> > > With this patch, PMD RCU synchronization is performed at a fixed time
> > > interval instead of after a fixed number of iterations. This ensures
> > > that even when the packet processing load is high, RCU synchronization
> > > is not delayed for long.
> > >
> > > Co-authored-by: Anju Thomas 
> > > Signed-off-by: Anju Thomas 
> > > Signed-off-by: Nitin Katiyar 
> > > ---
> > >  lib/dpif-netdev.c | 22 ++
> > >  1 file changed, 22 insertions(+)
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> > > d0a1c58..63b6cb9
> > > 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -228,6 +228,9 @@ struct dfc_cache {
> > >   * and used during rxq to pmd assignment. */
> > >  #define PMD_RXQ_INTERVAL_MAX 6
> > >
> > > +/* Time in microseconds to try RCU quiescing. */
> > > +#define PMD_RCU_QUIESCE_INTERVAL 10000LL
> > > +
> > >  struct dpcls {
> > >  struct cmap_node node;  /* Within dp_netdev_pmd_thread.classifiers */
> > >  odp_port_t in_port;
> > > @@ -751,6 +754,9 @@ struct dp_netdev_pmd_thread {
> > >
> > >  /* Set to true if the pmd thread needs to be reloaded. */
> > >  bool need_reload;
> > > +
> > > +/* Next time when PMD should try RCU quiescing. */
> > > +long long next_rcu_quiesce;
> > >  };
> > >
> > >  /* Interface to netdev-based datapath. */
> > > @@ -5486,6 +5492,8 @@ reload:
> > >  pmd->intrvl_tsc_prev = 0;
> > >  atomic_store_relaxed(&pmd->intrvl_cycles, 0);
> > >  cycles_counter_update(s);
> > > +
> > > +pmd->next_rcu_quiesce = pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> > >  /* Protect pmd stats from external clearing while polling. */
> > >  ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
> > >  for (;;) {
> > > @@ -5493,6 +5501,17 @@ reload:
> > >
> > >  pmd_perf_start_iteration(s);
> > >
> > > +/* Do RCU synchronization at fixed interval instead of doing it
> > > + * at fixed number of iterations. This ensures that 
> > > synchronization
> > > + * would not be delayed long even at high load of packet
> > > + * processing. */
> > > +if (pmd->ctx.now > pmd->next_rcu_quiesce) {
> > > +if (!ovsrcu_try_quiesce()) {
> > > +pmd->next_rcu_quiesce =
> > > +pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> > > +}
> > > +}
> > > +
> > >  for (i = 0; i < poll_cnt; i++) {
> > >
> > >  if (!poll
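
The cycle-budget arithmetic in the commit message above works out as follows.
This is a standalone back-of-envelope sketch in C, not part of the patch; it
only uses the numbers quoted in the commit message (200 queues, 1600 cycles
per packet, a batch of 32, a 2.0 GHz CPU).

#include <stdio.h>

int
main(void)
{
    const long long n_queues = 200;          /* polled ports/queues          */
    const long long cycles_per_pkt = 1600;   /* avg processing cycles/packet */
    const long long batch_size = 32;         /* packets per rx batch         */
    const double cpu_hz = 2.0e9;             /* 2.0 GHz CPU                  */

    /* Worst-case cycles spent in one PMD main-loop iteration. */
    long long cycles_per_iter = n_queues * cycles_per_pkt * batch_size;

    printf("cycles per iteration: %lld\n", cycles_per_iter);   /* 10240000 */
    printf("ms per iteration    : %.2f\n",
           cycles_per_iter / cpu_hz * 1e3);                    /* ~5.12    */
    printf("sec per 1024 iters  : %.2f\n",
           1024 * cycles_per_iter / cpu_hz);                   /* ~5.24    */
    return 0;
}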

Re: [ovs-dev] [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

2019-11-11 Thread Sriram Vatala via dev
Thanks, Ilya and Kevin.

Regards,
Sriram.

-Original Message-
From: Ilya Maximets 
Sent: 12 November 2019 00:08
To: Ilya Maximets ; Kevin Traynor ; 
Sriram Vatala ; ovs-dev@openvswitch.org
Subject: Re: [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

On 11.11.2019 17:11, Ilya Maximets wrote:
>>> I'm not sure if clang annotations will work with rte_spinlock.
>>> DPDK doesn't have proper annotations for locking functions.
>>>
>>
>> Ah, good point, I didn't check the lock type. In that case nevermind,
>> patch+incremental LGTM as is.
>>
>> Acked-by: Kevin Traynor 
>
> Thanks. In this case, I think, there is no need to re-spin the series.
> I'll just squash the incremental with this patch and give it another try.
> If it'll work fine, I'll apply the series.

Thanks Sriram and Kevin! Series applied to master.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH net-next] net: openvswitch: add hash info to upcall

2019-11-11 Thread Tonghao Zhang



> On Nov 12, 2019, at 12:35 PM, Pravin Shelar  wrote:
> 
>> On Sun, Nov 10, 2019 at 3:44 AM  wrote:
>> 
>> From: Tonghao Zhang 
>> 
>> When using the kernel datapath, the upcall doesn't
>> include the related skb hash info. That introduces
>> problems, because the hash of the skb is very
>> important (e.g. the vxlan module uses it for the udp src port
>> and for tx queue selection on the tx path).
>> 
>> For example, there will be one upcall to ovs-vswitchd,
>> without the skb hash info, for the first packet of a tcp
>> session. When the kernel sends the tcp packets, the hash is
>> random for a tcp socket:
>> 
>> tcp_v4_connect
>>  -> sk_set_txhash (is random)
>> 
>> __tcp_transmit_skb
>>  -> skb_set_hash_from_sk
>> 
>> Then the udp src port of the first tcp packet is different
>> from that of the rest of the packets. The topology is shown below.
>> 
>> $ ovs-vsctl add-br br-int
>> $ ovs-vsctl add-port br-int vxl0 -- \
>>set Interface vxl0 type=vxlan options:key=100 
>> options:remote_ip=1.1.1.200
>> 
>> $ __tap is internal type on host
>> $ or tap net device for VM/Dockers
>> $ ovs-vsctl add-port br-int __tap
>> 
>> +---+  +-+
>> |   Docker/VMs  |  | ovs-vswitchd|
>> ++--+  +-+
>> |   ^|
>> |   ||
>> |   |  upcallv recalculate packet hash
>> | +-++--+
>> |  tap netdev | |   vxlan modules
>> +---> +-->  Open vSwitch ko   --+--->
>>   internal type   | |
>>   +-+
>> 
>> Reported-at: 
>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
>> Signed-off-by: Tonghao Zhang 
>> ---
>> include/uapi/linux/openvswitch.h |  2 ++
>> net/openvswitch/datapath.c   | 31 ++-
>> net/openvswitch/datapath.h   |  3 +++
>> 3 files changed, 35 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/uapi/linux/openvswitch.h 
>> b/include/uapi/linux/openvswitch.h
>> index 1887a451c388..1c58e019438e 100644
>> --- a/include/uapi/linux/openvswitch.h
>> +++ b/include/uapi/linux/openvswitch.h
>> @@ -170,6 +170,7 @@ enum ovs_packet_cmd {
>>  * output port is actually a tunnel port. Contains the output tunnel key
>>  * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
>>  * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
>> + * @OVS_PACKET_ATTR_HASH: Packet hash info (e.g. hash, sw_hash and l4_hash 
>> in skb)
>>  * @OVS_PACKET_ATTR_LEN: Packet size before truncation.
>>  * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
>>  * size.
>> @@ -190,6 +191,7 @@ enum ovs_packet_attr {
>>OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
>>   error logging should be suppressed. */
>>OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
>> +   OVS_PACKET_ATTR_HASH,   /* Packet hash. */
>>OVS_PACKET_ATTR_LEN,/* Packet size before truncation. */
>>__OVS_PACKET_ATTR_MAX
>> };
>> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>> index 2088619c03f0..f938c43e3085 100644
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -350,7 +350,8 @@ static size_t upcall_msg_size(const struct 
>> dp_upcall_info *upcall_info,
>>size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
>>+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
>>+ nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY 
>> */
>> -   + nla_total_size(sizeof(unsigned int)); /* 
>> OVS_PACKET_ATTR_LEN */
>> +   + nla_total_size(sizeof(unsigned int)) /* 
>> OVS_PACKET_ATTR_LEN */
>> +   + nla_total_size(sizeof(u64)); /* OVS_PACKET_ATTR_HASH */
>> 
>>/* OVS_PACKET_ATTR_USERDATA */
>>if (upcall_info->userdata)
>> @@ -393,6 +394,7 @@ static int queue_userspace_packet(struct datapath *dp, 
>> struct sk_buff *skb,
>>size_t len;
>>unsigned int hlen;
>>int err, dp_ifindex;
>> +   u64 hash;
>> 
>>dp_ifindex = get_dpifindex(dp);
>>if (!dp_ifindex)
>> @@ -504,6 +506,24 @@ static int queue_userspace_packet(struct datapath *dp, 
>> struct sk_buff *skb,
>>pad_packet(dp, user_skb);
>>}
>> 
>> +   if (skb_get_hash_raw(skb)) {
> skb_get_hash_raw() never fails to return a hash, so I do not see the point
> of checking the hash value.
If the hash value is 0, we don't add the hash info to the upcall.
> 
>> +   hash = skb_get_hash_raw(skb);
>> +
>> +   if (skb->sw_hash)
>> +   hash |= OVS_PACKET_HASH_SW;
>> +
>> +   if (skb->l4_hash)
>> +   hash |
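
The quoted hunk above is cut off. As a minimal standalone sketch of the idea
being discussed (a u64 attribute that carries the 32-bit skb hash in its low
bits with sw/l4 flag bits OR'd in above them), the snippet below shows how a
receiver could unpack such a value. The ex_* names and the flag bit positions
are assumptions for illustration only; the quoted patch does not show the
actual values of OVS_PACKET_HASH_SW and OVS_PACKET_HASH_L4.

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EX_PACKET_HASH_SW (UINT64_C(1) << 32)  /* assumed: hash computed in sw */
#define EX_PACKET_HASH_L4 (UINT64_C(1) << 33)  /* assumed: hash is an L4 hash  */

struct ex_decoded_hash {
    uint32_t hash;     /* the raw 32-bit skb hash */
    bool sw_hash;
    bool l4_hash;
};

static struct ex_decoded_hash
ex_decode_packet_hash(uint64_t attr)
{
    struct ex_decoded_hash d = {
        .hash = (uint32_t) attr,                     /* low 32 bits */
        .sw_hash = (attr & EX_PACKET_HASH_SW) != 0,
        .l4_hash = (attr & EX_PACKET_HASH_L4) != 0,
    };
    return d;
}

int
main(void)
{
    uint64_t attr = UINT64_C(0xdeadbeef) | EX_PACKET_HASH_L4;
    struct ex_decoded_hash d = ex_decode_packet_hash(attr);

    printf("hash=0x%08" PRIx32 " sw=%d l4=%d\n", d.hash, d.sw_hash, d.l4_hash);
    return 0;
}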

Re: [ovs-dev] [PATCH net-next] net: openvswitch: add hash info to upcall

2019-11-11 Thread Pravin Shelar
On Sun, Nov 10, 2019 at 3:44 AM  wrote:
>
> From: Tonghao Zhang 
>
> When using the kernel datapath, the upcall doesn't
> include the related skb hash info. That introduces
> problems, because the hash of the skb is very
> important (e.g. the vxlan module uses it for the udp src port
> and for tx queue selection on the tx path).
>
> For example, there will be one upcall to ovs-vswitchd,
> without the skb hash info, for the first packet of a tcp
> session. When the kernel sends the tcp packets, the hash is
> random for a tcp socket:
>
> tcp_v4_connect
>   -> sk_set_txhash (is random)
>
> __tcp_transmit_skb
>   -> skb_set_hash_from_sk
>
> Then the udp src port of the first tcp packet is different
> from that of the rest of the packets. The topology is shown below.
>
> $ ovs-vsctl add-br br-int
> $ ovs-vsctl add-port br-int vxl0 -- \
> set Interface vxl0 type=vxlan options:key=100 
> options:remote_ip=1.1.1.200
>
> $ __tap is internal type on host
> $ or tap net device for VM/Dockers
> $ ovs-vsctl add-port br-int __tap
>
> +---+  +-+
> |   Docker/VMs  |  | ovs-vswitchd|
> ++--+  +-+
>  |   ^|
>  |   ||
>  |   |  upcallv recalculate packet hash
>  | +-++--+
>  |  tap netdev | |   vxlan modules
>  +---> +-->  Open vSwitch ko   --+--->
>internal type   | |
>+-+
>
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
> Signed-off-by: Tonghao Zhang 
> ---
>  include/uapi/linux/openvswitch.h |  2 ++
>  net/openvswitch/datapath.c   | 31 ++-
>  net/openvswitch/datapath.h   |  3 +++
>  3 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 1887a451c388..1c58e019438e 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -170,6 +170,7 @@ enum ovs_packet_cmd {
>   * output port is actually a tunnel port. Contains the output tunnel key
>   * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
>   * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
> + * @OVS_PACKET_ATTR_HASH: Packet hash info (e.g. hash, sw_hash and l4_hash 
> in skb)
>   * @OVS_PACKET_ATTR_LEN: Packet size before truncation.
>   * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
>   * size.
> @@ -190,6 +191,7 @@ enum ovs_packet_attr {
> OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
>error logging should be suppressed. */
> OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
> +   OVS_PACKET_ATTR_HASH,   /* Packet hash. */
> OVS_PACKET_ATTR_LEN,/* Packet size before truncation. */
> __OVS_PACKET_ATTR_MAX
>  };
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 2088619c03f0..f938c43e3085 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -350,7 +350,8 @@ static size_t upcall_msg_size(const struct dp_upcall_info 
> *upcall_info,
> size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
> + nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
> + nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY 
> */
> -   + nla_total_size(sizeof(unsigned int)); /* 
> OVS_PACKET_ATTR_LEN */
> +   + nla_total_size(sizeof(unsigned int)) /* OVS_PACKET_ATTR_LEN 
> */
> +   + nla_total_size(sizeof(u64)); /* OVS_PACKET_ATTR_HASH */
>
> /* OVS_PACKET_ATTR_USERDATA */
> if (upcall_info->userdata)
> @@ -393,6 +394,7 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
> size_t len;
> unsigned int hlen;
> int err, dp_ifindex;
> +   u64 hash;
>
> dp_ifindex = get_dpifindex(dp);
> if (!dp_ifindex)
> @@ -504,6 +506,24 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
> pad_packet(dp, user_skb);
> }
>
> +   if (skb_get_hash_raw(skb)) {
skb_get_hash_raw() never fails to return a hash, so I do not see the point
of checking the hash value.

> +   hash = skb_get_hash_raw(skb);
> +
> +   if (skb->sw_hash)
> +   hash |= OVS_PACKET_HASH_SW;
> +
> +   if (skb->l4_hash)
> +   hash |= OVS_PACKET_HASH_L4;
> +
> +   if (nla_put(user_skb, OVS_PACKET_ATTR_HASH,
> +   sizeof (u64), &hash)) {
> +   err = -ENOBUFS;
> +   got

Re: [ovs-dev] [PATCH ovn v6] ovn-northd: Limit ARP/ND broadcast domain whenever possible.

2019-11-11 Thread Han Zhou
On Mon, Nov 11, 2019 at 11:03 AM Dumitru Ceara  wrote:
>
> On Mon, Nov 11, 2019 at 6:11 PM Han Zhou  wrote:
> >
> > 1. Would there be problem for VLAN backed logical router use case,
since a chassis MAC is used as src MAC to send packets from router ports?
> > 2. How about checking if tpa == spa to make sure GARP is always
flooded? (not directly supported by OF)
>
> This would've been nice to have in OF :)
>
> >
> >
> > On Mon, Nov 11, 2019 at 5:32 AM Dumitru Ceara  wrote:
> > >
> > > On Sat, Nov 9, 2019 at 8:35 AM Han Zhou  wrote:
> > > >
> > > >
> > > >
> > > > On Fri, Nov 8, 2019 at 6:38 AM Dumitru Ceara 
wrote:
> > > > >
> > > > > ARP request and ND NS packets for router owned IPs were being
> > > > > flooded in the complete L2 domain (using the MC_FLOOD multicast
group).
> > > > > However this creates a scaling issue in scenarios where
aggregation
> > > > > logical switches are connected to more logical routers (~350). The
> > > > > logical pipelines of all routers would have to be executed before
the
> > > > > packet is finally replied to by a single router, the owner of the
IP
> > > > > address.
> > > > >
> > > > > This commit limits the broadcast domain by bypassing the L2
Lookup stage
> > > > > for ARP requests that will be replied by a single router. The
packets
> > > > > are forwarded only to the router port that owns the target IP
address.
> > > > >
> > > > > IPs that are owned by the routers and for which this fix applies
are:
> > > > > - IP addresses configured on the router ports.
> > > > > - VIPs.
> > > > > - NAT IPs.
> > > > >
> > > > > This commit also fixes function get_router_load_balancer_ips()
which
> > > > > was incorrectly returning a single address_family even though the
> > > > > IP set could contain a mix of IPv4 and IPv6 addresses.
> > > > >
> > > > > Reported-at: https://bugzilla.redhat.com/1756945
> > > > > Reported-by: Anil Venkata 
> > > > > Signed-off-by: Dumitru Ceara 
> > > > >
> > > > > ---
> > > > > v6:
> > > > > - Address Han's comments:
> > > > >   - remove flooding of ARPs targeting OVN owned IP addresses.
> > > > >   - update ovn-architecture documentation.
> > > > >   - rename ARP handling functions.
> > > > >   - Adapt "ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR" autotest to
take into
> > > > > account the new way of forwarding ARPs.
> > > > > - Also, properly deal with ARP packets on VLAN-backed networks.
> > > >
> > > > I am confused by this additional VLAN related change. VLAN is just
handled as bridged logical switch where a localnet port is used as
*inport*. It seems to me no difference from regular localnet port no matter
with or without VLAN tag. When an ARP request is going through the ingress
pipeline of the bridged logical switch, the logic of bypassing the flooding
added by this patch should just apply, right? It is also very common
scenario that the *aggregation switch* for the routers is an external
physical network backed by VLAN. I think the purpose of this patch is to
make sure this scenario scales. Did I misunderstand anything here?
> > >
> > > Hi Han,
> > >
> > > The reason behind it was that with VLAN backed networks when packets
> > > move between hypervisors without going through geneve we rerun the
> > > ingress pipeline. That would mean that the flows I added for self
> > > originated (G)ARP packets won't be hit for ARP requests originated by
> > > ovn-controller on a remote hypervisor:
> > >
> > > priority=80 match: "inport=, arp.op == 1" action:
> > > "outport=MC_FLOOD"
> > >
> > > But instead we would hit:
> > > priority=75 match: "arp.op == 1 && arp.tpa ==  > > "outport=" and never send flood the packet out on
> > > the second HV.
> > >
> >
> > Thanks for the explanation. Now I understand the problem.
> >
> > > You're right, the way I added the check for all VLAN packets is not OK
> > > as we fall back to the old behavior too often. However, I'm not sure
> > > what the best option is. Do you think it's fine if I change the
> > > priority 80 flow to the following?
> > >
> > > priority=80 match: "eth.src={lrp.ea, lr.nat.external_macs} && arp.op
> > > == 1" action: "outport=MC_FLOOD"
> > >
> > > The idea would be to identify self originated ARP requests by matching
> > > the source mac instead of logical ingress port (as this might not be
> > > present). I tried it locally and it works fine but we do need to add
> > > more flows than before.
> > >
> >
> > Would this work when "ovn-chassis-mac-mappings" is configured for VLAN
backed logical routers? We might end up adding chassis unique MACs, too?
>
> I think it will work fine because when ovn-chassis-mac-mappings is
> configured we add an entry in table OFTABLE_PHY_TO_LOG to match on
> CHASSIS_MAC_TO_ROUTER_MAC_CONJID (comparing eth.src to all
> ovn-chassis-macs) to rewrite the eth.src address with that of the
> router port.
>
> >
> > Alternatively, I think we may change the priority 80 flow to match for
each OVN router owned IP: arp.tpa == IP && arp.spa == IP ... Would this
ensure OVN router generated ARPs are flooded?

[ovs-dev] Etapas a seguir para el mapeo de procesos

2019-11-11 Thread Análisis y Mejora de procesos
November 21 | 10:00 a.m. to 5:00 p.m. | (central Mexico time)

- Process Mapping - Live Webinar


 Our webinar offers you a practical and orderly methodology for process
mapping that can be applied to any type of organization. This workshop is
aimed at staff in quality, process, and improvement-project areas, and at
professionals dedicated to process documentation in general, and will let
them learn the steps to follow to map their organization's processes,
whether for later documentation of procedures or for the implementation of
a management system based on the process map.
We will present the steps to follow to document your organization's
processes, identify the processes that make up the general operating system
of the business, and allow procedures to be broken down.


Request information by replying to this email with the word Procesos, along
with the following details:

Name:
Email:
Phone number:

Contact numbers:  (045) 55 15 54 66 30 - (045) 55 85567293 - (045)
5530167085

If you received this email without requesting it, or if you wish to stop
receiving our promotions, please reply with the word baja or send an email
to bajas@ innovalearn.net




Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: add hash info to upcall

2019-11-11 Thread David Miller
From: xiangxia.m@gmail.com
Date: Sun, 10 Nov 2019 19:54:04 +0800

> From: Tonghao Zhang 
> 
> When using the kernel datapath, the upcall doesn't
> include the related skb hash info. That introduces
> problems, because the hash of the skb is very
> important (e.g. the vxlan module uses it for the udp src port
> and for tx queue selection on the tx path).
> 
> For example, there will be one upcall to ovs-vswitchd,
> without the skb hash info, for the first packet of a tcp
> session. When the kernel sends the tcp packets, the hash is
> random for a tcp socket:
 ...
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
> Signed-off-by: Tonghao Zhang 
> ---
> v2: add define before #endif

OVS folks, please review.

Thanks.


Re: [ovs-dev] [dpdk-latest PATCH v3 1/2] netdev-dpdk: Add support for multi-queue QoS to the DPDK datapath

2019-11-11 Thread Stokes, Ian




On 10/1/2019 3:10 PM, Eelco Chaudron wrote:

This patch adds support for multi-queue QoS to the DPDK datapath. Most of
the code is based on an earlier patch from a patchset sent out by
zhaozhanxu. The patch was titled "[ovs-dev, v2, 1/4] netdev-dpdk.c: Support
the multi-queue QoS configuration for dpdk datapath"


Thanks for working on this Eelco, a few comments below.


Signed-off-by: Eelco Chaudron 

Should there be a co-author for zhaozhanxu also on this patch?


---
  lib/netdev-dpdk.c |  216 -
  1 file changed, 210 insertions(+), 6 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ba92e89..072ce96 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -197,6 +197,13 @@ struct qos_conf {
  rte_spinlock_t lock;
  };
  
+/* QoS queue information used by the netdev queue dump functions*/


Minor, end comment above with period and space before comment termination.


+struct netdev_dpdk_queue_state {
+uint32_t *queues;
+size_t cur_queue;
+size_t n_queues;
+};
+
  /* A particular implementation of dpdk QoS operations.
   *
   * The functions below return 0 if successful or a positive errno value on
@@ -263,6 +270,41 @@ struct dpdk_qos_ops {
   */
  int (*qos_run)(struct qos_conf *qos_conf, struct rte_mbuf **pkts,
 int pkt_cnt, bool should_steal);
+
+/* Called to construct a QoS Queue. The implementation should make
+ * the appropriate calls to configure QoS Queue according to 'details'.
+ *
+ * The contents of 'details' should be documented as valid for 'ovs_name'
+ * in the "other_config" column in the "QoS" table in vswitchd/vswitch.xml
+ * (which is built as ovs-vswitchd.conf.db(8)).
+ *
+ * This function must return 0 if and only if it constructs
+ * qos queue successfully.

Should qos above be QoS?


+ */
+int (*qos_queue_construct)(const struct smap *details,
+   uint32_t queue_id, struct qos_conf *conf);
+
+/* Destroys the Qos Queue */

Minor, missing period.


+void (*qos_queue_destruct)(struct qos_conf *conf, uint32_t queue_id);
+
+/* Retrieves details of QoS Queue configuration into 'details'.
+ *
+ * The contents of 'details' should be documented as valid for 'ovs_name'
+ * in the "other_config" column in the "QoS" table in vswitchd/vswitch.xml
+ * (which is built as ovs-vswitchd.conf.db(8)).
+ */
+int (*qos_queue_get)(struct smap *details, uint32_t queue_id,
+ const struct qos_conf *conf);
+
+/* Retrieves statistics of QoS Queue configuration into 'stats'. */
+int (*qos_queue_get_stats)(const struct qos_conf *conf, uint32_t queue_id,
+   struct netdev_queue_stats *stats);
+
+/* Setup the 'netdev_dpdk_queue_state' structure used by the dpdk queue
+ * dump functions.
+ */
+int (*qos_queue_dump_state_init)(const struct qos_conf *conf,
+ struct netdev_dpdk_queue_state *state);
  };
  
  /* dpdk_qos_ops for each type of user space QoS implementation */

@@ -4032,6 +4074,161 @@ netdev_dpdk_set_qos(struct netdev *netdev, const char 
*type,
  return error;
  }
  
+static int

+netdev_dpdk_get_queue(const struct netdev *netdev, uint32_t queue_id,
+  struct smap *details)
+{
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+struct qos_conf *qos_conf;
+int error = 0;
+
+ovs_mutex_lock(&dev->mutex);
+
+qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+if (!qos_conf || !qos_conf->ops || !qos_conf->ops->qos_queue_get) {
+error = EOPNOTSUPP;
+} else {
+error = qos_conf->ops->qos_queue_get(details, queue_id, qos_conf);
+}
+
+ovs_mutex_unlock(&dev->mutex);
+
+return error;
+}
+
+static int
+netdev_dpdk_set_queue(struct netdev *netdev, uint32_t queue_id,
+  const struct smap *details)
+{
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+struct qos_conf *qos_conf;
+int error = 0;
+
+ovs_mutex_lock(&dev->mutex);
+
+qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+if (!qos_conf || !qos_conf->ops || !qos_conf->ops->qos_queue_construct) {
+error = EOPNOTSUPP;
+} else {
+error = qos_conf->ops->qos_queue_construct(details, queue_id,
+   qos_conf);
+}
+
+if (error && error != EOPNOTSUPP) {
+VLOG_ERR("Failed to set QoS queue %d on port %s: %s",
+ queue_id, netdev->name, rte_strerror(error));


Can we use 'netdev_get_name(&dev->up)' instead of 'netdev->name'? It's
just to keep it in the same style as other VLOG_ERR messages that reference
device names.



+}
+
+ovs_mutex_unlock(&dev->mutex);
+
+return error;
+}
+
+static int
+netdev_dpdk_delete_queue(struct netdev *netdev, uint32_t queue_id)
+{
+struct netde
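
The quoted hunk above is cut off mid-function. As background for the pattern
under review, here is a minimal self-contained sketch of the optional-ops
dispatch these queue functions use: a QoS implementation may leave a queue
handler unset, and the caller falls back to EOPNOTSUPP. The ex_* names are
illustrative stand-ins, not the OVS or DPDK API.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct ex_qos_ops {
    /* Optional: may be NULL if the implementation has no per-queue config. */
    int (*queue_get)(uint32_t queue_id, int *out_rate);
};

static int
ex_trtcm_queue_get(uint32_t queue_id, int *out_rate)
{
    *out_rate = 1000 + (int) queue_id;   /* placeholder rate value */
    return 0;
}

static int
ex_get_queue(const struct ex_qos_ops *ops, uint32_t queue_id, int *out_rate)
{
    /* Same guard as in the quoted netdev_dpdk_get_queue(): missing handler
     * means EOPNOTSUPP. */
    if (!ops || !ops->queue_get) {
        return EOPNOTSUPP;
    }
    return ops->queue_get(queue_id, out_rate);
}

int
main(void)
{
    struct ex_qos_ops with = { .queue_get = ex_trtcm_queue_get };
    struct ex_qos_ops without = { .queue_get = NULL };
    int rate = 0;

    printf("with handler   : %d\n", ex_get_queue(&with, 0, &rate));    /* 0 */
    printf("without handler: %d\n", ex_get_queue(&without, 0, &rate)); /* EOPNOTSUPP */
    return 0;
}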

Re: [ovs-dev] [PATCH ovn v6] ovn-northd: Limit ARP/ND broadcast domain whenever possible.

2019-11-11 Thread Dumitru Ceara
On Mon, Nov 11, 2019 at 6:11 PM Han Zhou  wrote:
>
> 1. Would there be problem for VLAN backed logical router use case, since a 
> chassis MAC is used as src MAC to send packets from router ports?
> 2. How about checking if tpa == spa to make sure GARP is always flooded? (not 
> directly supported by OF)

This would've been nice to have in OF :)

>
>
> On Mon, Nov 11, 2019 at 5:32 AM Dumitru Ceara  wrote:
> >
> > On Sat, Nov 9, 2019 at 8:35 AM Han Zhou  wrote:
> > >
> > >
> > >
> > > On Fri, Nov 8, 2019 at 6:38 AM Dumitru Ceara  wrote:
> > > >
> > > > ARP request and ND NS packets for router owned IPs were being
> > > > flooded in the complete L2 domain (using the MC_FLOOD multicast group).
> > > > However this creates a scaling issue in scenarios where aggregation
> > > > logical switches are connected to more logical routers (~350). The
> > > > logical pipelines of all routers would have to be executed before the
> > > > packet is finally replied to by a single router, the owner of the IP
> > > > address.
> > > >
> > > > This commit limits the broadcast domain by bypassing the L2 Lookup stage
> > > > for ARP requests that will be replied by a single router. The packets
> > > > are forwarded only to the router port that owns the target IP address.
> > > >
> > > > IPs that are owned by the routers and for which this fix applies are:
> > > > - IP addresses configured on the router ports.
> > > > - VIPs.
> > > > - NAT IPs.
> > > >
> > > > This commit also fixes function get_router_load_balancer_ips() which
> > > > was incorrectly returning a single address_family even though the
> > > > IP set could contain a mix of IPv4 and IPv6 addresses.
> > > >
> > > > Reported-at: https://bugzilla.redhat.com/1756945
> > > > Reported-by: Anil Venkata 
> > > > Signed-off-by: Dumitru Ceara 
> > > >
> > > > ---
> > > > v6:
> > > > - Address Han's comments:
> > > >   - remove flooding of ARPs targeting OVN owned IP addresses.
> > > >   - update ovn-architecture documentation.
> > > >   - rename ARP handling functions.
> > > >   - Adapt "ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR" autotest to take into
> > > > account the new way of forwarding ARPs.
> > > > - Also, properly deal with ARP packets on VLAN-backed networks.
> > >
> > > I am confused by this additional VLAN related change. VLAN is just 
> > > handled as bridged logical switch where a localnet port is used as 
> > > *inport*. It seems to me no difference from regular localnet port no 
> > > matter with or without VLAN tag. When an ARP request is going through the 
> > > ingress pipeline of the bridged logical switch, the logic of bypassing 
> > > the flooding added by this patch should just apply, right? It is also 
> > > very common scenario that the *aggregation switch* for the routers is an 
> > > external physical network backed by VLAN. I think the purpose of this 
> > > patch is to make sure this scenario scales. Did I misunderstand anything here?
> >
> > Hi Han,
> >
> > The reason behind it was that with VLAN backed networks when packets
> > move between hypervisors without going through geneve we rerun the
> > ingress pipeline. That would mean that the flows I added for self
> > originated (G)ARP packets won't be hit for ARP requests originated by
> > ovn-controller on a remote hypervisor:
> >
> > priority=80 match: "inport=, arp.op == 1" action:
> > "outport=MC_FLOOD"
> >
> > But instead we would hit:
> > priority=75 match: "arp.op == 1 && arp.tpa ==  > "outport=" and never send flood the packet out on
> > the second HV.
> >
>
> Thanks for the explanation. Now I understand the problem.
>
> > You're right, the way I added the check for all VLAN packets is not OK
> > as we fall back to the old behavior too often. However, I'm not sure
> > what the best option is. Do you think it's fine if I change the
> > priority 80 flow to the following?
> >
> > priority=80 match: "eth.src={lrp.ea, lr.nat.external_macs} && arp.op
> > == 1" action: "outport=MC_FLOOD"
> >
> > The idea would be to identify self originated ARP requests by matching
> > the source mac instead of logical ingress port (as this might not be
> > present). I tried it locally and it works fine but we do need to add
> > more flows than before.
> >
>
> Would this work when "ovn-chassis-mac-mappings" is configured for VLAN backed 
> logical routers? We might end up adding chassis unique MACs, too?

I think it will work fine because when ovn-chassis-mac-mappings is
configured we add an entry in table OFTABLE_PHY_TO_LOG to match on
CHASSIS_MAC_TO_ROUTER_MAC_CONJID (comparing eth.src to all
ovn-chassis-macs) to rewrite the eth.src address with that of the
router port.

>
> Alternatively, I think we may change the priority 80 flow to match for each 
> OVN router owned IP: arp.tpa == IP && arp.spa == IP ... Would this ensure OVN 
> router generated ARPs are flooded?
>

I considered this too but because a router port can have an
"unlimited" number of networks configured I decided to go for MAC to
minimize the number of f

Re: [ovs-dev] [PATCH ovn 3/3] ovn-detrace: Add support for other types of SB cookies.

2019-11-11 Thread Mark Michelson

Hi Dumitru. Everything you've done looks good to me.

However, when reviewing the patch, I educated myself on how ovn-detrace 
is implemented and I found a couple of problems. These problems aren't 
introduced by you. However, I think that since you have broadened the 
scope of the southbound database search area, they should probably be 
addressed.


On 11/8/19 6:15 AM, Dumitru Ceara wrote:

Commit eb25a7da639e ("Improve debuggability of OVN to OpenFlow
translations.") added cookies for Port_Binding, Mac_Binding,
Multicast_Group, Chassis records to the OpenfFlow entries they
generate.

Add support for parsing such cookies in ovn-detrace too.
Also, refactor ovn-detrace to allow a more generic way of defining
cookie handlers.

Signed-off-by: Dumitru Ceara 
---
  utilities/ovn-detrace.in |  166 --
  1 file changed, 117 insertions(+), 49 deletions(-)

diff --git a/utilities/ovn-detrace.in b/utilities/ovn-detrace.in
index 34b6b0e..79c6d3b 100755
--- a/utilities/ovn-detrace.in
+++ b/utilities/ovn-detrace.in
@@ -98,48 +98,112 @@ class OVSDB(object):
  def find_row_by_partial_uuid(self, table_name, value):
  return self._find_row(table_name, lambda row: value in str(row.uuid))


(Note: this problem existed prior to the patch series)

Since the cookie corresponds to the beginning of the southbound record's 
UUID, this lambda should probably be altered to:


lambda row: str(row.uuid).startswith(value)

The existing lambda could match the cookie with a later part of the 
UUID, returning an invalid match.


  
-

-def get_lflow_from_cookie(ovnsb_db, cookie):
-return ovnsb_db.find_row_by_partial_uuid('Logical_Flow', cookie)
-
-
-def print_lflow(lflow, prefix):
-ldp_uuid = lflow.logical_datapath.uuid
-ldp_name = str(lflow.logical_datapath.external_ids.get('name'))
-
-print '%sLogical datapath: "%s" (%s) [%s]' % (prefix,
-  ldp_name,
-  ldp_uuid,
-  lflow.pipeline)
-print "%sLogical flow: table=%s (%s), priority=%s, " \
-  "match=(%s), actions=(%s)" % (prefix,
-lflow.table_id,
-lflow.external_ids.get('stage-name'),
-lflow.priority,
-str(lflow.match).strip('"'),
-str(lflow.actions).strip('"'))
-
-
-def print_lflow_nb_hint(lflow, prefix, ovnnb_db):
-external_ids = lflow.external_ids
-if external_ids.get('stage-name') in ['ls_in_acl',
-  'ls_out_acl']:
-acl_hint = external_ids.get('stage-hint')
-if not acl_hint:
-return
-acl = ovnnb_db.find_row_by_partial_uuid('ACL', acl_hint)
-if not acl:
+class CookieHandler(object):
+def __init__(self, db, table):
+self._db = db
+self._table = table
+
+def get_record(self, cookie):
+return self._db.find_row_by_partial_uuid(self._table, cookie)
+
+def print_record(self, record, prefix):
+pass
+
+def print_hint(self, record, prefix, db):
+pass
+
+def datapath_str(datapath):
+return '"%s" (%s)' % (str(datapath.external_ids.get('name')),
+  datapath.uuid)
+
+def chassis_str(chassis):
+if len(chassis) == 0:
+return ''
+ch = chassis[0]
+return 'chassis-name "%s", chassis-str "%s"' % (ch.name, ch.hostname)
+
+class LogicalFlowHandler(CookieHandler):
+def __init__(self, ovnsb_db):
+super(LogicalFlowHandler, self).__init__(ovnsb_db, 'Logical_Flow')
+
+def print_record(self, lflow, prefix):
+print('%sLogical datapath: %s [%s]' %
+(prefix, datapath_str(lflow.logical_datapath), lflow.pipeline))
+print('%sLogical flow: table=%s (%s), priority=%s, '
+  'match=(%s), actions=(%s)' %
+(prefix, lflow.table_id, lflow.external_ids.get('stage-name'),
+ lflow.priority,
+ str(lflow.match).strip('"'),
+ str(lflow.actions).strip('"')))
+
+def print_hint(self, lflow, prefix, ovnnb_db):
+external_ids = lflow.external_ids
+if external_ids.get('stage-name') in ['ls_in_acl',
+  'ls_out_acl']:
+acl_hint = external_ids.get('stage-hint')
+if not acl_hint:
+return
+acl = ovnnb_db.find_row_by_partial_uuid('ACL', acl_hint)
+if not acl:
+return
+output = '%sACL: %s, priority=%s, ' \
+ 'match=(%s), %s' % (prefix,
+ acl.direction,
+ acl.priority,
+ acl.match.strip('"'),
+ 

[ovs-dev] [PATCH] dpdk: Deprecate pdump support.

2019-11-11 Thread Ilya Maximets
The conventional way for packet dumping in OVS is to use ovs-tcpdump
that works via traffic mirroring.  DPDK pdump could probably be used
for some lower level debugging, but it is not commonly used for
various reasons.

There are lots of limitations to using this functionality in practice.
Most of them are connected with running a secondary pdump process and with
memory layout issues like the requirement to disable ASLR in the kernel.
More details are available in the DPDK guide:
https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Besides the functional limitations, it's also hard to use this
functionality correctly.  The user must be sure that OVS and the pdump
utility are running on different CPU cores, which is hard because non-PMD
threads could float over the available CPU cores.  This or any other
misconfiguration will likely lead to a crash of the pdump utility
and/or OVS.

Another problem is that the user must actually have this special pdump
utility on the system, and it might not be available in distributions.

This change disables pdump support by default, introducing the special
configuration option '--enable-dpdk-pdump'.  Deprecation warnings will
be shown to users at configuration time and at runtime.

The plan is to completely remove this functionality from OVS in one
of the next releases.

Signed-off-by: Ilya Maximets 
Acked-by: Aaron Conole 
---

Version 1:
  * No changes since RFC.
  * Added ACK from Aaron.

 .travis/linux-build.sh  |  4 +++-
 Documentation/topics/dpdk/pdump.rst |  8 +++-
 NEWS|  4 
 acinclude.m4| 24 ++--
 lib/dpdk.c  |  2 ++
 5 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
index 69260181b..4e74973a3 100755
--- a/.travis/linux-build.sh
+++ b/.travis/linux-build.sh
@@ -124,7 +124,7 @@ function install_dpdk()
 sed -i '/CONFIG_RTE_EAL_IGB_UIO=y/s/=y/=n/' build/.config
 sed -i '/CONFIG_RTE_KNI_KMOD=y/s/=y/=n/' build/.config
 
-# Enable pdump.  This will enable building of the relevant OVS code.
+# Enable pdump support in DPDK.
 sed -i '/CONFIG_RTE_LIBRTE_PMD_PCAP=n/s/=n/=y/' build/.config
 sed -i '/CONFIG_RTE_LIBRTE_PDUMP=n/s/=n/=y/' build/.config
 
@@ -168,6 +168,8 @@ if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then
 DPDK_VER="18.11.2"
 fi
 install_dpdk $DPDK_VER
+# Enable pdump support in OVS.
+EXTRA_OPTS="${EXTRA_OPTS} --enable-dpdk-pdump"
 if [ "$CC" = "clang" ]; then
 # Disregard cast alignment errors until DPDK is fixed
 CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} -Wno-cast-align"
diff --git a/Documentation/topics/dpdk/pdump.rst 
b/Documentation/topics/dpdk/pdump.rst
index 7bd1d3e9f..b4d8aa8e9 100644
--- a/Documentation/topics/dpdk/pdump.rst
+++ b/Documentation/topics/dpdk/pdump.rst
@@ -27,10 +27,16 @@ pdump
 
 .. versionadded:: 2.6.0
 
+.. warning::
+
+   DPDK pdump support is deprecated in OVS and will be removed in next
+   releases.
+
 pdump allows you to listen on DPDK ports and view the traffic that is passing
 on them. To use this utility, one must have libpcap installed on the system.
 Furthermore, DPDK must be built with ``CONFIG_RTE_LIBRTE_PDUMP=y`` and
-``CONFIG_RTE_LIBRTE_PMD_PCAP=y``.
+``CONFIG_RTE_LIBRTE_PMD_PCAP=y``. OVS should be built with
+``--enable-dpdk-pdump`` configuration option.
 
 .. warning::
 
diff --git a/NEWS b/NEWS
index 88b818948..0d65d5a7f 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,10 @@ Post-v2.12.0
if supported by libbpf.
  * Add option to enable, disable and query TCP sequence checking in
conntrack.
+   - DPDK:
+ * DPDK pdump packet capture support disabled by default. New configure
+   option '--enable-dpdk-pdump' to enable it.
+ * DPDK pdump support is deprecated and will be removed in next releases.
 
 v2.12.0 - 03 Sep 2019
 -
diff --git a/acinclude.m4 b/acinclude.m4
index fc6157ac8..542637ac8 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -357,12 +357,24 @@ AC_DEFUN([OVS_CHECK_DPDK], [
   AC_DEFINE([VHOST_NUMA], [1], [NUMA Aware vHost support detected in 
DPDK.])
 ], [], [[#include ]])
 
-AC_CHECK_DECL([RTE_LIBRTE_PMD_PCAP], [
-  OVS_FIND_DEPENDENCY([pcap_dump], [pcap], [libpcap])
-  AC_CHECK_DECL([RTE_LIBRTE_PDUMP], [
-AC_DEFINE([DPDK_PDUMP], [1], [DPDK pdump enabled in OVS.])
-  ], [], [[#include ]])
-], [], [[#include ]])
+   AC_MSG_CHECKING([whether DPDK pdump support is enabled])
+   AC_ARG_ENABLE(
+ [dpdk-pdump],
+ [AC_HELP_STRING([--enable-dpdk-pdump],
+ [Enable DPDK pdump packet capture support])],
+ [AC_MSG_RESULT([yes])
+  AC_MSG_WARN([DPDK pdump is deprecated, consider using ovs-tcpdump 
instead])
+  AC_CHECK_DECL([RTE_LIBRTE_PMD_PCAP], [
+OVS_FIND_DEPENDENCY([pcap_dump], [pcap], [libpcap])
+AC_CHECK_DECL([RTE_LIBRTE_PDUMP], [
+  AC_DEFINE([DPDK_PDUMP], [1]

Re: [ovs-dev] [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

2019-11-11 Thread Ilya Maximets
On 11.11.2019 17:11, Ilya Maximets wrote:
>>> I'm not sure if clang annotations will work with rte_spinlock.
>>> DPDK doesn't have proper annotations for locking functions.
>>>
>>
>> Ah, good point, I didn't check the lock type. In that case nevermind,
>> patch+incremental LGTM as is.
>>
>> Acked-by: Kevin Traynor 
> 
> Thanks. In this case, I think, there is no need to re-spin the series.
> I'll just squash the incremental with this patch and give it another try.
> If it'll work fine, I'll apply the series.

Thanks Sriram and Kevin! Series applied to master.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v2] netdev-afxdp: Best-effort configuration of XDP mode.

2019-11-11 Thread William Tu
On Thu, Nov 07, 2019 at 01:39:28PM +0100, Eelco Chaudron wrote:
> 
> 
> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> 
> >Until now there were only two options for XDP mode in OVS: SKB or DRV,
> >i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> >
> >Devices like 'veth' interfaces in Linux support native XDP, but
> >don't support zero-copy mode.  This case can not be covered by the
> >existing API and we have to use slower generic XDP for such devices.
> >There are a few more issues, e.g. TCP is not supported in generic XDP
> >mode for veth interfaces due to kernel limitations; however, it is
> >supported in native mode.
> >
> >This change introduces the ability to use native XDP without zero-copy
> >along with a best-effort configuration option that is enabled by default.
> >In the best-effort case OVS will sequentially try different modes, starting
> >from the fastest one, and will choose the first one acceptable for the
> >current interface.  This will guarantee the best possible performance.
> >
> >If the user wants to choose a specific mode, it's still possible by
> >setting 'options:xdp-mode'.
> >
> >This change additionally changes the API by renaming the configuration
> >knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> >themselves to be more user-friendly.
> >
> >The full list of currently supported modes:
> >  * native-with-zerocopy - former DRV
> >  * native   - new one, DRV without zero-copy
> >  * generic  - former SKB
> >  * best-effort  - new one, chooses the best available from
> >   3 above modes
> >
> >Since 'best-effort' is the default mode, users will not need to
> >explicitly set 'xdp-mode' in most cases.
> >
> >TCP related tests enabled back in system afxdp testsuite, because
> >'best-effort' will choose 'native' mode for veth interfaces
> >and this mode has no issues with TCP.
> >
> >Signed-off-by: Ilya Maximets 
> 
> No review, I was running some performance tests for the OVS conference
> presentation so quickly tried this patch (assuming performance would
> increase :)…
> 
> So with the native option (default for tap) I see a decrease, :(,  in
> performance over the skb mode (this is with my usual ovs_perf PVP test
> setup).
> 
> With the patch applied, and default configuration
> (xdp-mode-in-use=native-with-zerocopy for my tap):
> 
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> , Packet size
> Number of flows,  64
>1, 711581
>  100, 673152
> 1000, 617733
> 
> Here you will even see a performance drop with multiple IP flows (this is a
> single queue config).
> 
> With SKB mode (xdp-mode-in-use=generic):
> 
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> , Packet size
> Number of flows,  64
>1, 800796
>  100, 800912
> 1000, 800133
> 

Hi Eelco,

I have another question (not directly related to this patch).

When doing PVP testing, what's the total CPU consumption?
I'm trying to think about the efficiency, ex:
  Packet forwarding rate / CPU utilization

For native-with-zerocopy, is it correct that your system uses
  ksoftirqd/x 100% (rx port)
  ksoftirqd/y 100% (tx port)
  qemu-kvm100%
  vhost_net   100%
  ovs-vswitchd 100% (pmd thead)

Or with need_wakeup, it's lower than this number?

Thanks
William
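
As a side note on the best-effort behavior described in the quoted commit
message above (try the fastest mode first and keep the first one the
interface accepts), here is a minimal standalone sketch of that selection
loop. The ex_* names and the probe function are illustrative stand-ins, not
the OVS implementation.

#include <stdbool.h>
#include <stdio.h>

enum ex_xdp_mode {
    EX_XDP_NATIVE_ZC,   /* native-with-zerocopy */
    EX_XDP_NATIVE,      /* native, no zero-copy */
    EX_XDP_GENERIC,     /* generic (skb) mode   */
    EX_XDP_N_MODES
};

static const char *ex_mode_names[] = {
    "native-with-zerocopy", "native", "generic",
};

/* Stand-in for a real probe that would try to set up an AF_XDP socket on
 * the interface in the given mode.  Here it pretends a veth-like device
 * accepts native mode but not zero-copy. */
static bool
ex_probe_mode(const char *ifname, enum ex_xdp_mode mode)
{
    (void) ifname;
    return mode != EX_XDP_NATIVE_ZC;
}

/* Try modes from fastest to slowest and return the first acceptable one. */
static int
ex_best_effort_mode(const char *ifname)
{
    for (int m = 0; m < EX_XDP_N_MODES; m++) {
        if (ex_probe_mode(ifname, (enum ex_xdp_mode) m)) {
            return m;
        }
    }
    return -1;
}

int
main(void)
{
    int m = ex_best_effort_mode("veth0");

    printf("xdp-mode-in-use=%s\n", m >= 0 ? ex_mode_names[m] : "<none>");
    return 0;
}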


Re: [ovs-dev] [PATCH v3] dpdk: Migrate to the new pdump API.

2019-11-11 Thread Ilya Maximets
Sorry, I missed the 'dpdk-latest' tag.

On 11.11.2019 16:01, Ilya Maximets wrote:
> DPDK commit 660098d61f57 ("pdump: use generic multi-process channel")
> switched pdump to use generic DPDK IPC instead of sockets.
> Old API was deprecated and removed.  Updating OVS code accordingly.
> 
> Signed-off-by: Ilya Maximets 
> ---
> 
> Version 3:
>   * Added note about XDG_RUNTIME_DIR.
> 
> Version 2:
>   * Removed unneeded deinitialization on error.
>   * Docs updated.
> 
>  Documentation/topics/dpdk/pdump.rst | 13 +++--
>  lib/dpdk.c  | 11 +--
>  2 files changed, 8 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/pdump.rst 
> b/Documentation/topics/dpdk/pdump.rst
> index 7bd1d3e9f..affd98371 100644
> --- a/Documentation/topics/dpdk/pdump.rst
> +++ b/Documentation/topics/dpdk/pdump.rst
> @@ -41,8 +41,7 @@ To use pdump, simply launch OVS as usual, then navigate to 
> the ``app/pdump``
>  directory in DPDK, ``make`` the application and run like so::
>  
>  $ sudo ./build/app/dpdk-pdump -- \
> ---pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
> ---server-socket-path=/usr/local/var/run/openvswitch
> +--pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap
>  
>  The above command captures traffic received on queue 0 of port 0 and stores 
> it
>  in ``/tmp/pkts.pcap``. Other combinations of port numbers, queues numbers and
> @@ -50,11 +49,13 @@ pcap locations are of course also available to use. For 
> example, to capture all
>  packets that traverse port 0 in a single pcap file::
>  
>  $ sudo ./build/app/dpdk-pdump -- \
> ---pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap' 
> \
> ---server-socket-path=/usr/local/var/run/openvswitch
> +--pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap'
>  
> -``server-socket-path`` must be set to the value of ``ovs_rundir()`` which
> -typically resolves to ``/usr/local/var/run/openvswitch``.
> +.. note::
> +
> +   ``XDG_RUNTIME_DIR`` environment variable might need to be adjusted to
> +   OVS runtime directory (``/var/run/openvswitch`` in most cases) for
> +   ``dpdk-pdump`` utility if OVS started by non-root user.
>  
>  Many tools are available to view the contents of the pcap file. Once example 
> is
>  tcpdump. Issue the following command to view the contents of ``pkts.pcap``::
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index f90cda75a..748f63d31 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -27,7 +27,6 @@
>  #include 
>  #include 
>  #ifdef DPDK_PDUMP
> -#include 
>  #include 
>  #endif
>  
> @@ -434,17 +433,9 @@ dpdk_init__(const struct smap *ovs_other_config)
>  
>  #ifdef DPDK_PDUMP
>  VLOG_INFO("DPDK pdump packet capture enabled");
> -err = rte_pdump_init(ovs_rundir());
> +err = rte_pdump_init();
>  if (err) {
>  VLOG_INFO("Error initialising DPDK pdump");
> -rte_pdump_uninit();
> -} else {
> -char *server_socket_path;
> -
> -server_socket_path = xasprintf("%s/%s", ovs_rundir(),
> -   "pdump_server_socket");
> -fatal_signal_add_file_to_unlink(server_socket_path);
> -free(server_socket_path);
>  }
>  #endif
>  
> 


Re: [ovs-dev] [PATCH ovn v6] ovn-northd: Limit ARP/ND broadcast domain whenever possible.

2019-11-11 Thread Han Zhou
1. Would there be problem for VLAN backed logical router use case, since a
chassis MAC is used as src MAC to send packets from router ports?
2. How about checking if tpa == spa to make sure GARP is always flooded?
(not directly supported by OF)


On Mon, Nov 11, 2019 at 5:32 AM Dumitru Ceara  wrote:
>
> On Sat, Nov 9, 2019 at 8:35 AM Han Zhou  wrote:
> >
> >
> >
> > On Fri, Nov 8, 2019 at 6:38 AM Dumitru Ceara  wrote:
> > >
> > > ARP request and ND NS packets for router owned IPs were being
> > > flooded in the complete L2 domain (using the MC_FLOOD multicast
group).
> > > However this creates a scaling issue in scenarios where aggregation
> > > logical switches are connected to more logical routers (~350). The
> > > logical pipelines of all routers would have to be executed before the
> > > packet is finally replied to by a single router, the owner of the IP
> > > address.
> > >
> > > This commit limits the broadcast domain by bypassing the L2 Lookup
stage
> > > for ARP requests that will be replied by a single router. The packets
> > > are forwarded only to the router port that owns the target IP address.
> > >
> > > IPs that are owned by the routers and for which this fix applies are:
> > > - IP addresses configured on the router ports.
> > > - VIPs.
> > > - NAT IPs.
> > >
> > > This commit also fixes function get_router_load_balancer_ips() which
> > > was incorrectly returning a single address_family even though the
> > > IP set could contain a mix of IPv4 and IPv6 addresses.
> > >
> > > Reported-at: https://bugzilla.redhat.com/1756945
> > > Reported-by: Anil Venkata 
> > > Signed-off-by: Dumitru Ceara 
> > >
> > > ---
> > > v6:
> > > - Address Han's comments:
> > >   - remove flooding of ARPs targeting OVN owned IP addresses.
> > >   - update ovn-architecture documentation.
> > >   - rename ARP handling functions.
> > >   - Adapt "ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR" autotest to take
into
> > > account the new way of forwarding ARPs.
> > > - Also, properly deal with ARP packets on VLAN-backed networks.
> >
> > I am confused by this additional VLAN related change. VLAN is just
handled as bridged logical switch where a localnet port is used as
*inport*. It seems to me no difference from regular localnet port no matter
with or without VLAN tag. When an ARP request is going through the ingress
pipeline of the bridged logical switch, the logic of bypassing the flooding
added by this patch should just apply, right? It is also very common
scenario that the *aggregation switch* for the routers is an external
physical network backed by VLAN. I think the purpose of this patch is to
make sure this scenario scales. Did I misunderstand anything here?
>
> Hi Han,
>
> The reason behind it was that with VLAN backed networks when packets
> move between hypervisors without going through geneve we rerun the
> ingress pipeline. That would mean that the flows I added for self
> originated (G)ARP packets won't be hit for ARP requests originated by
> ovn-controller on a remote hypervisor:
>
> priority=80 match: "inport=, arp.op == 1" action:
> "outport=MC_FLOOD"
>
> But instead we would hit:
> priority=75 match: "arp.op == 1 && arp.tpa ==  "outport=" and never send flood the packet out on
> the second HV.
>

Thanks for the explanation. Now I understand the problem.

> You're right, the way I added the check for all VLAN packets is not OK
> as we fall back to the old behavior too often. However, I'm not sure
> what the best option is. Do you think it's fine if I change the
> priority 80 flow to the following?
>
> priority=80 match: "eth.src={lrp.ea, lr.nat.external_macs} && arp.op
> == 1" action: "outport=MC_FLOOD"
>
> The idea would be to identify self originated ARP requests by matching
> the source mac instead of logical ingress port (as this might not be
> present). I tried it locally and it works fine but we do need to add
> more flows than before.
>

Would this work when "ovn-chassis-mac-mappings" is configured for VLAN
backed logical routers? We might end up adding chassis unique MACs, too?

Alternatively, I think we may change the priority 80 flow to match for each
OVN router owned IP: arp.tpa == IP && arp.spa == IP ... Would this ensure
OVN router generated ARPs are flooded?

>
> >
> > In addition, the below macro definition may be renamed to FLAGBIT_...,
because it is for the bits of MFF_LOG_FLAGS, which is one of the
pre-defined logical fields, instead of for the MFF_LOG_REG0-9 registers.
> > >
> > > +#define REGBIT_NOT_VXLAN "flags[1] == 0"
> > > +#define REGBIT_NOT_VLAN "flags[7] == 0"
> > > +
> >
> > The other part looks good to me. Thanks for simplifying the patch.
> >
> > Han
>


Re: [ovs-dev] [PATCH v2] compat: Add compat fix for old kernels

2019-11-11 Thread Roi Dayan



On 2019-11-11 4:08 PM, Roi Dayan wrote:
> 
> 
> On 2019-11-11 3:39 PM, Simon Horman wrote:
>> On Mon, Nov 04, 2019 at 11:49:44AM +0200, Roi Dayan wrote:
>>> In kernels older than 4.8, struct tcf_t didn't have the firstuse.
>>> If openvswitch is compiled with the compat pkt_cls.h then there is
>>> a struct size mismatch between openvswitch and the kernel which cause
>>> parsing netlink actions to fail.
>>> After this commit parsing the netlink actions pass even if compiled with
>>> the compat pkt_cls.h.
>>>
>>> Signed-off-by: Roi Dayan 
>>> ---
>>>
>>> v1->v2:
>>> - fix mix of tabs and spaces in acinclude.m4
>>
>> Thanks, applied to master and branch-2.8 ... branch-2.12
> 
> thanks!
> 

Hi Simon,

Sorry for the trouble, but there is also a v3 for this patch.
It uses AC_CHECK_MEMBER instead of AC_COMPILE_IFELSE.
Is it possible to take v3, which is a lot cleaner?

Thanks,
Roi


>>
>>>  acinclude.m4| 8 
>>>  include/linux/pkt_cls.h | 2 ++
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/acinclude.m4 b/acinclude.m4
>>> index a0507cfe019e..1e699191bb8b 100644
>>> --- a/acinclude.m4
>>> +++ b/acinclude.m4
>>> @@ -186,6 +186,14 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [
>>> [Define to 1 if TCA_FLOWER_KEY_ENC_IP_TTL_MASK is 
>>> available.])])
>>>  
>>>AC_COMPILE_IFELSE([
>>> +AC_LANG_PROGRAM([#include ], [
>>> +struct tcf_t x;
>>> +x.firstuse = 1;
>>> +])],
>>> +[AC_DEFINE([HAVE_TCF_T_FIRSTUSE], [1],
>>> +   [Define to 1 if struct tcf_t have firstuse.])])
>>> +
>>> +  AC_COMPILE_IFELSE([
>>>  AC_LANG_PROGRAM([#include ], [
>>>  int x = TCA_VLAN_PUSH_VLAN_PRIORITY;
>>>  ])],
>>> diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
>>> index 4adea59e7c36..b6926a79a0af 100644
>>> --- a/include/linux/pkt_cls.h
>>> +++ b/include/linux/pkt_cls.h
>>> @@ -64,7 +64,9 @@ struct tcf_t {
>>> __u64   install;
>>> __u64   lastuse;
>>> __u64   expires;
>>> +#ifdef HAVE_TCF_T_FIRSTUSE
>>> __u64   firstuse;
>>> +#endif
>>>  };
>>>  
>>>  #define tc_gen \
>>> -- 
>>> 2.8.4
>>>
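
To make the size mismatch described in the patch more concrete, here is a
minimal standalone sketch. The ex_* structs are local stand-ins for the two
layouts of struct tcf_t (pre-4.8 kernels without 'firstuse' vs. 4.8+ with
it), not the uapi headers themselves.

#include <stdint.h>
#include <stdio.h>

struct ex_tcf_t_old {       /* pre-4.8 kernels: no 'firstuse' */
    uint64_t install;
    uint64_t lastuse;
    uint64_t expires;
};

struct ex_tcf_t_new {       /* 4.8+ kernels */
    uint64_t install;
    uint64_t lastuse;
    uint64_t expires;
    uint64_t firstuse;
};

int
main(void)
{
    /* A netlink attribute carrying one layout fails a strict size check
     * against the other (24 vs. 32 bytes), which is how the mismatch shows
     * up as a failure to parse actions. */
    printf("old layout: %zu bytes, new layout: %zu bytes\n",
           sizeof(struct ex_tcf_t_old), sizeof(struct ex_tcf_t_new));
    return 0;
}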


Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case multi-segments.

2019-11-11 Thread Darrell Ball
Hi Zhike

Thanks for clarifying
There is presently no support for multi-segment mbufs in OVS, so the patch
would not be needed; in general, patches are only proposed at the time they
are relevant.

Also note that in the hypothetical case of multisegment mbufs, the packet
would be linearized at
this point.

Also note that this patch still relies on pointers.

As Ilya already mentioned there is no need to use multi-segment mbufs on
OVS as large buffers
can be used instead.

On Sun, Nov 10, 2019 at 9:50 PM 王志克  wrote:

> Hi Darrell,
>
> In the TSO case, the packet may use a multi-segment mbuf, and I do not think
> we need to make it linear. In this case, we can NOT use a pointer to
> calculate the tcp length.
>
> Br,
>
> Zhike Wang
> JDCloud, Product Development, IaaS
>
> 
> Mobile/+86 13466719566
> E- mail/wangzh...@jd.com
> Address/5F Building A,North-Star Century Center,8 Beichen West
> Street,Chaoyang District Beijing
> Https://JDCloud.com
>
> 
>
>
> From: Darrell Ball [mailto:dlu...@gmail.com]
> Sent: Saturday, November 09, 2019 8:12 AM
> To: Zhike Wang
> Cc: ovs dev; 王志克
> Subject: Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case
> multi-segments.
>
> Thanks for the patch
>
> Would you mind describing the use case that this patch is aiming to
> support ?
>
> On Fri, Nov 8, 2019 at 1:23 AM Zhike Wang  wrote:
> Signed-off-by: Zhike Wang 
> ---
>  lib/conntrack-private.h | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 590f139..1d21f6e 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -233,13 +233,17 @@ conn_update_expiration(struct conntrack *ct, struct
> conn *conn,
>  static inline uint32_t
>  tcp_payload_length(struct dp_packet *pkt)
>  {
> -const char *tcp_payload = dp_packet_get_tcp_payload(pkt);
> -if (tcp_payload) {
> -return ((char *) dp_packet_tail(pkt) - dp_packet_l2_pad_size(pkt)
> -- tcp_payload);
> -} else {
> -return 0;
> +size_t l4_size = dp_packet_l4_size(pkt);
> +
> +if (OVS_LIKELY(l4_size >= TCP_HEADER_LEN)) {
> +struct tcp_header *tcp = dp_packet_l4(pkt);
> +int tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
> +
> +if (OVS_LIKELY(tcp_len >= TCP_HEADER_LEN && tcp_len <= l4_size)) {
> +return (l4_size - tcp_len);
> +}
>
> Maybe I missed something, but it looks like the same calculation is
> arrived at (see the sketch after the quoted patch below).
>
>  }
> +return 0;
>  }
>
>  #endif /* conntrack-private.h */
> --
> 1.8.3.1
>
>
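To make the point about the calculation explicit, here is a short derivation
(written as a C comment; illustrative only, not proposed code) of why the
pointer-based and the length-based forms agree on a contiguous,
single-segment packet:

    /*
     * Old form:  payload = tail - l2_pad - tcp_payload
     *            where tcp_payload = l4 + tcp_hdr_len
     *            (what dp_packet_get_tcp_payload() returns when the
     *             headers are sane).
     *
     * New form:  payload = l4_size - tcp_hdr_len
     *            where l4_size = tail - l4 - l2_pad
     *            (which is how dp_packet_l4_size() computes it).
     *
     * Substituting:
     *     tail - l2_pad - (l4 + tcp_hdr_len)
     *   = (tail - l4 - l2_pad) - tcp_hdr_len
     *   = l4_size - tcp_hdr_len
     *
     * i.e. both forms yield the same value as long as the packet data is
     * contiguous; only for multi-segment buffers would the pointer
     * arithmetic stop being valid.
     */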


Re: [ovs-dev] [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

2019-11-11 Thread Ilya Maximets
On 11.11.2019 17:06, Kevin Traynor wrote:
> On 11/11/2019 15:59, Ilya Maximets wrote:
>> On 11.11.2019 16:55, Kevin Traynor wrote:
>>> On 10/11/2019 23:20, Ilya Maximets wrote:
 On 29.10.2019 15:50, Sriram Vatala wrote:> @@ -2388,12 +2412,16 @@ 
 __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>   }
>   } while (cnt && (retries++ < max_retries));
>   
> +tx_failure = cnt;
>   rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>   
>   rte_spinlock_lock(&dev->stats_lock);
>   netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
>cnt + dropped);
> -dev->tx_retries += MIN(retries, max_retries);
> +sw_stats->tx_retries += MIN(retries, max_retries);
> +sw_stats->tx_failure_drops += tx_failure;
> +sw_stats->tx_mtu_exceeded_drops += mtu_drops;
> +sw_stats->tx_qos_drops += qos_drops;

 Kevin pointed to this part of code in hope that we can move this to
 a separate function and reuse in his review for v9.  This code catches
 my eyes too.  I don't think that we can reuse this part, at least this
 will be not very efficient in current situation (clearing of the unused
 fields in a stats structure will be probably needed before calling such
 a common function, but ETH tx uses only half of the struct).

 But there is another thing here.  We already have special functions
 for vhost tx/rx counters.  And it looks strange when we're not using
 these functions to update tx/rx failure counters.

 Suggesting following incremental.
 Kevin, Sriram, please, share your thoughts.

>>>
>>> The incremental patch looks good, thanks. One additional thing is that
>>> OVS_REQUIRES(dev->stats_lock) annotations can be used for the vhost
>>> rx/tx update counter functions now (even if it seems unlikely someone
>>> would miss doing that).
>>>
>>> @Sriram, see below or you can check similar OVS_REQUIRES usage elsewhere
>>> in the file.
>>
>> I'm not sure if clang annotations will work with rte_spinlock.
>> DPDK doesn't have proper annotations for locking functions.
>>
> 
> Ah, good point, I didn't check the lock type. In that case nevermind,
> patch+incremental LGTM as is.
> 
> Acked-by: Kevin Traynor 

Thanks. In this case, I think, there is no need to re-spin the series.
I'll just squash the incremental with this patch and give it another try.
If it works fine, I'll apply the series.

Best regards, Ilya Maximets.
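
To illustrate the annotation point discussed above, here is a minimal sketch
(illustrative only; the struct and function names are made up) of how
OVS_REQUIRES() is normally used together with an ovs_mutex, whose lock and
unlock wrappers carry the clang thread-safety annotations.
rte_spinlock_lock()/rte_spinlock_unlock() from DPDK carry no such
annotations, which is why the same contract cannot be checked for
dev->stats_lock.

    #include <stdint.h>
    #include "ovs-thread.h"

    struct example_dev {
        struct ovs_mutex stats_lock;
        uint64_t rx_packets;
    };

    /* Caller must hold stats_lock; clang warns at call sites that do not. */
    static void
    example_update_rx_counters(struct example_dev *dev, int count)
        OVS_REQUIRES(dev->stats_lock)
    {
        dev->rx_packets += count;
    }

    static void
    example_rx(struct example_dev *dev, int count)
    {
        ovs_mutex_lock(&dev->stats_lock);
        example_update_rx_counters(dev, count);
        ovs_mutex_unlock(&dev->stats_lock);
    }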


Re: [ovs-dev] [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

2019-11-11 Thread Kevin Traynor
On 11/11/2019 15:59, Ilya Maximets wrote:
> On 11.11.2019 16:55, Kevin Traynor wrote:
>> On 10/11/2019 23:20, Ilya Maximets wrote:
>>> On 29.10.2019 15:50, Sriram Vatala wrote:> @@ -2388,12 +2412,16 @@ 
>>> __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
   }
   } while (cnt && (retries++ < max_retries));
   
 +tx_failure = cnt;
   rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
   
   rte_spinlock_lock(&dev->stats_lock);
   netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
cnt + dropped);
 -dev->tx_retries += MIN(retries, max_retries);
 +sw_stats->tx_retries += MIN(retries, max_retries);
 +sw_stats->tx_failure_drops += tx_failure;
 +sw_stats->tx_mtu_exceeded_drops += mtu_drops;
 +sw_stats->tx_qos_drops += qos_drops;
>>>
>>> Kevin pointed to this part of code in hope that we can move this to
>>> a separate function and reuse in his review for v9.  This code catches
>>> my eyes too.  I don't think that we can reuse this part, at least this
>>> will be not very efficient in current situation (clearing of the unused
>>> fields in a stats structure will be probably needed before calling such
>>> a common function, but ETH tx uses only half of the struct).
>>>
>>> But there is another thing here.  We already have special functions
>>> for vhost tx/rx counters.  And it looks strange when we're not using
>>> these functions to update tx/rx failure counters.
>>>
>>> Suggesting following incremental.
>>> Kevin, Sriram, please, share your thoughts.
>>>
>>
>> The incremental patch looks good, thanks. One additional thing is that
>> OVS_REQUIRES(dev->stats_lock) annotations can be used for the vhost
>> rx/tx update counter functions now (even if it seems unlikely someone
>> would miss doing that).
>>
>> @Sriram, see below or you can check similar OVS_REQUIRES usage elsewhere
>> in the file.
> 
> I'm not sure if clang annotations will work with rte_spinlock.
> DPDK doesn't have proper annotations for locking functions.
> 

Ah, good point, I didn't check the lock type. In that case nevermind,
patch+incremental LGTM as is.

Acked-by: Kevin Traynor 



thanks,
Kevin.



Re: [ovs-dev] [PATCH] tc: Set 'no_percpu' flag for compatible actions

2019-11-11 Thread Simon Horman
On Mon, Nov 04, 2019 at 06:34:49PM +0200, Roi Dayan wrote:
> From: Vlad Buslov 
> 
> Recent changes in Linux kernel TC action subsystem introduced new
> TCA_ACT_FLAGS_NO_PERCPU_STATS flag. The purpose of the flag is to request
> action implementation to skip allocating action stats with expensive percpu
> allocator and use regular built-in action stats instead. Such approach
> significantly improves rule insertion rate and reduce memory usage for
> hardware-offloaded rules that don't need benefits provided by percpu
> allocated stats (improved software TC fast-path performance). Set the flag
> for all compatible actions.
> 
> Modify acinclude.m4 to use OVS-internal pkt_cls.h implementation when
> TCA_ACT_FLAGS is not defined by kernel headers and to manually define
> struct nla_bitfield32 in netlink.h (new file) when it is not defined by
> kernel headers.
> 
> Signed-off-by: Vlad Buslov 
> Reviewed-by: Roi Dayan 

Thanks, applied to master.


Re: [ovs-dev] [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

2019-11-11 Thread Ilya Maximets
On 11.11.2019 16:55, Kevin Traynor wrote:
> On 10/11/2019 23:20, Ilya Maximets wrote:
>> On 29.10.2019 15:50, Sriram Vatala wrote:> @@ -2388,12 +2412,16 @@ 
>> __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>>>   }
>>>   } while (cnt && (retries++ < max_retries));
>>>   
>>> +tx_failure = cnt;
>>>   rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>>>   
>>>   rte_spinlock_lock(&dev->stats_lock);
>>>   netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
>>>cnt + dropped);
>>> -dev->tx_retries += MIN(retries, max_retries);
>>> +sw_stats->tx_retries += MIN(retries, max_retries);
>>> +sw_stats->tx_failure_drops += tx_failure;
>>> +sw_stats->tx_mtu_exceeded_drops += mtu_drops;
>>> +sw_stats->tx_qos_drops += qos_drops;
>>
>> Kevin pointed to this part of code in hope that we can move this to
>> a separate function and reuse in his review for v9.  This code catches
>> my eyes too.  I don't think that we can reuse this part, at least this
>> will be not very efficient in current situation (clearing of the unused
>> fields in a stats structure will be probably needed before calling such
>> a common function, but ETH tx uses only half of the struct).
>>
>> But there is another thing here.  We already have special functions
>> for vhost tx/rx counters.  And it looks strange when we're not using
>> these functions to update tx/rx failure counters.
>>
>> Suggesting following incremental.
>> Kevin, Sriram, please, share your thoughts.
>>
> 
> The incremental patch looks good, thanks. One additional thing is that
> OVS_REQUIRES(dev->stats_lock) annotations can be used for the vhost
> rx/tx update counter functions now (even if it seems unlikely someone
> would miss doing that).
> 
> @Sriram, see below or you can check similar OVS_REQUIRES usage elsewhere
> in the file.

I'm not sure if clang annotations will work with rte_spinlock.
DPDK doesn't have proper annotations for locking functions.


Re: [ovs-dev] [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

2019-11-11 Thread Kevin Traynor
On 10/11/2019 23:20, Ilya Maximets wrote:
> On 29.10.2019 15:50, Sriram Vatala wrote:> @@ -2388,12 +2412,16 @@ 
> __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>>   }
>>   } while (cnt && (retries++ < max_retries));
>>   
>> +tx_failure = cnt;
>>   rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>>   
>>   rte_spinlock_lock(&dev->stats_lock);
>>   netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
>>cnt + dropped);
>> -dev->tx_retries += MIN(retries, max_retries);
>> +sw_stats->tx_retries += MIN(retries, max_retries);
>> +sw_stats->tx_failure_drops += tx_failure;
>> +sw_stats->tx_mtu_exceeded_drops += mtu_drops;
>> +sw_stats->tx_qos_drops += qos_drops;
> 
> Kevin pointed to this part of code in hope that we can move this to
> a separate function and reuse in his review for v9.  This code catches
> my eyes too.  I don't think that we can reuse this part, at least this
> will be not very efficient in current situation (clearing of the unused
> fields in a stats structure will be probably needed before calling such
> a common function, but ETH tx uses only half of the struct).
> 
> But there is another thing here.  We already have special functions
> for vhost tx/rx counters.  And it looks strange when we're not using
> these functions to update tx/rx failure counters.
> 
> Suggesting following incremental.
> Kevin, Sriram, please, share your thoughts.
> 

The incremental patch looks good, thanks. One additional thing is that
OVS_REQUIRES(dev->stats_lock) annotations can be used for the vhost
rx/tx update counter functions now (even if it seems unlikely someone
would miss doing that).

@Sriram, see below or you can check similar OVS_REQUIRES usage elsewhere
in the file.

> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 3cb7023a8..02120a379 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2169,16 +2169,18 @@ netdev_dpdk_vhost_update_rx_size_counters(struct 
> netdev_stats *stats,
>   }
>   
>   static inline void
> -netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats,
> +netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev,
>struct dp_packet **packets, int count,
> - int dropped)
> + int qos_drops)
OVS_REQUIRES(dev->stats_lock)
>   {
> -int i;
> -unsigned int packet_size;
> +struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
> +struct netdev_stats *stats = &dev->stats;
>   struct dp_packet *packet;
> +unsigned int packet_size;
> +int i;
>   
>   stats->rx_packets += count;
> -stats->rx_dropped += dropped;
> +stats->rx_dropped += qos_drops;
>   for (i = 0; i < count; i++) {
>   packet = packets[i];
>   packet_size = dp_packet_size(packet);
> @@ -2201,6 +2203,8 @@ netdev_dpdk_vhost_update_rx_counters(struct 
> netdev_stats *stats,
>   
>   stats->rx_bytes += packet_size;
>   }
> +
> +sw_stats->rx_qos_drops += qos_drops;
>   }
>   
>   /*
> @@ -2213,7 +2217,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>   struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
>   struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev);
>   uint16_t nb_rx = 0;
> -uint16_t dropped = 0;
> +uint16_t qos_drops = 0;
>   int qid = rxq->queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
>   int vid = netdev_dpdk_get_vid(dev);
>   
> @@ -2240,17 +2244,16 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>   }
>   
>   if (policer) {
> -dropped = nb_rx;
> +qos_drops = nb_rx;
>   nb_rx = ingress_policer_run(policer,
>   (struct rte_mbuf **) batch->packets,
>   nb_rx, true);
> -dropped -= nb_rx;
> +qos_drops -= nb_rx;
>   }
>   
>   rte_spinlock_lock(&dev->stats_lock);
> -netdev_dpdk_vhost_update_rx_counters(&dev->stats, batch->packets,
> - nb_rx, dropped);
> -dev->sw_stats->rx_qos_drops += dropped;
> +netdev_dpdk_vhost_update_rx_counters(dev, batch->packets,
> + nb_rx, qos_drops);
>   rte_spinlock_unlock(&dev->stats_lock);
>   
>   batch->count = nb_rx;
> @@ -2360,13 +2363,18 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk 
> *dev, struct rte_mbuf **pkts,
>   }
>   
>   static inline void
> -netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
> +netdev_dpdk_vhost_update_tx_counters(struct netdev_dpdk *dev,
>struct dp_packet **packets,
>int attempted,
> - int dropped)
> + struct netdev_dpdk_sw_stats 
> *sw_stats_add)
OVS_REQUIRES(dev->stats_lock)
>

Re: [ovs-dev] [dpdk-latest PATCH v2] netdev-dpdk: add coverage counter to count vhost IRQs

2019-11-11 Thread Eelco Chaudron



On 11 Nov 2019, at 16:15, Ilya Maximets wrote:


This patch doesn't apply to master, but it applies to dpdk-latest.
dpdk-latest needs some rebase and this patch should be rebased
after that.


Yes, I know. Any idea when this will be? Or how we will handle these
DPDK-related features?


Will re-submit once dpdk-latest is re-based or master has 19.11 DPDK 
version…



Comments inline.

Best regards, Ilya Maximets.

On 25.10.2019 15:56, Eelco Chaudron wrote:

When the dpdk vhost library executes an eventfd_write() call,
i.e. waking up the guest, a new callback will be called.

This patch adds the callback to count the number of
interrupts sent to the VM to track the number of times
interrupts were generated.

This might be of interest for finding out whether system calls are
being made in the DPDK fast path.

The coverage counter is called "dpdk_vhost_irqs" and can be
read with:

  $ ovs-appctl coverage/show | grep dpdk_vhost_irqs
  dpdk_vhost_irqs  275880.6/sec 129962.683/sec 
6561.7250/sec   total: 23684319


I'm not sure the above line is of any value here, but it is
definitely too long for a commit message.


Will remove it…





  $ ovs-appctl coverage/read-counter dpdk_vhost_irqs
  23684319

Signed-off-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ba92e89..8c6aaeb 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -46,6 +46,7 @@
 #include "dpdk.h"
 #include "dpif-netdev.h"
 #include "fatal-signal.h"
+#include "coverage.h"


We're trying to keep alphabetical order of includes if possible.


I know, wondering why I ignored it ;) Will fix…


 #include "netdev-provider.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
@@ -72,6 +73,8 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);

+COVERAGE_DEFINE(dpdk_vhost_irqs);


Now we have 'vhost_tx_contention', so it's better to have a
similar name here. Maybe 'vhost_notification'?


Sounds good to me, I will change it to vhost_notification.


BTW, I'm still not a fan of this concept in general, but it seems
to be a part of a public API in DPDK and we'll stick with it for
some time at least.


+
 #define DPDK_PORT_WATCHDOG_INTERVAL 5

 #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
@@ -165,6 +168,8 @@ static int new_device(int vid);
 static void destroy_device(int vid);
 static int vring_state_changed(int vid, uint16_t queue_id, int 
enable);

 static void destroy_connection(int vid);
+static void vhost_guest_notified(int vid);
+
 static const struct vhost_device_ops virtio_net_device_ops =
 {
 .new_device =  new_device,
@@ -173,6 +178,7 @@ static const struct vhost_device_ops 
virtio_net_device_ops =

 .features_changed = NULL,
 .new_connection = NULL,
 .destroy_connection = destroy_connection,
+.guest_notified = vhost_guest_notified,
 };

 enum { DPDK_RING_SIZE = 256 };
@@ -3746,6 +3752,12 @@ destroy_connection(int vid)
 }
 }

+static
+void vhost_guest_notified(int vid OVS_UNUSED)
+{
+COVERAGE_INC(dpdk_vhost_irqs);
+}
+
 /*
  * Retrieve the DPDK virtio device ID (vid) associated with a 
vhostuser

  * or vhostuserclient netdev.
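
Putting the review feedback together, a sketch (illustrative only, not the
actual next version of the patch) of what the callback could look like with
the counter renamed as suggested and the include kept in alphabetical order:

    /* In lib/netdev-dpdk.c; "coverage.h" added in alphabetical order with
     * the other OVS includes. */
    #include "coverage.h"

    COVERAGE_DEFINE(vhost_notification);

    static void
    vhost_guest_notified(int vid OVS_UNUSED)
    {
        COVERAGE_INC(vhost_notification);
    }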





Re: [ovs-dev] [dpdk-latest PATCH v2] netdev-dpdk: add coverage counter to count vhost IRQs

2019-11-11 Thread Ilya Maximets
This patch doesn't apply to master, but it applies to dpdk-latest.
dpdk-latest needs some rebase and this patch should be rebased
after that.

Comments inline.

Best regards, Ilya Maximets.

On 25.10.2019 15:56, Eelco Chaudron wrote:
> When the dpdk vhost library executes an eventfd_write() call,
> i.e. waking up the guest, a new callback will be called.
> 
> This patch adds the callback to count the number of
> interrupts sent to the VM to track the number of times
> interrupts were generated.
> 
> This might be of interest for finding out whether system calls are
> being made in the DPDK fast path.
> 
> The coverage counter is called "dpdk_vhost_irqs" and can be
> read with:
> 
>   $ ovs-appctl coverage/show | grep dpdk_vhost_irqs
>   dpdk_vhost_irqs  275880.6/sec 129962.683/sec 6561.7250/sec   
> total: 23684319

I'm not sure the above line is of any value here, but it is
definitely too long for a commit message.

> 
>   $ ovs-appctl coverage/read-counter dpdk_vhost_irqs
>   23684319
> 
> Signed-off-by: Eelco Chaudron 
> ---
>  lib/netdev-dpdk.c |   12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index ba92e89..8c6aaeb 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -46,6 +46,7 @@
>  #include "dpdk.h"
>  #include "dpif-netdev.h"
>  #include "fatal-signal.h"
> +#include "coverage.h"

We're trying to keep alphabetical order of includes if possible.

>  #include "netdev-provider.h"
>  #include "netdev-vport.h"
>  #include "odp-util.h"
> @@ -72,6 +73,8 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>  VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>  
> +COVERAGE_DEFINE(dpdk_vhost_irqs);

Now we have 'vhost_tx_contention', so it's better to have a
similar name here. Maybe 'vhost_notification'?

BTW, I'm still not a fan of this concept in general, but it seems
to be a part of a public API in DPDK and we'll stick with it for
some time at least.

> +
>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
>  
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
> @@ -165,6 +168,8 @@ static int new_device(int vid);
>  static void destroy_device(int vid);
>  static int vring_state_changed(int vid, uint16_t queue_id, int enable);
>  static void destroy_connection(int vid);
> +static void vhost_guest_notified(int vid);
> +
>  static const struct vhost_device_ops virtio_net_device_ops =
>  {
>  .new_device =  new_device,
> @@ -173,6 +178,7 @@ static const struct vhost_device_ops 
> virtio_net_device_ops =
>  .features_changed = NULL,
>  .new_connection = NULL,
>  .destroy_connection = destroy_connection,
> +.guest_notified = vhost_guest_notified,
>  };
>  
>  enum { DPDK_RING_SIZE = 256 };
> @@ -3746,6 +3752,12 @@ destroy_connection(int vid)
>  }
>  }
>  
> +static
> +void vhost_guest_notified(int vid OVS_UNUSED)
> +{
> +COVERAGE_INC(dpdk_vhost_irqs);
> +}
> +
>  /*
>   * Retrieve the DPDK virtio device ID (vid) associated with a vhostuser
>   * or vhostuserclient netdev.
> 


[ovs-dev] [PATCH v3] dpdk: Migrate to the new pdump API.

2019-11-11 Thread Ilya Maximets
DPDK commit 660098d61f57 ("pdump: use generic multi-process channel")
switched pdump to use generic DPDK IPC instead of sockets.
Old API was deprecated and removed.  Updating OVS code accordingly.

Signed-off-by: Ilya Maximets 
---

Version 3:
  * Added note about XDG_RUNTIME_DIR.

Version 2:
  * Removed unneeded deinitialization on error.
  * Docs updated.

 Documentation/topics/dpdk/pdump.rst | 13 +++--
 lib/dpdk.c  | 11 +--
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/Documentation/topics/dpdk/pdump.rst 
b/Documentation/topics/dpdk/pdump.rst
index 7bd1d3e9f..affd98371 100644
--- a/Documentation/topics/dpdk/pdump.rst
+++ b/Documentation/topics/dpdk/pdump.rst
@@ -41,8 +41,7 @@ To use pdump, simply launch OVS as usual, then navigate to 
the ``app/pdump``
 directory in DPDK, ``make`` the application and run like so::
 
 $ sudo ./build/app/dpdk-pdump -- \
---pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
---server-socket-path=/usr/local/var/run/openvswitch
+--pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap
 
 The above command captures traffic received on queue 0 of port 0 and stores it
 in ``/tmp/pkts.pcap``. Other combinations of port numbers, queues numbers and
@@ -50,11 +49,13 @@ pcap locations are of course also available to use. For 
example, to capture all
 packets that traverse port 0 in a single pcap file::
 
 $ sudo ./build/app/dpdk-pdump -- \
---pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap' \
---server-socket-path=/usr/local/var/run/openvswitch
+--pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-dev=/tmp/pkts.pcap'
 
-``server-socket-path`` must be set to the value of ``ovs_rundir()`` which
-typically resolves to ``/usr/local/var/run/openvswitch``.
+.. note::
+
+   ``XDG_RUNTIME_DIR`` environment variable might need to be adjusted to
+   OVS runtime directory (``/var/run/openvswitch`` in most cases) for
+   ``dpdk-pdump`` utility if OVS started by non-root user.
 
 Many tools are available to view the contents of the pcap file. Once example is
 tcpdump. Issue the following command to view the contents of ``pkts.pcap``::
diff --git a/lib/dpdk.c b/lib/dpdk.c
index f90cda75a..748f63d31 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #ifdef DPDK_PDUMP
-#include 
 #include 
 #endif
 
@@ -434,17 +433,9 @@ dpdk_init__(const struct smap *ovs_other_config)
 
 #ifdef DPDK_PDUMP
 VLOG_INFO("DPDK pdump packet capture enabled");
-err = rte_pdump_init(ovs_rundir());
+err = rte_pdump_init();
 if (err) {
 VLOG_INFO("Error initialising DPDK pdump");
-rte_pdump_uninit();
-} else {
-char *server_socket_path;
-
-server_socket_path = xasprintf("%s/%s", ovs_rundir(),
-   "pdump_server_socket");
-fatal_signal_add_file_to_unlink(server_socket_path);
-free(server_socket_path);
 }
 #endif
 
-- 
2.17.1



Re: [ovs-dev] [PATCH v15] Improved Packet Drop Statistics in OVS

2019-11-11 Thread Eelco Chaudron



On 8 Nov 2019, at 6:02, Anju Thomas wrote:

Currently OVS maintains explicit packet drop/error counters only at the
port level. Packets that are dropped as part of normal OpenFlow
processing are counted in flow stats of “drop” flows or as table 
misses in table stats. These can only be interpreted by controllers 
that know the semantics of the configured OpenFlow pipeline.
Without that knowledge, it is impossible for an OVS user to obtain 
e.g. the total number of packets dropped due to OpenFlow rules.


Furthermore, there are numerous other reasons for which packets can be
dropped by the OVS slow path that are not related to the OpenFlow
pipeline.
The generated datapath flow entries include a drop action to avoid 
further expensive upcalls to the slow path, but subsequent packets 
dropped by the datapath are not accounted anywhere.


Finally, the datapath itself drops packets in certain error situations.
Also, these drops are not accounted for today. This makes it difficult
for OVS users to monitor packet drops in an OVS instance and to alert a
management system in case of an unexpected increase of such drops.

Also OVS trouble-shooters face difficulties in analysing packet drops.

With this patch we implement the following changes to address the issues
mentioned above.


1. Identify and account all the silent packet drop scenarios

2. Display these drops in ovs-appctl coverage/show

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Keshav Gupta 
Signed-off-by: Anju Thomas 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Keshav Gupta 
Acked-by: Eelco Chaudron 

Not sure what is happening, but your patches show up from “Sriram 
Vatala via dev”




---




diff --git a/lib/odp-util.c b/lib/odp-util.c
index 3a574bf..2f8544b 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -45,6 +45,7 @@
 #include "openvswitch/match.h"
 #include "odp-netlink-macros.h"
 #include "csum.h"
+#include "ofproto/ofproto-dpif-xlate.h"

 VLOG_DEFINE_THIS_MODULE(odp_util);

@@ -141,6 +142,7 @@ odp_action_len(uint16_t type)
 case OVS_ACTION_ATTR_PUSH_NSH: return ATTR_LEN_VARIABLE;
 case OVS_ACTION_ATTR_POP_NSH: return 0;
 case OVS_ACTION_ATTR_CHECK_PKT_LEN: return ATTR_LEN_VARIABLE;
+case OVS_ACTION_ATTR_DROP: return sizeof(uint32_t);

 case OVS_ACTION_ATTR_UNSPEC:
 case __OVS_ACTION_ATTR_MAX:
@@ -1238,6 +1240,9 @@ format_odp_action(struct ds *ds, const struct 
nlattr *a,

 case OVS_ACTION_ATTR_CHECK_PKT_LEN:
 format_odp_check_pkt_len_action(ds, a, portno_names);
 break;
+case OVS_ACTION_ATTR_DROP:
+ds_put_cstr(ds, "drop");
+break;
 case OVS_ACTION_ATTR_UNSPEC:
 case __OVS_ACTION_ATTR_MAX:
 default:
@@ -2573,8 +2578,10 @@ odp_actions_from_string(const char *s, const 
struct simap *port_names,

 struct ofpbuf *actions)
 {
 size_t old_size;
+enum xlate_error drop_action;


Remove the above, as it's not used. Make sure you use
--enable-Werror when running ./configure, as it will catch these errors.



Rest looks good to me…



Re: [ovs-dev] [PATCH v2] compat: Add compat fix for old kernels

2019-11-11 Thread Roi Dayan



On 2019-11-11 3:39 PM, Simon Horman wrote:
> On Mon, Nov 04, 2019 at 11:49:44AM +0200, Roi Dayan wrote:
>> In kernels older than 4.8, struct tcf_t didn't have the firstuse.
>> If openvswitch is compiled with the compat pkt_cls.h then there is
>> a struct size mismatch between openvswitch and the kernel which causes
>> parsing netlink actions to fail.
>> After this commit, parsing the netlink actions passes even if compiled with
>> the compat pkt_cls.h.
>>
>> Signed-off-by: Roi Dayan 
>> ---
>>
>> v1->v2:
>> - fix mix of tabs and spaces in acinclude.m4
> 
> Thanks, applied to master and branch-2.8 ... branch-2.12

thanks!

> 
>>  acinclude.m4| 8 
>>  include/linux/pkt_cls.h | 2 ++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/acinclude.m4 b/acinclude.m4
>> index a0507cfe019e..1e699191bb8b 100644
>> --- a/acinclude.m4
>> +++ b/acinclude.m4
>> @@ -186,6 +186,14 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [
>> [Define to 1 if TCA_FLOWER_KEY_ENC_IP_TTL_MASK is 
>> available.])])
>>  
>>AC_COMPILE_IFELSE([
>> +AC_LANG_PROGRAM([#include ], [
>> +struct tcf_t x;
>> +x.firstuse = 1;
>> +])],
>> +[AC_DEFINE([HAVE_TCF_T_FIRSTUSE], [1],
>> +   [Define to 1 if struct tcf_t have firstuse.])])
>> +
>> +  AC_COMPILE_IFELSE([
>>  AC_LANG_PROGRAM([#include ], [
>>  int x = TCA_VLAN_PUSH_VLAN_PRIORITY;
>>  ])],
>> diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
>> index 4adea59e7c36..b6926a79a0af 100644
>> --- a/include/linux/pkt_cls.h
>> +++ b/include/linux/pkt_cls.h
>> @@ -64,7 +64,9 @@ struct tcf_t {
>>  __u64   install;
>>  __u64   lastuse;
>>  __u64   expires;
>> +#ifdef HAVE_TCF_T_FIRSTUSE
>>  __u64   firstuse;
>> +#endif
>>  };
>>  
>>  #define tc_gen \
>> -- 
>> 2.8.4
>>


[ovs-dev] [PATCH v6] netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event

2019-11-11 Thread Eelco Chaudron
Currently, OVS does not register for, and therefore does not handle, the
interface reset event from the DPDK framework. This causes a problem in
cases where a VF is used as an interface and its configuration changes.

As an example, without the patch applied, the MAC change in the following
scenario is not detected/acted upon until OVS is restarted:

  $ echo 1 > /sys/bus/pci/devices/:05:00.1/sriov_numvfs
  $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
set Interface dpdk0 type=dpdk -- \
set Interface dpdk0 options:dpdk-devargs=:05:0a.0

  $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33

Requires patch "bridge: Allow manual notifications about interfaces' updates."

Signed-off-by: Eelco Chaudron 
---
v5 -> v6:
  Moved the if_notifier_manual_report() function till after the
  actual reset of the interface to make it more clear that it's
  supposed to happen AFTER the reset.

  Removed empty line in the code

v4 -> v5:
  Using new if_notifier_manual_report() API

v3 -> v4:
  Add support for if-notification to make sure set_config()
  gets called

v2 -> v3:
v1 -> v2:
  Fixed Ilya's comments

 lib/netdev-dpdk.c |   54 +++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4805783..cb57b96 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -46,6 +46,7 @@
 #include "dpdk.h"
 #include "dpif-netdev.h"
 #include "fatal-signal.h"
+#include "if-notifier.h"
 #include "netdev-provider.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
@@ -396,6 +397,8 @@ struct netdev_dpdk {
 bool attached;
 /* If true, rte_eth_dev_start() was successfully called */
 bool started;
+bool reset_needed;
+/* 1 pad byte here. */
 struct eth_addr hwaddr;
 int mtu;
 int socket_id;
@@ -1173,6 +1176,8 @@ common_construct(struct netdev *netdev, dpdk_port_t 
port_no,
 ovsrcu_index_init(&dev->vid, -1);
 dev->vhost_reconfigured = false;
 dev->attached = false;
+dev->started = false;
+dev->reset_needed = false;
 
 ovsrcu_init(&dev->qos_conf, NULL);
 
@@ -1769,6 +1774,34 @@ netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
 return new_port_id;
 }
 
+static int
+dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type,
+void *param OVS_UNUSED, void *ret_param OVS_UNUSED)
+{
+struct netdev_dpdk *dev;
+
+switch ((int) type) {
+case RTE_ETH_EVENT_INTR_RESET:
+ovs_mutex_lock(&dpdk_mutex);
+dev = netdev_dpdk_lookup_by_port_id(port_id);
+if (dev) {
+ovs_mutex_lock(&dev->mutex);
+dev->reset_needed = true;
+netdev_request_reconfigure(&dev->up);
+VLOG_DBG_RL(&rl, "%s: Device reset requested.",
+netdev_get_name(&dev->up));
+ovs_mutex_unlock(&dev->mutex);
+}
+ovs_mutex_unlock(&dpdk_mutex);
+break;
+
+default:
+/* Ignore all other types. */
+break;
+   }
+   return 0;
+}
+
 static void
 dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
 OVS_REQUIRES(dev->mutex)
@@ -3807,6 +3840,8 @@ netdev_dpdk_class_init(void)
 /* This function can be called for different classes.  The initialization
  * needs to be done only once */
 if (ovsthread_once_start(&once)) {
+int ret;
+
 ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
 unixctl_command_register("netdev-dpdk/set-admin-state",
  "[netdev] up|down", 1, 2,
@@ -3820,6 +3855,14 @@ netdev_dpdk_class_init(void)
  "[netdev]", 0, 1,
  netdev_dpdk_get_mempool_info, NULL);
 
+ret = rte_eth_dev_callback_register(RTE_ETH_ALL,
+RTE_ETH_EVENT_INTR_RESET,
+dpdk_eth_event_callback, NULL);
+if (ret != 0) {
+VLOG_ERR("Ethernet device callback register error: %s",
+ rte_strerror(-ret));
+}
+
 ovsthread_once_done(&once);
 }
 
@@ -4180,13 +4223,20 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
 && dev->rxq_size == dev->requested_rxq_size
 && dev->txq_size == dev->requested_txq_size
 && dev->socket_id == dev->requested_socket_id
-&& dev->started) {
+&& dev->started && !dev->reset_needed) {
 /* Reconfiguration is unnecessary */
 
 goto out;
 }
 
-rte_eth_dev_stop(dev->port_id);
+if (dev->reset_needed) {
+rte_eth_dev_reset(dev->port_id);
+if_notifier_manual_report();
+dev->reset_needed = false;
+} else {
+rte_eth_dev_stop(dev->port_id);
+}
+
 dev->started = false;
 
 err = netdev_dpdk_mempool_configure(dev);


Re: [ovs-dev] [PATCH v5] netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event

2019-11-11 Thread Eelco Chaudron



On 7 Nov 2019, at 12:50, Ilya Maximets wrote:


On 06.11.2019 14:43, Eelco Chaudron wrote:

Currently, OVS does not register and therefore not handle the
interface reset event from the DPDK framework. This would cause a
problem in cases where a VF is used as an interface, and its
configuration changes.

As an example in the following scenario the MAC change is not
detected/acted upon until OVS is restarted without the patch applied:

   $ echo 1 > /sys/bus/pci/devices/:05:00.1/sriov_numvfs
   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
 set Interface dpdk0 type=dpdk -- \
 set Interface dpdk0 options:dpdk-devargs=:05:0a.0

   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33

Requires patch "bridge: Allow manual notifications about interfaces' 
updates."


Signed-off-by: Eelco Chaudron 
---
v4 -> v5:
   Using new if_notifier_manual_report() API

v3 -> v4:
   Add support for if-notification to make sure set_config()
   gets called

v2 -> v3:
v1 -> v2:
   Fixed Ilya's comments

lib/netdev-dpdk.c |   56 
+++--

  1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4805783..b744589 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -46,6 +46,7 @@
  #include "dpdk.h"
  #include "dpif-netdev.h"
  #include "fatal-signal.h"
+#include "if-notifier.h"
  #include "netdev-provider.h"
  #include "netdev-vport.h"
  #include "odp-util.h"
@@ -396,6 +397,8 @@ struct netdev_dpdk {
  bool attached;
  /* If true, rte_eth_dev_start() was successfully called */
  bool started;
+bool reset_needed;
+/* 1 pad byte here. */
  struct eth_addr hwaddr;
  int mtu;
  int socket_id;
@@ -1173,6 +1176,8 @@ common_construct(struct netdev *netdev, 
dpdk_port_t port_no,

  ovsrcu_index_init(&dev->vid, -1);
  dev->vhost_reconfigured = false;
  dev->attached = false;
+dev->started = false;
+dev->reset_needed = false;
   ovsrcu_init(&dev->qos_conf, NULL);
 @@ -1769,6 +1774,36 @@ netdev_dpdk_process_devargs(struct 
netdev_dpdk *dev,

  return new_port_id;
  }
 +static int
+dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type 
type,
+void *param OVS_UNUSED, void *ret_param 
OVS_UNUSED)

+{
+struct netdev_dpdk *dev;
+
+switch ((int) type) {
+case RTE_ETH_EVENT_INTR_RESET:
+ovs_mutex_lock(&dpdk_mutex);
+dev = netdev_dpdk_lookup_by_port_id(port_id);
+if (dev) {
+ovs_mutex_lock(&dev->mutex);
+dev->reset_needed = true;
+netdev_request_reconfigure(&dev->up);
+VLOG_DBG_RL(&rl, "%s: Device reset requested.",
+netdev_get_name(&dev->up));
+ovs_mutex_unlock(&dev->mutex);
+}
+ovs_mutex_unlock(&dpdk_mutex);
+
+if_notifier_manual_report();
+break;
+
+default:
+/* Ignore all other types. */
+break;
+   }
+   return 0;
+}
+
  static void
  dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap 
*args)

  OVS_REQUIRES(dev->mutex)
@@ -3807,6 +3842,8 @@ netdev_dpdk_class_init(void)
  /* This function can be called for different classes.  The 
initialization

   * needs to be done only once */
  if (ovsthread_once_start(&once)) {
+int ret;
+
  ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
  unixctl_command_register("netdev-dpdk/set-admin-state",
   "[netdev] up|down", 1, 2,
@@ -3820,6 +3857,15 @@ netdev_dpdk_class_init(void)
   "[netdev]", 0, 1,
   netdev_dpdk_get_mempool_info, 
NULL);

 +ret = rte_eth_dev_callback_register(RTE_ETH_ALL,
+
RTE_ETH_EVENT_INTR_RESET,
+dpdk_eth_event_callback, 
NULL);

+


This empty line could be removed.  There is already enough empty 
space.



Will do…


+if (ret != 0) {
+VLOG_ERR("Ethernet device callback register error: %s",
+ rte_strerror(-ret));
+}
+
  ovsthread_once_done(&once);
  }
 @@ -4180,13 +4226,19 @@ netdev_dpdk_reconfigure(struct netdev 
*netdev)

  && dev->rxq_size == dev->requested_rxq_size
  && dev->txq_size == dev->requested_txq_size
  && dev->socket_id == dev->requested_socket_id
-&& dev->started) {
+&& dev->started && !dev->reset_needed) {
  /* Reconfiguration is unnecessary */
   goto out;
  }
 -rte_eth_dev_stop(dev->port_id);
+if (dev->reset_needed) {
+rte_eth_dev_reset(dev->port_id);


Thinking more about the configuration sequence, it seems that the call to
if_notifier_manual_report() should be here and not in the callback,
to be sure that settings will be applied after the reset.


I did verify the order wou

Re: [ovs-dev] [PATCH v2] compat: Add compat fix for old kernels

2019-11-11 Thread Simon Horman
On Mon, Nov 04, 2019 at 11:49:44AM +0200, Roi Dayan wrote:
> In kernels older than 4.8, struct tcf_t didn't have the firstuse.
> If openvswitch is compiled with the compat pkt_cls.h then there is
> a struct size mismatch between openvswitch and the kernel which causes
> parsing netlink actions to fail.
> After this commit, parsing the netlink actions passes even if compiled with
> the compat pkt_cls.h.
> 
> Signed-off-by: Roi Dayan 
> ---
> 
> v1->v2:
> - fix mix of tabs and spaces in acinclude.m4

Thanks, applied to master and branch-2.8 ... branch-2.12

>  acinclude.m4| 8 
>  include/linux/pkt_cls.h | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/acinclude.m4 b/acinclude.m4
> index a0507cfe019e..1e699191bb8b 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -186,6 +186,14 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [
> [Define to 1 if TCA_FLOWER_KEY_ENC_IP_TTL_MASK is 
> available.])])
>  
>AC_COMPILE_IFELSE([
> +AC_LANG_PROGRAM([#include ], [
> +struct tcf_t x;
> +x.firstuse = 1;
> +])],
> +[AC_DEFINE([HAVE_TCF_T_FIRSTUSE], [1],
> +   [Define to 1 if struct tcf_t have firstuse.])])
> +
> +  AC_COMPILE_IFELSE([
>  AC_LANG_PROGRAM([#include ], [
>  int x = TCA_VLAN_PUSH_VLAN_PRIORITY;
>  ])],
> diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
> index 4adea59e7c36..b6926a79a0af 100644
> --- a/include/linux/pkt_cls.h
> +++ b/include/linux/pkt_cls.h
> @@ -64,7 +64,9 @@ struct tcf_t {
>   __u64   install;
>   __u64   lastuse;
>   __u64   expires;
> +#ifdef HAVE_TCF_T_FIRSTUSE
>   __u64   firstuse;
> +#endif
>  };
>  
>  #define tc_gen \
> -- 
> 2.8.4
> 


Re: [ovs-dev] [PATCH ovn v6] ovn-northd: Limit ARP/ND broadcast domain whenever possible.

2019-11-11 Thread Dumitru Ceara
On Sat, Nov 9, 2019 at 8:35 AM Han Zhou  wrote:
>
>
>
> On Fri, Nov 8, 2019 at 6:38 AM Dumitru Ceara  wrote:
> >
> > ARP request and ND NS packets for router owned IPs were being
> > flooded in the complete L2 domain (using the MC_FLOOD multicast group).
> > However this creates a scaling issue in scenarios where aggregation
> > logical switches are connected to more logical routers (~350). The
> > logical pipelines of all routers would have to be executed before the
> > packet is finally replied to by a single router, the owner of the IP
> > address.
> >
> > This commit limits the broadcast domain by bypassing the L2 Lookup stage
> > for ARP requests that will be replied by a single router. The packets
> > are forwarded only to the router port that owns the target IP address.
> >
> > IPs that are owned by the routers and for which this fix applies are:
> > - IP addresses configured on the router ports.
> > - VIPs.
> > - NAT IPs.
> >
> > This commit also fixes function get_router_load_balancer_ips() which
> > was incorrectly returning a single address_family even though the
> > IP set could contain a mix of IPv4 and IPv6 addresses.
> >
> > Reported-at: https://bugzilla.redhat.com/1756945
> > Reported-by: Anil Venkata 
> > Signed-off-by: Dumitru Ceara 
> >
> > ---
> > v6:
> > - Address Han's comments:
> >   - remove flooding of ARPs targeting OVN owned IP addresses.
> >   - update ovn-architecture documentation.
> >   - rename ARP handling functions.
> >   - Adapt "ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR" autotest to take into
> > account the new way of forwarding ARPs.
> > - Also, properly deal with ARP packets on VLAN-backed networks.
>
> I am confused by this additional VLAN-related change. VLAN is just handled as 
> a bridged logical switch where a localnet port is used as the *inport*. It seems 
> to me there is no difference from a regular localnet port, with or without a VLAN 
> tag. When an ARP request is going through the ingress pipeline of the bridged 
> logical switch, the logic of bypassing the flooding added by this patch 
> should just apply, right? It is also a very common scenario that the 
> *aggregation switch* for the routers is an external physical network backed 
> by VLAN. I think the purpose of this patch is to make sure this scenario scales. 
> Did I misunderstand anything here?

Hi Han,

The reason behind it was that, with VLAN-backed networks, when packets
move between hypervisors without going through Geneve, we rerun the
ingress pipeline. That would mean that the flows I added for
self-originated (G)ARP packets won't be hit for ARP requests originated by
ovn-controller on a remote hypervisor:

priority=80 match: "inport=, arp.op == 1" action:
"outport=MC_FLOOD"

But instead we would hit:
priority=75 match: "arp.op == 1 && arp.tpa == " and never flood the packet out on
the second HV.

You're right, the way I added the check for all VLAN packets is not OK
as we fall back to the old behavior too often. However, I'm not sure
what the best option is. Do you think it's fine if I change the
priority 80 flow to the following?

priority=80 match: "eth.src={lrp.ea, lr.nat.external_macs} && arp.op
== 1" action: "outport=MC_FLOOD"

The idea would be to identify self-originated ARP requests by matching
the source mac instead of the logical ingress port (as this might not be
present). I tried it locally and it works fine, but we do need to add
more flows than before.

Thanks,
Dumitru

>
> In addition, the below macro definition may be renamed to FLAGBIT_..., 
> because it is for the bits of MFF_LOG_FLAGS, which is one of the pre-defined 
> logical fields, instead of for the MFF_LOG_REG0-9 registers.
> >
> > +#define REGBIT_NOT_VXLAN "flags[1] == 0"
> > +#define REGBIT_NOT_VLAN "flags[7] == 0"
> > +
>
> The other part looks good to me. Thanks for simplifying the patch.
>
> Han



Re: [ovs-dev] [PATCH net-next] net: openvswitch: add hash info to upcall

2019-11-11 Thread Tonghao Zhang
On Mon, Nov 11, 2019 at 9:07 PM Simon Horman  wrote:
>
> On Sun, Nov 10, 2019 at 07:44:18PM +0800, xiangxia.m@gmail.com wrote:
> > From: Tonghao Zhang 
> >
> > When using the kernel datapath, the upcall doesn't
> > include the related skb hash info. That will introduce
> > some problems, because the skb hash is very
> > important (e.g. the vxlan module uses it for udp src port and
> > tx queue selection on the tx path).
> >
> > For example, there will be one upcall, without the skb hash
> > information, to ovs-vswitchd, for the first packet of a tcp
> > session. When the kernel sends the tcp packets, the hash is
> > random for a tcp socket:
> >
> > tcp_v4_connect
> >   -> sk_set_txhash (is random)
> >
> > __tcp_transmit_skb
> >   -> skb_set_hash_from_sk
> >
> > Then the udp src port of the first tcp packet is different
> > from that of the rest of the packets. The topology is shown below.
> >
> > $ ovs-vsctl add-br br-int
> > $ ovs-vsctl add-port br-int vxl0 -- \
> >   set Interface vxl0 type=vxlan options:key=100 
> > options:remote_ip=1.1.1.200
> >
> > $ __tap is internal type on host
> > $ or tap net device for VM/Dockers
> > $ ovs-vsctl add-port br-int __tap
> >
> > +---+  +-+
> > |   Docker/VMs  |  | ovs-vswitchd|
> > ++--+  +-+
> >  |   ^|
> >  |   ||
> >  |   |  upcallv recalculate packet hash
> >  | +-++--+
> >  |  tap netdev | |   vxlan modules
> >  +---> +-->  Open vSwitch ko   --+--->
> >internal type   | |
> >+-+
>
> I think I see the problem that you are trying to solve, but this approach
> feels wrong to me. In my view the HASH is transparent to components
> outside of the datapath (in this case the Open vSwitch ko box).
The hash affects how the vxlan module selects the udp src port and the
tx queue.

> For one thing, with this change ovs-vswitchd can now supply any hash
> value it likes.
The patch for ovs-vswitchd has not been sent yet; this patch will get
the hash from the upcall
and send it back to the kernel.
> Is it not possible to fix things so that "recalculate packet hash"
> in fact recalculates the same hash value as was calculated before
> the upcall?
Hi Simon,
I don't have a better solution, because the hash is calculated in
different ways; for example,
the hash is random for tcp traffic that may come from the host or VMs, and
the hash may be calculated in hardware or in software (see the sketch below).
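
To illustrate why the hash matters for the encapsulation, a simplified
sketch (the function name is made up; the idea mirrors the kernel's
udp_flow_src_port() helper used by the vxlan module) of how the outer UDP
source port is derived from the packet hash.  If the first, upcalled packet
is re-hashed differently in userspace, it ends up with a different source
port (and possibly a different tx queue) than the rest of the flow:

    #include <stdint.h>

    /* Illustrative only: map a 32-bit flow hash into [port_min, port_max]. */
    static uint16_t
    pick_vxlan_src_port(uint32_t skb_hash, uint16_t port_min, uint16_t port_max)
    {
        uint32_t range = (uint32_t) port_max - port_min + 1;

        return port_min + (uint16_t) (((uint64_t) skb_hash * range) >> 32);
    }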

> >
> > Reported-at: 
> > https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
> > Signed-off-by: Tonghao Zhang 
> > ---
> >  include/uapi/linux/openvswitch.h |  2 ++
> >  net/openvswitch/datapath.c   | 31 ++-
> >  net/openvswitch/datapath.h   |  3 +++
> >  3 files changed, 35 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/openvswitch.h 
> > b/include/uapi/linux/openvswitch.h
> > index 1887a451c388..1c58e019438e 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -170,6 +170,7 @@ enum ovs_packet_cmd {
> >   * output port is actually a tunnel port. Contains the output tunnel key
> >   * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
> >   * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
> > + * @OVS_PACKET_ATTR_HASH: Packet hash info (e.g. hash, sw_hash and l4_hash 
> > in skb)
> >   * @OVS_PACKET_ATTR_LEN: Packet size before truncation.
> >   * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
> >   * size.
> > @@ -190,6 +191,7 @@ enum ovs_packet_attr {
> >   OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
> >  error logging should be suppressed. */
> >   OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
> > + OVS_PACKET_ATTR_HASH,   /* Packet hash. */
> >   OVS_PACKET_ATTR_LEN,/* Packet size before truncation. */
> >   __OVS_PACKET_ATTR_MAX
> >  };
> > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> > index 2088619c03f0..f938c43e3085 100644
> > --- a/net/openvswitch/datapath.c
> > +++ b/net/openvswitch/datapath.c
> > @@ -350,7 +350,8 @@ static size_t upcall_msg_size(const struct 
> > dp_upcall_info *upcall_info,
> >   size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
> >   + nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
> >   + nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY 
> > */
> > - + nla_total_size(sizeof(unsigned int)); /* 
> > OVS_PACKET_ATTR_LEN */
> > + + nla_total_size(sizeof(unsigned int)) /* OVS_PACKET_ATTR_LEN 
> > */
> > + + nla_total_size(sizeof(u64)); 

Re: [ovs-dev] [dpdk-latest PATCH v2] netdev-dpdk: add coverage counter to count vhost IRQs

2019-11-11 Thread Eelco Chaudron




On 25 Oct 2019, at 15:56, Eelco Chaudron wrote:


When the dpdk vhost library executes an eventfd_write() call,
i.e. waking up the guest, a new callback will be called.

This patch adds the callback to count the number of
interrupts sent to the VM to track the number of times
interrupts were generated.

This might be of interest for finding out whether system calls are
being made in the DPDK fast path.

The coverage counter is called "dpdk_vhost_irqs" and can be
read with:

  $ ovs-appctl coverage/show | grep dpdk_vhost_irqs
  dpdk_vhost_irqs  275880.6/sec 129962.683/sec 
6561.7250/sec   total: 23684319


  $ ovs-appctl coverage/read-counter dpdk_vhost_irqs
  23684319

Signed-off-by: Eelco Chaudron 


Any comments on this patchset?



Re: [ovs-dev] [PATCH net-next] net: openvswitch: add hash info to upcall

2019-11-11 Thread Simon Horman
On Sun, Nov 10, 2019 at 07:44:18PM +0800, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> When using the kernel datapath, the upcall doesn't
> include the related skb hash info. That will introduce
> some problems, because the skb hash is very
> important (e.g. the vxlan module uses it for udp src port and
> tx queue selection on the tx path).
> 
> For example, there will be one upcall, without the skb hash
> information, to ovs-vswitchd, for the first packet of a tcp
> session. When the kernel sends the tcp packets, the hash is
> random for a tcp socket:
> 
> tcp_v4_connect
>   -> sk_set_txhash (is random)
> 
> __tcp_transmit_skb
>   -> skb_set_hash_from_sk
> 
> Then the udp src port of the first tcp packet is different
> from that of the rest of the packets. The topology is shown below.
> 
> $ ovs-vsctl add-br br-int
> $ ovs-vsctl add-port br-int vxl0 -- \
>   set Interface vxl0 type=vxlan options:key=100 
> options:remote_ip=1.1.1.200
> 
> $ __tap is internal type on host
> $ or tap net device for VM/Dockers
> $ ovs-vsctl add-port br-int __tap
> 
> +---+  +-+
> |   Docker/VMs  |  | ovs-vswitchd|
> ++--+  +-+
>  |   ^|
>  |   ||
>  |   |  upcallv recalculate packet hash
>  | +-++--+
>  |  tap netdev | |   vxlan modules
>  +---> +-->  Open vSwitch ko   --+--->
>internal type   | |
>+-+

I think I see the problem that you are trying to solve, but this approach
feels wrong to me. In my view the HASH is transparent to components
outside of the datapath (in this case the Open vSwitch ko box).

For one thing, with this change ovs-vswitchd can now supply any hash
value it likes.

Is it not possible to fix things so that "recalculate packet hash"
in fact recalculates the same hash value as was calculated before
the upcall?

> 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
> Signed-off-by: Tonghao Zhang 
> ---
>  include/uapi/linux/openvswitch.h |  2 ++
>  net/openvswitch/datapath.c   | 31 ++-
>  net/openvswitch/datapath.h   |  3 +++
>  3 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 1887a451c388..1c58e019438e 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -170,6 +170,7 @@ enum ovs_packet_cmd {
>   * output port is actually a tunnel port. Contains the output tunnel key
>   * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
>   * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
> + * @OVS_PACKET_ATTR_HASH: Packet hash info (e.g. hash, sw_hash and l4_hash 
> in skb)
>   * @OVS_PACKET_ATTR_LEN: Packet size before truncation.
>   * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
>   * size.
> @@ -190,6 +191,7 @@ enum ovs_packet_attr {
>   OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
>  error logging should be suppressed. */
>   OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
> + OVS_PACKET_ATTR_HASH,   /* Packet hash. */
>   OVS_PACKET_ATTR_LEN,/* Packet size before truncation. */
>   __OVS_PACKET_ATTR_MAX
>  };
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 2088619c03f0..f938c43e3085 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -350,7 +350,8 @@ static size_t upcall_msg_size(const struct dp_upcall_info 
> *upcall_info,
>   size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
>   + nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
>   + nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY */
> - + nla_total_size(sizeof(unsigned int)); /* OVS_PACKET_ATTR_LEN 
> */
> + + nla_total_size(sizeof(unsigned int)) /* OVS_PACKET_ATTR_LEN */
> + + nla_total_size(sizeof(u64)); /* OVS_PACKET_ATTR_HASH */
>  
>   /* OVS_PACKET_ATTR_USERDATA */
>   if (upcall_info->userdata)
> @@ -393,6 +394,7 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
>   size_t len;
>   unsigned int hlen;
>   int err, dp_ifindex;
> + u64 hash;
>  
>   dp_ifindex = get_dpifindex(dp);
>   if (!dp_ifindex)
> @@ -504,6 +506,24 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
>   pad_packet(dp, user_skb);
>   }
>  
> + if (skb_get_hash_raw(skb)) {
> + hash = skb_get_hash_raw(skb);
> +
> + if (skb->sw_hash)
> +

Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case multi-segments.

2019-11-11 Thread Ilya Maximets
> Hi Darrell,
> 
> In the TSO case, the packet may use a multi-segment mbuf, and I do not
> think we need to make it linear. In this case, we can NOT use
> pointers to calculate the tcp length.

Hi.  Just wanted to mention that the current main approach for TSO enabling in
OVS seems to be the use of extbuf memory in vhost.  And the corresponding patches
are already accepted [1] into DPDK.  So, there will be no multi-segment mbufs.

[1] c3ff0ac70acb ("vhost: improve performance by supporting large buffer")

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

2019-11-11 Thread Sriram Vatala via dev
Hi Ilya,
Thanks for the review. I agree with your proposal to move the stats update
code to the existing special functions. Thanks for the incremental patch; it looks
good to me. Will wait for Kevin's thoughts on this.

Thanks & Regards,
Sriram.

-Original Message-
From: Ilya Maximets 
Sent: 11 November 2019 04:50
To: Sriram Vatala ; ovs-dev@openvswitch.org; 
ktray...@redhat.com; i.maxim...@ovn.org
Subject: Re: [PATCH v11 2/3] netdev-dpdk : Detailed packet drop statistics

On 29.10.2019 15:50, Sriram Vatala wrote:> @@ -2388,12 +2412,16 @@ 
__netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>   }
>   } while (cnt && (retries++ < max_retries));
>
> +tx_failure = cnt;
>   rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>
>   rte_spinlock_lock(&dev->stats_lock);
>   netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
>cnt + dropped);
> -dev->tx_retries += MIN(retries, max_retries);
> +sw_stats->tx_retries += MIN(retries, max_retries);
> +sw_stats->tx_failure_drops += tx_failure;
> +sw_stats->tx_mtu_exceeded_drops += mtu_drops;
> +sw_stats->tx_qos_drops += qos_drops;

Kevin pointed to this part of the code in his review for v9, in the hope that
we can move it to a separate function and reuse it.  This code catches my eye
too.  I don't think that we can reuse this part; at least, it will not be very
efficient in the current situation (clearing of the unused fields in a stats
structure will probably be needed before calling such a common function, but
ETH tx uses only half of the struct).

But there is another thing here.  We already have special functions for vhost 
tx/rx counters.  And it looks strange when we're not using these functions to 
update tx/rx failure counters.

Suggesting following incremental.
Kevin, Sriram, please, share your thoughts.

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 3cb7023a8..02120a379 
100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2169,16 +2169,18 @@ netdev_dpdk_vhost_update_rx_size_counters(struct 
netdev_stats *stats,
  }

  static inline void
-netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats,
+netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev,
   struct dp_packet **packets, int count,
- int dropped)
+ int qos_drops)
  {
-int i;
-unsigned int packet_size;
+struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
+struct netdev_stats *stats = &dev->stats;
  struct dp_packet *packet;
+unsigned int packet_size;
+int i;

  stats->rx_packets += count;
-stats->rx_dropped += dropped;
+stats->rx_dropped += qos_drops;
  for (i = 0; i < count; i++) {
  packet = packets[i];
  packet_size = dp_packet_size(packet); @@ -2201,6 +2203,8 @@ 
netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats,

  stats->rx_bytes += packet_size;
  }
+
+sw_stats->rx_qos_drops += qos_drops;
  }

  /*
@@ -2213,7 +2217,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
  struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
  struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev);
  uint16_t nb_rx = 0;
-uint16_t dropped = 0;
+uint16_t qos_drops = 0;
  int qid = rxq->queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
  int vid = netdev_dpdk_get_vid(dev);

@@ -2240,17 +2244,16 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
  }

  if (policer) {
-dropped = nb_rx;
+qos_drops = nb_rx;
  nb_rx = ingress_policer_run(policer,
  (struct rte_mbuf **) batch->packets,
  nb_rx, true);
-dropped -= nb_rx;
+qos_drops -= nb_rx;
  }

  rte_spinlock_lock(&dev->stats_lock);
-netdev_dpdk_vhost_update_rx_counters(&dev->stats, batch->packets,
- nb_rx, dropped);
-dev->sw_stats->rx_qos_drops += dropped;
+netdev_dpdk_vhost_update_rx_counters(dev, batch->packets,
+ nb_rx, qos_drops);
  rte_spinlock_unlock(&dev->stats_lock);

  batch->count = nb_rx;
@@ -2360,13 +2363,18 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, 
struct rte_mbuf **pkts,
  }

  static inline void
-netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
+netdev_dpdk_vhost_update_tx_counters(struct netdev_dpdk *dev,
   struct dp_packet **packets,
   int attempted,
- int dropped)
+ struct netdev_dpdk_sw_stats
+ *sw_stats_add)
  {
-int i;
+struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
+int dropped = sw_stats_add->tx_mtu_exceeded_drops +
+  sw_stats_add->tx_qos_drops