[ovs-discuss] Re: Re:Re: [HELP] Question about icmp pkt marked Invalid by userspace conntrack

2019-11-04 Thread txfh2007 via discuss
Hi Darrell:
The meter rate limit is set to 1 Gbps, but the actual rate is only around
500 Mbps. I have read the meter patch, but that patch only prevents delta_t
from being truncated to 0; in my case delta_t is around 35500ms.
In my pipeline the ct action is in OpenFlow table 44, the meter action is in
table 46, and the output action is in table 65, so I believe the ordering is correct?
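For reference, the meter and the flow that references it were installed roughly as
below (a sketch: the bridge name br-int, the IP match and the priority are
illustrative; only the table numbers and the 1 Gbps rate correspond to my setup):

# 1 Gbps (1,000,000 kbps) drop-band meter
ovs-ofctl -O OpenFlow13 add-meter br-int 'meter=1 kbps burst stats bands=type=drop rate=1000000 burst_size=100000'
# table 46 applies the meter, then resubmits to the output table
ovs-ofctl -O OpenFlow13 add-flow br-int 'table=46,priority=100,ip,actions=meter:1,resubmit(,65)'
# band statistics show how often the drop band is actually firing
ovs-ofctl -O OpenFlow13 meter-stats br-int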

Thanks 

Timo 



--
From: Darrell Ball
Date: Tuesday, November 5, 2019, 06:56
To: txfh2007
Cc: Ben Pfaff; ovs-discuss
Subject: Re: [ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by userspace conntrack


Hi Timo

On Sun, Nov 3, 2019 at 5:12 PM txfh2007  wrote:

Hi Darrell:
Sorry for my late reply. Yes, the two VMs under test are on the same compute
node, and packets are sent and received via vhost-user type ports.

Got it
 

First, without the meter table configured, the iperf TCP bandwidth from VM1 to
VM2 is around 5 Gbps. Then I set the meter entry to constrain the rate, and the
deviation is larger than I thought.


IIUC, pre-meter you get 5 Gbps, then post-meter 0.5 Gbps, which is less than
you expected?
What did you expect the metered rate to be?
Note Ben pointed you to a meter-related bug fix on the alias before.
 
I guess the recalculation of l4 checksum during conntrack would impact the 
actual rate?


Are you applying the meter rule at the end of the complete pipeline?
 

Thank you 
Timo 




To: txfh2007
Cc: Ben Pfaff; ovs-discuss
Subject: Re: [ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by userspace conntrack


Hi Timo


I read through this thread to get more context on what you are doing; you have a
base OVS-DPDK use case and are measuring VM to VM performance across 2 compute
nodes. You are probably using vhost-user-client ports? Please correct me if I am wrong.
In this case, "per direction" you have one rx virtual interface to handle in
OVS; there will be a tradeoff between checksum validation security and performance.
Just to be clear, in terms of your measurements, how did you arrive at the 5 Gbps -
instrumented code or otherwise?
(I can verify that later when I have a setup).


Darrell










On Thu, Oct 31, 2019 at 9:23 AM Darrell Ball  wrote:




On Thu, Oct 31, 2019 at 3:04 AM txfh2007 via discuss 
 wrote:

Hi Ben && Darrell:
 This patch works, but after merging it I have found that the iperf
throughput decreased from 5 Gbps+ to 500 Mbps.

What is the 5 Gbps number? Is that the number with all packets being marked as
invalid in the initial sanity checks?


Typically one wants to offload checksum checks. The code checks whether that
has been done and skips doing it in software; can you verify that you have the
capability and are using it?


Skipping checksum checks reduces security, of course, but a knob for it can be
added if there is a common case of not being able to offload checksumming.



 I guess maybe we should add a switch to turn off layer-4 checksum validation
when doing userspace conntrack? I have found that kernel conntrack has a related
knob named "nf_conntrack_checksum".
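For comparison, this is how the existing kernel-side knob is toggled (the userspace
conntrack switch proposed above would be new; the sysctl below only affects the
kernel datapath conntrack):

# check whether kernel conntrack verifies L4 checksums (1 = verify, the default)
sysctl net.netfilter.nf_conntrack_checksum
# disable the verification
sysctl -w net.netfilter.nf_conntrack_checksum=0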

Any advice?

Thank you !

--
To: Ben Pfaff
Cc: ovs-discuss
Subject: Re:Re:[ovs-discuss] [HELP] Question about icmp pkt marked Invalid by userspace conntrack


Hi Ben && Darrell:
 Thanks, this patch works! Now the issue seems fixed 

Timo


Re: Re:[ovs-discuss] [HELP] Question about icmp pkt marked Invalid by userspace 
conntrack


I see.

It sounds like Darrell pointed out the solution, but please let me know
if it did not help.

On Fri, Oct 11, 2019 at 08:57:58AM +0800, txfh2007 wrote:
> Hi Ben:
> 
>  I just found the GCC_UNALIGNED_ACCESSORS error during gdb trace and not 
> sure this is a misaligned error or others. What I can confirm is  during 
> "extract_l4" of this icmp reply packet, when we do "check_l4_icmp", the 
> unaligned error emits and the "extract_l4" returned false. So this packet be 
> marked as ct_state=invalid.
> 
> Thank you for your help.
> 
> Timo
> 
> Topic:Re: [ovs-discuss] [HELP] Question about icmp pkt marked Invalid by 
> userspace conntrack
> 
> 
> It's very surprising.
> 
> Are you using a RISC architecture that insists on aligned accesses?  On
> the other hand, if you are using x86-64 or some other architecture that
> ordinarily does not care, are you sure that this is about a misaligned
> access (it is more likely to simply be a bad pointer)?
> 
> On Thu, Oct 10, 2019 at 10:50:33PM +0800, txfh2007 via discuss wrote:
> > 
> > Hi all:
> > I was using OVS-DPDK(version 2.10-1), and I have found pinging between 
> > two VMs on different compute nodes failed. I have checked my env and found 
> > there is one node's NIC cannot strip CRC of a frame, the other node's NIC 
> > is normal(I mean it can strip CRC ). And the reason of ping fail is the 
> > icmp reply pkt (from node whose NIC cannot strip CRC) is marked as invalid 
> > . So the icmp request From Node A is 64 bytes, but the icmp reply From 

Re: [ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by userspace conntrack

2019-11-04 Thread Darrell Ball
Hi Timo

On Sun, Nov 3, 2019 at 5:12 PM txfh2007  wrote:

> Hi Darrell:
> Sorry for my late reply. Yes, the two VMs under test are on same
> compute node , and pkts rx/tx via vhost user type port.


Got it


> Firstly if I don't configure meter table, then Iperf TCP bandwidth result
> From VM1 to VM2 is around 5Gbps, then I set the meter entry and constraint
> the rate, and the deviation is larger than I throught.
>

IIUC, pre-meter, you get 5 Gbps, then post-meter 0.5 Gbps, which is less
than you expected?
What did you expect the metered rate to be?
Note Ben pointed you to a meter-related bug fix on the alias before.


> I guess the recalculation of l4 checksum during conntrack would impact
> the actual rate?
>

Are you applying the meter rule at the end of the complete pipeline?


>
> Thank you
> Timo
>
>
>
>
> txfh2007 
> Ben Pfaff ; ovs-discuss 
> Re: [ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> Hi Timo
>
>
> I read thru this thread to get more context on what you are doing; you
> have a base OVS-DPDK
> use case and are measuring VM to VM performance across 2 compute nodes.
> You are probably using
> vhost-user-client ports ? Pls correct me if I am wrong.
> In this case, "per direction" you have one rx virtual interface to handle
> in OVS; there will be a tradeoff b/w
> checksum validation security and performance.
> JTBC, in terms of your measurements, how did you arrive at the 5Gbps -
> instrumented code or otherwise ?.
> (I can verify that later when I have a setup).
>
>
> Darrell
>
>
>
>
>
>
>
>
>
>
> On Thu, Oct 31, 2019 at 9:23 AM Darrell Ball  wrote:
>
>
>
>
> On Thu, Oct 31, 2019 at 3:04 AM txfh2007 via discuss <
> ovs-discuss@openvswitch.org> wrote:
>
> Hi Ben && Darrell:
>  This patch works, but after merging this patch I have found the iperf
> throughout decrease from 5Gbps+ to 500Mbps.
>
> what is the 5Gbps number ? Is that the number with marking all packets as
> invalid in initial sanity checks ?
>
>
> Typically one wants to offload checksum checks. The code checks whether
> that has been done and skips
> doing it in software; can you verify that you have the capability and are
> using it ?
>
>
> Skipping checksum checks reduces security, of course, but it can be added
> if there is a common case of
> not being able to offload checksumming.
>
>
>
>  I guess maybe we should add a switch to turn off layer4 checksum
> validation when doing userspace conntrack ? I have found for kernel
> conntrack, there is a related button named "nf_conntrack_checksum"  .
>
> Any advice?
>
> Thank you !
>
> --
>
> :Ben Pfaff 
> :ovs-discuss 
> :Re:Re:[ovs-discuss] [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> Hi Ben && Darrell:
>  Thanks, this patch works! Now the issue seems fixed
>
> Timo
>
>
> Re: Re:[ovs-discuss] [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> I see.
>
> It sounds like Darrell pointed out the solution, but please let me know
> if it did not help.
>
> On Fri, Oct 11, 2019 at 08:57:58AM +0800, txfh2007 wrote:
> > Hi Ben:
> >
> >  I just found the GCC_UNALIGNED_ACCESSORS error during gdb trace and
> not sure this is a misaligned error or others. What I can confirm is
> during "extract_l4" of this icmp reply packet, when we do "check_l4_icmp",
> the unaligned error emits and the "extract_l4" returned false. So this
> packet be marked as ct_state=invalid.
> >
> > Thank you for your help.
> >
> > Timo
> >
> > Topic:Re: [ovs-discuss] [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
> >
> >
> > It's very surprising.
> >
> > Are you using a RISC architecture that insists on aligned accesses?  On
> > the other hand, if you are using x86-64 or some other architecture that
> > ordinarily does not care, are you sure that this is about a misaligned
> > access (it is more likely to simply be a bad pointer)?
> >
> > On Thu, Oct 10, 2019 at 10:50:33PM +0800, txfh2007 via discuss wrote:
> > >
> > > Hi all:
> > > I was using OVS-DPDK(version 2.10-1), and I have found pinging
> between two VMs on different compute nodes failed. I have checked my env
> and found there is one node's NIC cannot strip CRC of a frame, the other
> node's NIC is normal(I mean it can strip CRC ). And the reason of ping fail
> is the icmp reply pkt (from node whose NIC cannot strip CRC) is marked as
> invalid . So the icmp request From Node A is 64 bytes, but the icmp reply
> From Node B is 68 bytes(with 4 bytes CRC). And when doing "check_l4_icmp",
> when we call csum task(in lib/csum.c). Gcc emits unaligned accessor error.
> The backtrace is as below:
> > >
> > > I just want to confirm if this phenomenon is reasonable?
> > >
> > > Many thanks
> > >
> > > Timo
> > >
> > >
> > > get_unaligned_be16 (p=0x7f2ad0b1ed5c) at lib/unaligned.h:89
> > > 89 GCC_UNALIGNED_ACCESSORS(ovs_be16, be16);
> > > (gdb) bt
> > > #0  

Re: [ovs-discuss] OVS DPDK: Failed to create memory pool for netdev

2019-11-04 Thread Tobias Hofmann (tohofman) via discuss
Hi Flavio,

thanks for reaching out.

The DPDK options used in OvS are:

other_config:pmd-cpu-mask=0x202
other_config:dpdk-socket-mem=1024
other_config:dpdk-init=true


For the dpdk port, we set:

type=dpdk
options:dpdk-devargs=:08:0b.2
external_ids:unused-drv=i40evf 
mtu_request=9216


Please let me know if this is what you asked for.
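
In case it is relevant: the 2 GB HugePage workaround mentioned below is applied
like this on our box (illustrative; the host is single-socket, so all memory goes
to socket 0, and openvswitch has to be restarted for the change to take effect):

ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=2048
systemctl restart openvswitch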

Thanks
Tobias

On 04.11.19, 15:50, "Flavio Leitner"  wrote:


It would be nice if you shared the DPDK options used in OvS.

On Sat, 2 Nov 2019 15:43:18 +
"Tobias Hofmann \(tohofman\) via discuss" 
wrote:

> Hello community,
> 
> My team and I observe a strange behavior on our system with the
> creation of dpdk ports in OVS. We have a CentOS 7 system with
> OpenvSwitch and only one single port of type ‘dpdk’ attached to a
> bridge. The MTU size of the DPDK port is 9216 and the reserved
> HugePages for OVS are 512 x 2MB-HugePages, e.g. 1GB of total HugePage
> memory.
> 
> Setting everything up works fine, however after I reboot my box, the
> dpdk port is in error state and I can observe this line in the logs
> (full logs attached to the mail):
> 2019-11-02T14:46:16.914Z|00437|netdev_dpdk|ERR|Failed to create
> memory pool for netdev dpdk-p0, with MTU 9216 on socket 0: Invalid
> argument 2019-11-02T14:46:16.914Z|00438|dpif_netdev|ERR|Failed to set
> interface dpdk-p0 new configuration
> 
> I figured out that by restarting the openvswitch process, the issue
> with the port is resolved and it is back in a working state. However,
> as soon as I reboot the system a second time, the port comes up in
> error state again. Now, we have also observed a couple of other
> workarounds that I can’t really explain why they help:
> 
>   *   When there is also a VM deployed on the system that is using
> ports of type ‘dpdkvhostuserclient’, we never see any issues like
> that. (MTU size of the VM ports is 9216 by the way)
>   *   When we increase the HugePage memory for OVS to 2GB, we also
> don’t see any issues.
>   *   Lowering the MTU size of the ‘dpdk’ type port to 1500 also
> helps to prevent this issue.
> 
> Can anyone explain this?
> 
> We’re using the following versions:
> Openvswitch: 2.9.3
> DPDK: 17.11.5
> 
> Appreciate any help!
> Tobias





Re: [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr

2019-11-04 Thread Ramana Reddy
Thanks, Flavio. I will check it out tomorrow and let you know how it goes.

Regards,
Ramana


On Mon, Nov 4, 2019 at 10:15 PM Flavio Leitner  wrote:

> On Mon, 4 Nov 2019 21:32:28 +0530
> Ramana Reddy  wrote:
>
> > Hi Favio Leitner,
> > Thank you very much for your reply. Here is the code snippet. But the
> > same code is working if I send the packet without ovs.
>
> Could you provide more details on the OvS environment and the test?
>
> The linux kernel propagates the header size dependencies when you stack
> the devices in net_device->hard_header_len, so in the case of vxlan dev
> it will be:
>
> needed_headroom = lowerdev->hard_header_len;
> needed_headroom += VXLAN_HEADROOM;
> dev->needed_headroom = needed_headroom;
>
> Sounds like that is helping when OvS is not being used.
>
> fbl
>
>
> > bool csum = true;
> > bool gso = true'
> >  struct virtio_net_hdr *vnet = buf;
> >if (csum) {
> > vnet->flags = (VIRTIO_NET_HDR_F_NEEDS_CSUM);
> > vnet->csum_start = ETH_HLEN + sizeof(*iph);
> > vnet->csum_offset = __builtin_offsetof(struct
> > tcphdr, check);
> > }
> >
> > if (gso) {
> > vnet->hdr_len = ETH_HLEN + sizeof(*iph) +
> > sizeof(*tcph);
> > vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> > vnet->gso_size = ETH_DATA_LEN - sizeof(struct
> > iphdr) -
> > sizeof(struct
> > tcphdr);
> > } else {
> > vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;
> > }
> > Regards,
> > Ramana
> >
> >
> > On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner 
> > wrote:
> >
> > >
> > > Hi,
> > >
> > > What's the value you're passing on gso_size in struct
> > > virtio_net_hdr? You need to leave room for the encapsulation
> > > header, e.g.:
> > >
> > > gso_size = iface_mtu - virtio_net_hdr->hdr_len
> > >
> > > fbl
> > >
> > > On Mon, 4 Nov 2019 01:11:36 +0530
> > > Ramana Reddy  wrote:
> > >
> > > > Hi,
> > > > I am wondering if anyone can help me with this. I am having
> > > > trouble to send tso/gso packet
> > > > with af_packet socket with packet_vnet_hdr (through
> > > > virtio_net_hdr) over vxlan tunnel in OVS.
> > > >
> > > > What I observed that, the following function eventually hitting
> > > > and is returning false (net/core/skbuff.c), hence the packet is
> > > > dropping. static inline bool skb_gso_size_check(const struct
> > > > sk_buff *skb, unsigned int seg_len,
> > > >   unsigned int max_len) {
> > > > const struct skb_shared_info *shinfo = skb_shinfo(skb);
> > > > const struct sk_buff *iter;
> > > > if (shinfo->gso_size != GSO_BY_FRAGS)
> > > > return seg_len <= max_len;
> > > > ..
> > > > }
> > > > [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
> > > > packet_length = skb->len - skb_inner_network_offset(skb))
> > > > [  678.756678] ip_fragment:510 packet length:1500
> > > > [  678.756715] ip_fragment:510 packet length:1314
> > > > [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
> > > > max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
> > > >
> > > > Observation:
> > > > When we send the large packet ( example here is
> > > > packet_length:2762), its showing the seg_len(1550) >
> > > > max_len(1500). Hence return seg_len <= max_len statement
> > > > returning false. Because of this, ip_fragment calling
> > > > icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
> > > > rather the code reaching to ip_finish_output2(sk, skb)
> > > > function in net/ipv4/ip_output.c and is given below:
> > > >
> > > > static int ip_finish_output_gso(struct sock *sk, struct sk_buff
> > > > *skb, unsigned int mtu)
> > > > {
> > > > netdev_features_t features;
> > > > struct sk_buff *segs;
> > > > int ret = 0;
> > > >
> > > > /* common case: seglen is <= mtu */
> > > > if (skb_gso_validate_mtu(skb, mtu))
> > > > return ip_finish_output2(sk, skb);
> > > >...
> > > >   err = ip_fragment(sk, segs, mtu, ip_finish_output2);
> > > >   ...
> > > >  }
> > > >
> > > > But when we send normal iperf traffic ( gso/tso  traffic) over
> > > > vxlan, the skb_gso_size_check returning a true value, and
> > > > ip_finish_output2 getting executed.
> > > > Here is the values of normal iperf traffic over vxlan.
> > > >
> > > > [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
> > > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > > [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
> > > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > > [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
> > > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > > [ 

Re: [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr

2019-11-04 Thread Flavio Leitner
On Mon, 4 Nov 2019 21:32:28 +0530
Ramana Reddy  wrote:

> Hi Favio Leitner,
> Thank you very much for your reply. Here is the code snippet. But the
> same code is working if I send the packet without ovs.

Could you provide more details on the OvS environment and the test?

The Linux kernel propagates the header size dependencies when you stack
the devices in net_device->hard_header_len, so in the case of a vxlan device
it will be:

needed_headroom = lowerdev->hard_header_len;
needed_headroom += VXLAN_HEADROOM;
dev->needed_headroom = needed_headroom;

Sounds like that is helping when OvS is not being used.
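
To make the sizing concrete, here is a sketch of how the sender side could fill
the header while leaving room for the VXLAN encapsulation (assumptions: IPv4
inner and outer headers without options and no VLAN tags, i.e. 50 bytes of outer
overhead = 14 Ethernet + 20 IPv4 + 8 UDP + 8 VXLAN; the helper name fill_vnet_hdr
and the deliberately conservative gso_size are illustrative, not the exact fix):

#include <stddef.h>
#include <linux/if_ether.h>    /* ETH_HLEN */
#include <linux/ip.h>          /* struct iphdr */
#include <linux/tcp.h>         /* struct tcphdr */
#include <linux/virtio_net.h>  /* struct virtio_net_hdr */

/* Assumed outer overhead added by the vxlan device (IPv4, no options):
 * 14 (Ethernet) + 20 (IPv4) + 8 (UDP) + 8 (VXLAN) = 50 bytes. */
#define VXLAN_OVERHEAD 50

/* Illustrative helper: size gso_size so that each resulting segment,
 * once encapsulated, still fits the physical interface MTU. */
static void fill_vnet_hdr(struct virtio_net_hdr *vnet, unsigned int iface_mtu)
{
    vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
    vnet->csum_start = ETH_HLEN + sizeof(struct iphdr);
    vnet->csum_offset = offsetof(struct tcphdr, check);

    vnet->hdr_len = ETH_HLEN + sizeof(struct iphdr) + sizeof(struct tcphdr);
    vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;

    /* Conservative: subtract the inner Ethernet/IP/TCP headers *and* the
     * assumed VXLAN overhead from the MTU, instead of ETH_DATA_LEN - 40
     * as in the original snippet.  With a 1500-byte MTU this gives
     * 1500 - 50 - 14 - 20 - 20 = 1396 bytes of TCP payload per segment. */
    vnet->gso_size = iface_mtu - VXLAN_OVERHEAD - vnet->hdr_len;
}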

fbl


> bool csum = true;
> bool gso = true'
>  struct virtio_net_hdr *vnet = buf;
>if (csum) {
> vnet->flags = (VIRTIO_NET_HDR_F_NEEDS_CSUM);
> vnet->csum_start = ETH_HLEN + sizeof(*iph);
> vnet->csum_offset = __builtin_offsetof(struct
> tcphdr, check);
> }
> 
> if (gso) {
> vnet->hdr_len = ETH_HLEN + sizeof(*iph) +
> sizeof(*tcph);
> vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> vnet->gso_size = ETH_DATA_LEN - sizeof(struct
> iphdr) -
> sizeof(struct
> tcphdr);
> } else {
> vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;
> }
> Regards,
> Ramana
> 
> 
> On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner 
> wrote:
> 
> >
> > Hi,
> >
> > What's the value you're passing on gso_size in struct
> > virtio_net_hdr? You need to leave room for the encapsulation
> > header, e.g.:
> >
> > gso_size = iface_mtu - virtio_net_hdr->hdr_len
> >
> > fbl
> >
> > On Mon, 4 Nov 2019 01:11:36 +0530
> > Ramana Reddy  wrote:
> >  
> > > Hi,
> > > I am wondering if anyone can help me with this. I am having
> > > trouble to send tso/gso packet
> > > with af_packet socket with packet_vnet_hdr (through
> > > virtio_net_hdr) over vxlan tunnel in OVS.
> > >
> > > What I observed that, the following function eventually hitting
> > > and is returning false (net/core/skbuff.c), hence the packet is
> > > dropping. static inline bool skb_gso_size_check(const struct
> > > sk_buff *skb, unsigned int seg_len,
> > >   unsigned int max_len) {
> > > const struct skb_shared_info *shinfo = skb_shinfo(skb);
> > > const struct sk_buff *iter;
> > > if (shinfo->gso_size != GSO_BY_FRAGS)
> > > return seg_len <= max_len;
> > > ..
> > > }
> > > [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
> > > packet_length = skb->len - skb_inner_network_offset(skb))
> > > [  678.756678] ip_fragment:510 packet length:1500
> > > [  678.756715] ip_fragment:510 packet length:1314
> > > [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
> > > max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
> > >
> > > Observation:
> > > When we send the large packet ( example here is
> > > packet_length:2762), its showing the seg_len(1550) >
> > > max_len(1500). Hence return seg_len <= max_len statement
> > > returning false. Because of this, ip_fragment calling
> > > icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
> > > rather the code reaching to ip_finish_output2(sk, skb)
> > > function in net/ipv4/ip_output.c and is given below:
> > >
> > > static int ip_finish_output_gso(struct sock *sk, struct sk_buff
> > > *skb, unsigned int mtu)
> > > {
> > > netdev_features_t features;
> > > struct sk_buff *segs;
> > > int ret = 0;
> > >
> > > /* common case: seglen is <= mtu */
> > > if (skb_gso_validate_mtu(skb, mtu))
> > > return ip_finish_output2(sk, skb);
> > >...
> > >   err = ip_fragment(sk, segs, mtu, ip_finish_output2);
> > >   ...
> > >  }
> > >
> > > But when we send normal iperf traffic ( gso/tso  traffic) over
> > > vxlan, the skb_gso_size_check returning a true value, and
> > > ip_finish_output2 getting executed.
> > > Here is the values of normal iperf traffic over vxlan.
> > >
> > > [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
> > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
> > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
> > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and
> > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and
> > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > >
> > > Can someone help me to solve what is missing, and where should I
> > > modify the code in OVS/ or outside of ovs, so 

Re: [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr

2019-11-04 Thread Ramana Reddy
Hi Flavio Leitner,
Thank you very much for your reply. Here is the code snippet. The same
code works if I send the packet without OVS.
bool csum = true;
bool gso = true;
struct virtio_net_hdr *vnet = buf;

if (csum) {
        /* request checksum offload: checksum starts after the Ethernet + IP headers */
        vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
        vnet->csum_start = ETH_HLEN + sizeof(*iph);
        vnet->csum_offset = __builtin_offsetof(struct tcphdr, check);
}

if (gso) {
        /* request TCPv4 segmentation: MSS sized for a plain 1500-byte Ethernet MTU */
        vnet->hdr_len = ETH_HLEN + sizeof(*iph) + sizeof(*tcph);
        vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
        vnet->gso_size = ETH_DATA_LEN - sizeof(struct iphdr) - sizeof(struct tcphdr);
} else {
        vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;
}
Regards,
Ramana


On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner  wrote:

>
> Hi,
>
> What's the value you're passing on gso_size in struct virtio_net_hdr?
> You need to leave room for the encapsulation header, e.g.:
>
> gso_size = iface_mtu - virtio_net_hdr->hdr_len
>
> fbl
>
> On Mon, 4 Nov 2019 01:11:36 +0530
> Ramana Reddy  wrote:
>
> > Hi,
> > I am wondering if anyone can help me with this. I am having trouble
> > to send tso/gso packet
> > with af_packet socket with packet_vnet_hdr (through virtio_net_hdr)
> > over vxlan tunnel in OVS.
> >
> > What I observed that, the following function eventually hitting and is
> > returning false (net/core/skbuff.c), hence the packet is dropping.
> > static inline bool skb_gso_size_check(const struct sk_buff *skb,
> >   unsigned int seg_len,
> >   unsigned int max_len) {
> > const struct skb_shared_info *shinfo = skb_shinfo(skb);
> > const struct sk_buff *iter;
> > if (shinfo->gso_size != GSO_BY_FRAGS)
> > return seg_len <= max_len;
> > ..
> > }
> > [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
> > packet_length = skb->len - skb_inner_network_offset(skb))
> > [  678.756678] ip_fragment:510 packet length:1500
> > [  678.756715] ip_fragment:510 packet length:1314
> > [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
> > max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
> >
> > Observation:
> > When we send the large packet ( example here is packet_length:2762),
> > its showing the seg_len(1550) > max_len(1500). Hence return seg_len
> > <= max_len statement returning false.
> > Because of this, ip_fragment calling icmp_send(skb, ICMP_DEST_UNREACH,
> > ICMP_FRAG_NEEDED, htonl(mtu)); rather the code reaching to
> > ip_finish_output2(sk, skb)
> > function in net/ipv4/ip_output.c and is given below:
> >
> > static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb,
> > unsigned int mtu)
> > {
> > netdev_features_t features;
> > struct sk_buff *segs;
> > int ret = 0;
> >
> > /* common case: seglen is <= mtu */
> > if (skb_gso_validate_mtu(skb, mtu))
> > return ip_finish_output2(sk, skb);
> >...
> >   err = ip_fragment(sk, segs, mtu, ip_finish_output2);
> >   ...
> >  }
> >
> > But when we send normal iperf traffic ( gso/tso  traffic) over vxlan,
> > the skb_gso_size_check returning a true value, and ip_finish_output2
> > getting executed.
> > Here is the values of normal iperf traffic over vxlan.
> >
> > [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
> > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
> > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
> > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and
> > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and
> > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> >
> > Can someone help me to solve what is missing, and where should I
> > modify the code in OVS/ or outside of ovs, so that it works as
> > expected.
> >
> > Thanks in advance.
> >
> > Some more info:
> > [root@xx ~]# uname -r
> > 3.10.0-1062.4.1.el7.x86_64
> > [root@xx ~]# cat /etc/redhat-release
> > Red Hat Enterprise Linux Server release 7.7 (Maipo)
> >
> > [root@xx]# ovs-vsctl --version
> > ovs-vsctl (Open vSwitch) 2.9.0
> > DB Schema 7.15.1
> >
> > And dump_stack output with af_packet:
> > [ 4833.637460][] dump_stack+0x19/0x1b
> > [ 4833.637474]  []
> > ip_fragment.constprop.55+0xc3/0x141 [ 4833.637481]
> > [] ip_finish_output+0x314/0x350 [ 4833.637484]
> > [] ip_output+0xb3/0x130 [ 4833.637490]
> > [] ? 

Re: [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr

2019-11-04 Thread Flavio Leitner


Hi,

What's the value you're passing on gso_size in struct virtio_net_hdr?
You need to leave room for the encapsulation header, e.g.:

gso_size = iface_mtu - virtio_net_hdr->hdr_len

fbl

On Mon, 4 Nov 2019 01:11:36 +0530
Ramana Reddy  wrote:

> Hi,
> I am wondering if anyone can help me with this. I am having trouble
> to send tso/gso packet
> with af_packet socket with packet_vnet_hdr (through virtio_net_hdr)
> over vxlan tunnel in OVS.
> 
> What I observed that, the following function eventually hitting and is
> returning false (net/core/skbuff.c), hence the packet is dropping.
> static inline bool skb_gso_size_check(const struct sk_buff *skb,
>   unsigned int seg_len,
>   unsigned int max_len) {
> const struct skb_shared_info *shinfo = skb_shinfo(skb);
> const struct sk_buff *iter;
> if (shinfo->gso_size != GSO_BY_FRAGS)
> return seg_len <= max_len;
> ..
> }
> [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
> packet_length = skb->len - skb_inner_network_offset(skb))
> [  678.756678] ip_fragment:510 packet length:1500
> [  678.756715] ip_fragment:510 packet length:1314
> [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
> max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
> 
> Observation:
> When we send the large packet ( example here is packet_length:2762),
> its showing the seg_len(1550) > max_len(1500). Hence return seg_len
> <= max_len statement returning false.
> Because of this, ip_fragment calling icmp_send(skb, ICMP_DEST_UNREACH,
> ICMP_FRAG_NEEDED, htonl(mtu)); rather the code reaching to
> ip_finish_output2(sk, skb)
> function in net/ipv4/ip_output.c and is given below:
> 
> static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb,
> unsigned int mtu)
> {
> netdev_features_t features;
> struct sk_buff *segs;
> int ret = 0;
> 
> /* common case: seglen is <= mtu */
> if (skb_gso_validate_mtu(skb, mtu))
> return ip_finish_output2(sk, skb);
>...
>   err = ip_fragment(sk, segs, mtu, ip_finish_output2);
>   ...
>  }
> 
> But when we send normal iperf traffic ( gso/tso  traffic) over vxlan,
> the skb_gso_size_check returning a true value, and ip_finish_output2
> getting executed.
> Here is the values of normal iperf traffic over vxlan.
> 
> [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
> max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
> max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
> max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and
> max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and
> max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> 
> Can someone help me to solve what is missing, and where should I
> modify the code in OVS/ or outside of ovs, so that it works as
> expected.
> 
> Thanks in advance.
> 
> Some more info:
> [root@xx ~]# uname -r
> 3.10.0-1062.4.1.el7.x86_64
> [root@xx ~]# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.7 (Maipo)
> 
> [root@xx]# ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 2.9.0
> DB Schema 7.15.1
> 
> And dump_stack output with af_packet:
> [ 4833.637460][] dump_stack+0x19/0x1b
> [ 4833.637474]  []
> ip_fragment.constprop.55+0xc3/0x141 [ 4833.637481]
> [] ip_finish_output+0x314/0x350 [ 4833.637484]
> [] ip_output+0xb3/0x130 [ 4833.637490]
> [] ? ip_do_fragment+0x910/0x910 [ 4833.637493]
> [] ip_local_out_sk+0xf9/0x180 [ 4833.637497]
> [] iptunnel_xmit+0x18c/0x220 [ 4833.637505]
> [] udp_tunnel_xmit_skb+0x117/0x130 [udp_tunnel]
> [ 4833.637538]  [] vxlan_xmit_one+0xb6a/0xb70
> [vxlan] [ 4833.637545]  [] ?
> vprintk_default+0x29/0x40 [ 4833.637551]  []
> vxlan_xmit+0xc9e/0xef0 [vxlan] [ 4833.637555]  [] ?
> kfree_skbmem+0x37/0x90 [ 4833.637559]  [] ?
> consume_skb+0x34/0x90 [ 4833.637564]  [] ?
> packet_rcv+0x4c/0x3e0 [ 4833.637570]  []
> dev_hard_start_xmit+0x246/0x3b0 [ 4833.637574]  []
> __dev_queue_xmit+0x519/0x650 [ 4833.637580]  [] ?
> try_to_wake_up+0x190/0x390 [ 4833.637585]  []
> dev_queue_xmit+0x10/0x20 [ 4833.637592]  []
> ovs_vport_send+0xa6/0x180 [openvswitch] [ 4833.637599]
> [] do_output+0x4e/0xd0 [openvswitch] [ 4833.637604]
>  [] do_execute_actions+0xa29/0xa40 [openvswitch]
> [ 4833.637610]  [] ? __wake_up_common+0x82/0x120
> [ 4833.637615]  [] ovs_execute_actions+0x4c/0x140
> [openvswitch]
> [ 4833.637621]  [] ovs_dp_process_packet+0x84/0x120
> [openvswitch]
> [ 4833.637627]  [] ? ovs_ct_update_key+0xc4/0x150
> [openvswitch]
> [ 4833.637633]  [] ovs_vport_receive+0x73/0xd0
> [openvswitch]
> [ 4833.637638]  

Re: [ovs-discuss] OVS DPDK: Failed to create memory pool for netdev

2019-11-04 Thread Flavio Leitner

It would be nice if you shared the DPDK options used in OvS.

On Sat, 2 Nov 2019 15:43:18 +
"Tobias Hofmann \(tohofman\) via discuss" 
wrote:

> Hello community,
> 
> My team and I observe a strange behavior on our system with the
> creation of dpdk ports in OVS. We have a CentOS 7 system with
> OpenvSwitch and only one single port of type ‘dpdk’ attached to a
> bridge. The MTU size of the DPDK port is 9216 and the reserved
> HugePages for OVS are 512 x 2MB-HugePages, e.g. 1GB of total HugePage
> memory.
> 
> Setting everything up works fine, however after I reboot my box, the
> dpdk port is in error state and I can observe this line in the logs
> (full logs attached to the mail):
> 2019-11-02T14:46:16.914Z|00437|netdev_dpdk|ERR|Failed to create
> memory pool for netdev dpdk-p0, with MTU 9216 on socket 0: Invalid
> argument 2019-11-02T14:46:16.914Z|00438|dpif_netdev|ERR|Failed to set
> interface dpdk-p0 new configuration
> 
> I figured out that by restarting the openvswitch process, the issue
> with the port is resolved and it is back in a working state. However,
> as soon as I reboot the system a second time, the port comes up in
> error state again. Now, we have also observed a couple of other
> workarounds that I can’t really explain why they help:
> 
>   *   When there is also a VM deployed on the system that is using
> ports of type ‘dpdkvhostuserclient’, we never see any issues like
> that. (MTU size of the VM ports is 9216 by the way)
>   *   When we increase the HugePage memory for OVS to 2GB, we also
> don’t see any issues.
>   *   Lowering the MTU size of the ‘dpdk’ type port to 1500 also
> helps to prevent this issue.
> 
> Can anyone explain this?
> 
> We’re using the following versions:
> Openvswitch: 2.9.3
> DPDK: 17.11.5
> 
> Appreciate any help!
> Tobias



Re: [ovs-discuss] OVS deleting flows from the datapath on exit

2019-11-04 Thread Flavio Leitner
On Fri, 1 Nov 2019 13:35:07 -0700
Ben Pfaff  wrote:

> OVS currently can gracefully exit in two ways: either with or without
> deleting the datapath.  But, either way, it deletes all of the flows
> from the datapath before it exits.  That is due to commit e96a5c24e853
> ("upcall: Remove datapath flows when setting n-threads."), which was
> first released in OVS 2.1 back in 2014.
> 
> This isn't usually a big deal.  However, some controller folks I'm
> talking to are concerned about upgrade.  If the datapath flows
> persisted after OVS exits, then existing network connections (and
> perhaps some that are "similar" to them because they match the same
> megaflows) could carry on while the upgrade was in progress.  
> 
> I am surprised that I have not heard complaints about this in the 5
> years that the behavior has been this way.  Does anyone have any
> stories to report about it now that I bring it up?  Contrariwise, if
> we changed OVS so that it did not delete datapath flows on exit, can
> anyone suggest what problems that might cause?

Well, I have heard complaints about updating the OvS package causing long
downtime in OSP environments, mainly because all the flows needed to be
rebuilt on the OSP side, which was a slow process.

When a service is restarted, it is expected to come up with a "clean
and fresh state", and so far flows have been treated as "temporary" data. In
order to provide the option to restore the flows, the following commit
was introduced to create a "reload" service:
  commit ea36b04688f37cf45b7c2304ce31f0d29f212d54
  Author: Timothy Redaelli 
  Date:   Fri Nov 3 21:39:17 2017 +0100

  rhel: Add support for "systemctl reload openvswitch"

Now the openvswitch service could be restarted with flows persisting.
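
On RHEL/CentOS-style packages that carry that commit, the difference looks
roughly like this (a sketch of the intended behaviour, not an exact transcript):

# full restart: the daemons come back with a clean state, flows must be re-installed
systemctl restart openvswitch
# reload: the OpenFlow flows are saved before the restart and restored afterwards
systemctl reload openvswitch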

There was also an investigation into preserving the kernel datapath cache
during the reload, to be as little disruptive as possible. However, after the
above commit I never heard about package update issues again, so we
dropped the kernel datapath persistence effort.

fbl


[ovs-discuss] the network performance is not normal when using openvswitch.ko built from the ovs tree

2019-11-04 Thread shuangyang qian
Hi:
I built RPM packages for OVS and OVN following this document:
http://docs.openvswitch.org/en/latest/intro/install/fedora/ . To use the
kernel module from the OVS tree, I configured with the command: ./configure
--with-linux=/lib/modules/$(uname -r)/build .
Then I installed the RPM packages.
When that finished, I checked the openvswitch.ko module:
# lsmod |  grep openvswitch
openvswitch   291276  0
tunnel6 3115  1 openvswitch
nf_defrag_ipv6 25957  2 nf_conntrack_ipv6,openvswitch
nf_nat_ipv6 6459  2 openvswitch,ip6table_nat
nf_nat_ipv4 6187  2 openvswitch,iptable_nat
nf_nat 18080  5
xt_nat,openvswitch,nf_nat_ipv6,nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack  102766  10
ip_vs,nf_conntrack_ipv6,openvswitch,nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_ipv6,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
libcrc32c   1388  3 ip_vs,openvswitch,xfs
ipv6  400397  92
ip_vs,nf_conntrack_ipv6,openvswitch,nf_defrag_ipv6,nf_nat_ipv6,bridge
# modinfo openvswitch
filename:
/lib/modules/4.9.18-19080201/extra/openvswitch/openvswitch.ko
alias:  net-pf-16-proto-16-family-ovs_ct_limit
alias:  net-pf-16-proto-16-family-ovs_meter
alias:  net-pf-16-proto-16-family-ovs_packet
alias:  net-pf-16-proto-16-family-ovs_flow
alias:  net-pf-16-proto-16-family-ovs_vport
alias:  net-pf-16-proto-16-family-ovs_datapath
version:2.11.2
license:GPL
description:Open vSwitch switching datapath
srcversion: 9DDA327F9DD46B9813628A4
depends:
 
nf_conntrack,tunnel6,ipv6,nf_nat,nf_defrag_ipv6,libcrc32c,nf_nat_ipv6,nf_nat_ipv4
vermagic:   4.9.18-19080201 SMP mod_unload modversions
parm:   udp_port:Destination UDP port (ushort)
# rpm -qf /lib/modules/4.9.18-19080201/extra/openvswitch/openvswitch.ko
openvswitch-kmod-2.11.2-1.el7.x86_64

Then I started to build my network topology. I have two nodes, with network
namespace vm1 on node1 and network namespace vm2 on node2. vm1's veth peer
veth-vm1 is on node1's br-int, and vm2's veth peer veth-vm2 is on node2's
br-int. At the logical layer there is one logical switch, test-subnet, with two
logical switch ports, node1 and node2, on it, like this:
# ovn-nbctl show
switch 70585c0e-3cd9-459e-9448-3c13f3c0bfa3 (test-subnet)
port node2
addresses: ["00:00:00:00:00:02 192.168.100.20"]
port node1
addresses: ["00:00:00:00:00:01 192.168.100.10"]
on node1:
# ovs-vsctl show
5180f74a-1379-49af-b265-4403bd0d82d8
Bridge br-int
fail_mode: secure
Port "ovn-431b9e-0"
Interface "ovn-431b9e-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.18.124.2"}
Port br-int
Interface br-int
type: internal
Port "veth-vm1"
Interface "veth-vm1"
ovs_version: "2.11.2"
# ip netns exec vm1 ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
14: ovs-gretap0@NONE:  mtu 1462 qdisc noop state DOWN
group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
15: erspan0@NONE:  mtu 1450 qdisc noop state DOWN
group default qlen 1000
link/ether 22:02:1b:08:ec:53 brd ff:ff:ff:ff:ff:ff
16: ovs-ip6gre0@NONE:  mtu 1448 qdisc noop state DOWN group default
qlen 1
link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
17: ovs-ip6tnl0@NONE:  mtu 1452 qdisc noop state DOWN group default
qlen 1
link/tunnel6 :: brd ::
18: vm1-eth0@if17:  mtu 1400 qdisc noqueue
state UP group default qlen 1000
link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.100.10/24 scope global vm1-eth0
   valid_lft forever preferred_lft forever
inet6 fe80::200:ff:fe00:1/64 scope link
   valid_lft forever preferred_lft forever


on node2:
# ovs-vsctl show
011332d0-78bc-47f7-be3c-fab0beb08e28
Bridge br-int
fail_mode: secure
Port br-int
Interface br-int
type: internal
Port "ovn-c655f8-0"
Interface "ovn-c655f8-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.18.124.1"}
Port "veth-vm2"
Interface "veth-vm2"
ovs_version: "2.11.2"
# ip netns exec vm2 ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
10: ovs-gretap0@NONE:  mtu 1462 qdisc noop state DOWN
group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
11: erspan0@NONE:  mtu