[ovs-discuss] Re: [HELP] Question about icmp pkt marked Invalid by userspace conntrack
Hi Darrell:

The meter rate limit is set to 1 Gbps, but the actual rate is around
500 Mbps. I have read the meter patch, but that patch only prevents
delta_t from becoming 0; in my case, delta_t is around 35500 ms.

In my pipeline, the meter action is on OpenFlow table 46, the ct action
is on table 44, and the output action is on table 65, so I believe the
ordering is right?

Thanks
Timo

--
From: Darrell Ball
Date: Tuesday, November 5, 2019, 06:56
To: txfh2007
Cc: Ben Pfaff; ovs-discuss
Subject: Re: [ovs-discuss] Re: [HELP] Question about icmp pkt marked
Invalid by userspace conntrack

Hi Timo

On Sun, Nov 3, 2019 at 5:12 PM txfh2007 wrote:

> Hi Darrell:
> Sorry for my late reply. Yes, the two VMs under test are on the same
> compute node, and pkts rx/tx via vhost-user type ports.

Got it

> Firstly, if I don't configure the meter table, the iperf TCP bandwidth
> result from VM1 to VM2 is around 5 Gbps; then I set the meter entry to
> constrain the rate, and the deviation is larger than I thought.

IIUC, pre-meter you get 5 Gbps, then post-meter 0.5 Gbps, which is less
than you expected? What did you expect the metered rate to be? Note Ben
pointed you to a meter-related bug fix on the alias before.

> I guess the recalculation of the L4 checksum during conntrack would
> impact the actual rate?

Are you applying the meter rule at the end of the complete pipeline?

> Thank you
> Timo
>
> To: txfh2007
> Cc: Ben Pfaff; ovs-discuss
> Subject: Re: [ovs-discuss] Re: [HELP] Question about icmp pkt marked
> Invalid by userspace conntrack

Hi Timo

I read through this thread to get more context on what you are doing;
you have a base OVS-DPDK use case and are measuring VM-to-VM performance
across 2 compute nodes. You are probably using vhost-user-client ports?
Please correct me if I am wrong.

In this case, "per direction" you have one rx virtual interface to
handle in OVS; there will be a tradeoff between checksum-validation
security and performance.

Just to be clear, in terms of your measurements, how did you arrive at
the 5 Gbps - instrumented code or otherwise?
(I can verify that later when I have a setup.)

Darrell

On Thu, Oct 31, 2019 at 9:23 AM Darrell Ball wrote:

On Thu, Oct 31, 2019 at 3:04 AM txfh2007 via discuss wrote:

> Hi Ben && Darrell:
> This patch works, but after merging this patch I have found the iperf
> throughput decrease from 5 Gbps+ to 500 Mbps.

What is the 5 Gbps number? Is that the number with marking all packets
as invalid in the initial sanity checks?

Typically one wants to offload checksum checks. The code checks whether
that has been done and skips doing it in software; can you verify that
you have the capability and are using it?

Skipping checksum checks reduces security, of course, but it can be
added if there is a common case of not being able to offload
checksumming.

> I guess maybe we should add a switch to turn off layer-4 checksum
> validation when doing userspace conntrack? I have found that kernel
> conntrack has a related knob named "nf_conntrack_checksum". Any advice?
>
> Thank you!
>
> --
> To: Ben Pfaff
> Cc: ovs-discuss
> Subject: Re: [ovs-discuss] [HELP] Question about icmp pkt marked
> Invalid by userspace conntrack
>
> Hi Ben && Darrell:
> Thanks, this patch works! Now the issue seems fixed.
>
> Timo

Subject: Re: [ovs-discuss] [HELP] Question about icmp pkt marked Invalid
by userspace conntrack

I see.

It sounds like Darrell pointed out the solution, but please let me know
if it did not help.

On Fri, Oct 11, 2019 at 08:57:58AM +0800, txfh2007 wrote:
> Hi Ben:
>
> I just found the GCC_UNALIGNED_ACCESSORS error during a gdb trace and
> am not sure whether this is a misaligned-access error or something
> else. What I can confirm is that during "extract_l4" of this icmp
> reply packet, when we do "check_l4_icmp", the unaligned error is
> emitted and "extract_l4" returns false, so this packet is marked as
> ct_state=invalid.
>
> Thank you for your help.
>
> Timo
>
> Topic: Re: [ovs-discuss] [HELP] Question about icmp pkt marked Invalid
> by userspace conntrack
>
> It's very surprising.
>
> Are you using a RISC architecture that insists on aligned accesses?
> On the other hand, if you are using x86-64 or some other architecture
> that ordinarily does not care, are you sure that this is about a
> misaligned access (it is more likely to simply be a bad pointer)?
>
> On Thu, Oct 10, 2019 at 10:50:33PM +0800, txfh2007 via discuss wrote:
> >
> > Hi all:
> > I was using OVS-DPDK (version 2.10-1), and I found that pinging
> > between two VMs on different compute nodes failed. I checked my env
> > and found that one node's NIC cannot strip the CRC of a frame, while
> > the other node's NIC is normal (I mean it can strip the CRC). And
> > the reason the ping fails is that the icmp reply pkt (from the node
> > whose NIC cannot strip the CRC) is marked as invalid. So the icmp
> > request from Node A is 64 bytes, but the icmp reply From
Re: [ovs-discuss] [HELP] Question about icmp pkt marked Invalid by userspace conntrack
> > > > On Thu, Oct 10, 2019 at 10:50:33PM +0800, txfh2007 via discuss wrote:
> > > > >
> > > > > Hi all:
> > > > > I was using OVS-DPDK (version 2.10-1), and I found that pinging
> > > > > between two VMs on different compute nodes failed. I checked my
> > > > > env and found that one node's NIC cannot strip the CRC of a
> > > > > frame, while the other node's NIC is normal (I mean it can
> > > > > strip the CRC). The reason the ping fails is that the icmp
> > > > > reply pkt (from the node whose NIC cannot strip the CRC) is
> > > > > marked as invalid. So the icmp request from Node A is 64 bytes,
> > > > > but the icmp reply from Node B is 68 bytes (with a 4-byte CRC).
> > > > > And when doing "check_l4_icmp", when we call the csum code (in
> > > > > lib/csum.c), gcc's accessor emits an unaligned-access error.
> > > > > The backtrace is as below:
> > > > >
> > > > > I just want to confirm whether this phenomenon is reasonable?
> > > > >
> > > > > Many thanks
> > > > >
> > > > > Timo
> > > > >
> > > > > get_unaligned_be16 (p=0x7f2ad0b1ed5c) at lib/unaligned.h:89
> > > > > 89          GCC_UNALIGNED_ACCESSORS(ovs_be16, be16);
> > > > > (gdb) bt
> > > > > #0
Re: [ovs-discuss] OVS DPDK: Failed to create memory pool for netdev
Hi Flavio,

thanks for reaching out. The DPDK options used in OvS are:

  other_config:pmd-cpu-mask=0x202
  other_config:dpdk-socket-mem=1024
  other_config:dpdk-init=true

For the dpdk port, we set:

  type=dpdk
  options:dpdk-devargs=:08:0b.2
  external_ids:unused-drv=i40evf
  mtu_request=9216

Please let me know if this is what you asked for.

Thanks
Tobias

On 04.11.19, 15:50, "Flavio Leitner" wrote:

    It would be nice if you share the DPDK options used in OvS.

    On Sat, 2 Nov 2019 15:43:18 + "Tobias Hofmann \(tohofman\) via
    discuss" wrote:
    > Hello community,
    >
    > My team and I observe a strange behavior on our system with the
    > creation of dpdk ports in OVS. We have a CentOS 7 system with
    > OpenvSwitch and only one single port of type ‘dpdk’ attached to a
    > bridge. The MTU size of the DPDK port is 9216 and the reserved
    > HugePages for OVS are 512 x 2MB HugePages, i.e. 1GB of total
    > HugePage memory.
    >
    > Setting everything up works fine; however, after I reboot my box,
    > the dpdk port is in an error state and I can observe these lines
    > in the logs (full logs attached to the mail):
    > 2019-11-02T14:46:16.914Z|00437|netdev_dpdk|ERR|Failed to create
    > memory pool for netdev dpdk-p0, with MTU 9216 on socket 0: Invalid
    > argument
    > 2019-11-02T14:46:16.914Z|00438|dpif_netdev|ERR|Failed to set
    > interface dpdk-p0 new configuration
    >
    > I figured out that by restarting the openvswitch process, the
    > issue with the port is resolved and it is back in a working state.
    > However, as soon as I reboot the system a second time, the port
    > comes up in an error state again. We have also observed a couple
    > of other workarounds that I can’t really explain why they help:
    >
    > * When there is also a VM deployed on the system that is using
    >   ports of type ‘dpdkvhostuserclient’, we never see any issues
    >   like that. (The MTU size of the VM ports is 9216, by the way.)
    > * When we increase the HugePage memory for OVS to 2GB, we also
    >   don’t see any issues.
    > * Lowering the MTU size of the ‘dpdk’ type port to 1500 also
    >   helps to prevent this issue.
    >
    > Can anyone explain this?
    >
    > We’re using the following versions:
    > Openvswitch: 2.9.3
    > DPDK: 17.11.5
    >
    > Appreciate any help!
    > Tobias

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr
Thanks, Flavio. I will check it out tomorrow and let you know how it
goes.

Regards,
Ramana

On Mon, Nov 4, 2019 at 10:15 PM Flavio Leitner wrote:
> On Mon, 4 Nov 2019 21:32:28 +0530 Ramana Reddy wrote:
>
> > Hi Flavio Leitner,
> > Thank you very much for your reply. Here is the code snippet. But
> > the same code is working if I send the packet without OVS.
>
> Could you provide more details on the OvS environment and the test?
>
> The linux kernel propagates the header size dependencies when you
> stack the devices in net_device->hard_header_len, so in the case of a
> vxlan dev it will be:
>
>     needed_headroom = lowerdev->hard_header_len;
>     needed_headroom += VXLAN_HEADROOM;
>     dev->needed_headroom = needed_headroom;
>
> Sounds like that is helping when OvS is not being used.
>
> fbl
>
> > bool csum = true;
> > bool gso = true;
> > struct virtio_net_hdr *vnet = buf;
> >
> > if (csum) {
> >         vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> >         vnet->csum_start = ETH_HLEN + sizeof(*iph);
> >         vnet->csum_offset = __builtin_offsetof(struct tcphdr, check);
> > }
> >
> > if (gso) {
> >         vnet->hdr_len = ETH_HLEN + sizeof(*iph) + sizeof(*tcph);
> >         vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> >         vnet->gso_size = ETH_DATA_LEN - sizeof(struct iphdr) -
> >                          sizeof(struct tcphdr);
> > } else {
> >         vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;
> > }
> >
> > Regards,
> > Ramana
> >
> > On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner wrote:
> > >
> > > Hi,
> > >
> > > What's the value you're passing as gso_size in struct
> > > virtio_net_hdr? You need to leave room for the encapsulation
> > > header, e.g.:
> > >
> > >     gso_size = iface_mtu - virtio_net_hdr->hdr_len
> > >
> > > fbl
> > >
> > > On Mon, 4 Nov 2019 01:11:36 +0530 Ramana Reddy wrote:
> > >
> > > > Hi,
> > > > I am wondering if anyone can help me with this. I am having
> > > > trouble sending a tso/gso packet with an af_packet socket with
> > > > packet_vnet_hdr (through virtio_net_hdr) over a vxlan tunnel in
> > > > OVS.
> > > >
> > > > What I observed is that the following function eventually gets
> > > > hit and returns false (net/core/skbuff.c), hence the packet is
> > > > dropped:
> > > >
> > > > static inline bool skb_gso_size_check(const struct sk_buff *skb,
> > > >                                       unsigned int seg_len,
> > > >                                       unsigned int max_len)
> > > > {
> > > >         const struct skb_shared_info *shinfo = skb_shinfo(skb);
> > > >         const struct sk_buff *iter;
> > > >         if (shinfo->gso_size != GSO_BY_FRAGS)
> > > >                 return seg_len <= max_len;
> > > >         ..
> > > > }
> > > >
> > > > [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
> > > >   packet_length = skb->len - skb_inner_network_offset(skb))
> > > > [  678.756678] ip_fragment:510 packet length:1500
> > > > [  678.756715] ip_fragment:510 packet length:1314
> > > > [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
> > > >   max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
> > > >
> > > > Observation:
> > > > When we send the large packet (example here is
> > > > packet_length:2762), it shows seg_len (1550) > max_len (1500),
> > > > hence the "return seg_len <= max_len" statement returns false.
> > > > Because of this, ip_fragment calls icmp_send(skb,
> > > > ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)) rather than the
> > > > code reaching ip_finish_output2(sk, skb) in
> > > > net/ipv4/ip_output.c, which is given below:
> > > >
> > > > static int ip_finish_output_gso(struct sock *sk, struct sk_buff
> > > >                                 *skb, unsigned int mtu)
> > > > {
> > > >         netdev_features_t features;
> > > >         struct sk_buff *segs;
> > > >         int ret = 0;
> > > >
> > > >         /* common case: seglen is <= mtu */
> > > >         if (skb_gso_validate_mtu(skb, mtu))
> > > >                 return ip_finish_output2(sk, skb);
> > > >         ...
> > > >         err = ip_fragment(sk, segs, mtu, ip_finish_output2);
> > > >         ...
> > > > }
> > > >
> > > > But when we send normal iperf traffic (gso/tso traffic) over
> > > > vxlan, skb_gso_size_check returns true and ip_finish_output2
> > > > gets executed. Here are the values for normal iperf traffic over
> > > > vxlan:
> > > >
> > > > [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
> > > >   max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > > [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
> > > >   max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > > [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
> > > >   max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > > > [
Re: [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr
Hi Flavio Leitner,

Thank you very much for your reply. Here is the code snippet. But the
same code is working if I send the packet without OVS.

    bool csum = true;
    bool gso = true;
    struct virtio_net_hdr *vnet = buf;

    if (csum) {
            vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
            vnet->csum_start = ETH_HLEN + sizeof(*iph);
            vnet->csum_offset = __builtin_offsetof(struct tcphdr, check);
    }

    if (gso) {
            vnet->hdr_len = ETH_HLEN + sizeof(*iph) + sizeof(*tcph);
            vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
            vnet->gso_size = ETH_DATA_LEN - sizeof(struct iphdr) -
                             sizeof(struct tcphdr);
    } else {
            vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;
    }

Regards,
Ramana

On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner wrote:
>
> Hi,
>
> What's the value you're passing as gso_size in struct virtio_net_hdr?
> You need to leave room for the encapsulation header, e.g.:
>
>     gso_size = iface_mtu - virtio_net_hdr->hdr_len
>
> fbl
>
> On Mon, 4 Nov 2019 01:11:36 +0530 Ramana Reddy wrote:
>
> > Hi,
> > I am wondering if anyone can help me with this. I am having trouble
> > sending a tso/gso packet with an af_packet socket with
> > packet_vnet_hdr (through virtio_net_hdr) over a vxlan tunnel in OVS.
> >
> > What I observed is that the following function eventually gets hit
> > and returns false (net/core/skbuff.c), hence the packet is dropped:
> >
> > static inline bool skb_gso_size_check(const struct sk_buff *skb,
> >                                       unsigned int seg_len,
> >                                       unsigned int max_len)
> > {
> >         const struct skb_shared_info *shinfo = skb_shinfo(skb);
> >         const struct sk_buff *iter;
> >         if (shinfo->gso_size != GSO_BY_FRAGS)
> >                 return seg_len <= max_len;
> >         ..
> > }
> >
> > [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
> >   packet_length = skb->len - skb_inner_network_offset(skb))
> > [  678.756678] ip_fragment:510 packet length:1500
> > [  678.756715] ip_fragment:510 packet length:1314
> > [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
> >   max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
> >
> > Observation:
> > When we send the large packet (example here is packet_length:2762),
> > it shows seg_len (1550) > max_len (1500), hence the "return seg_len
> > <= max_len" statement returns false. Because of this, ip_fragment
> > calls icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
> > htonl(mtu)) rather than the code reaching ip_finish_output2(sk, skb)
> > in net/ipv4/ip_output.c, which is given below:
> >
> > static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb,
> >                                 unsigned int mtu)
> > {
> >         netdev_features_t features;
> >         struct sk_buff *segs;
> >         int ret = 0;
> >
> >         /* common case: seglen is <= mtu */
> >         if (skb_gso_validate_mtu(skb, mtu))
> >                 return ip_finish_output2(sk, skb);
> >         ...
> >         err = ip_fragment(sk, segs, mtu, ip_finish_output2);
> >         ...
> > }
> >
> > But when we send normal iperf traffic (gso/tso traffic) over vxlan,
> > skb_gso_size_check returns true and ip_finish_output2 gets executed.
> > Here are the values for normal iperf traffic over vxlan:
> >
> > [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
> >   max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
> >   max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
> >   max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and
> >   max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> > [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and
> >   max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> >
> > Can someone help me to solve what is missing, and where I should
> > modify the code in OVS (or outside of OVS), so that it works as
> > expected.
> >
> > Thanks in advance.
> >
> > Some more info:
> > [root@xx ~]# uname -r
> > 3.10.0-1062.4.1.el7.x86_64
> > [root@xx ~]# cat /etc/redhat-release
> > Red Hat Enterprise Linux Server release 7.7 (Maipo)
> >
> > [root@xx]# ovs-vsctl --version
> > ovs-vsctl (Open vSwitch) 2.9.0
> > DB Schema 7.15.1
> >
> > And dump_stack output with af_packet:
> > [ 4833.637460] [] dump_stack+0x19/0x1b
> > [ 4833.637474] [] ip_fragment.constprop.55+0xc3/0x141
> > [ 4833.637481] [] ip_finish_output+0x314/0x350
> > [ 4833.637484] [] ip_output+0xb3/0x130
> > [ 4833.637490] [] ?
Re: [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr
Hi,

What's the value you're passing as gso_size in struct virtio_net_hdr?
You need to leave room for the encapsulation header, e.g.:

    gso_size = iface_mtu - virtio_net_hdr->hdr_len

fbl

On Mon, 4 Nov 2019 01:11:36 +0530 Ramana Reddy wrote:
> Hi,
> I am wondering if anyone can help me with this. I am having trouble
> sending a tso/gso packet with an af_packet socket with packet_vnet_hdr
> (through virtio_net_hdr) over a vxlan tunnel in OVS.
>
> What I observed is that the following function eventually gets hit and
> returns false (net/core/skbuff.c), hence the packet is dropped:
>
> static inline bool skb_gso_size_check(const struct sk_buff *skb,
>                                       unsigned int seg_len,
>                                       unsigned int max_len)
> {
>         const struct skb_shared_info *shinfo = skb_shinfo(skb);
>         const struct sk_buff *iter;
>         if (shinfo->gso_size != GSO_BY_FRAGS)
>                 return seg_len <= max_len;
>         ..
> }
>
> [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
>   packet_length = skb->len - skb_inner_network_offset(skb))
> [  678.756678] ip_fragment:510 packet length:1500
> [  678.756715] ip_fragment:510 packet length:1314
> [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
>   max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
>
> Observation:
> When we send the large packet (example here is packet_length:2762), it
> shows seg_len (1550) > max_len (1500), hence the "return seg_len <=
> max_len" statement returns false. Because of this, ip_fragment calls
> icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)) rather
> than the code reaching ip_finish_output2(sk, skb) in
> net/ipv4/ip_output.c, which is given below:
>
> static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb,
>                                 unsigned int mtu)
> {
>         netdev_features_t features;
>         struct sk_buff *segs;
>         int ret = 0;
>
>         /* common case: seglen is <= mtu */
>         if (skb_gso_validate_mtu(skb, mtu))
>                 return ip_finish_output2(sk, skb);
>         ...
>         err = ip_fragment(sk, segs, mtu, ip_finish_output2);
>         ...
> }
>
> But when we send normal iperf traffic (gso/tso traffic) over vxlan,
> skb_gso_size_check returns true and ip_finish_output2 gets executed.
> Here are the values for normal iperf traffic over vxlan:
>
> [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
>   max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
>   max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
>   max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and
>   max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
> [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and
>   max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
>
> Can someone help me to solve what is missing, and where I should
> modify the code in OVS (or outside of OVS), so that it works as
> expected.
>
> Thanks in advance.
>
> Some more info:
> [root@xx ~]# uname -r
> 3.10.0-1062.4.1.el7.x86_64
> [root@xx ~]# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.7 (Maipo)
>
> [root@xx]# ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 2.9.0
> DB Schema 7.15.1
>
> And dump_stack output with af_packet:
> [ 4833.637460] [] dump_stack+0x19/0x1b
> [ 4833.637474] [] ip_fragment.constprop.55+0xc3/0x141
> [ 4833.637481] [] ip_finish_output+0x314/0x350
> [ 4833.637484] [] ip_output+0xb3/0x130
> [ 4833.637490] [] ? ip_do_fragment+0x910/0x910
> [ 4833.637493] [] ip_local_out_sk+0xf9/0x180
> [ 4833.637497] [] iptunnel_xmit+0x18c/0x220
> [ 4833.637505] [] udp_tunnel_xmit_skb+0x117/0x130 [udp_tunnel]
> [ 4833.637538] [] vxlan_xmit_one+0xb6a/0xb70 [vxlan]
> [ 4833.637545] [] ? vprintk_default+0x29/0x40
> [ 4833.637551] [] vxlan_xmit+0xc9e/0xef0 [vxlan]
> [ 4833.637555] [] ? kfree_skbmem+0x37/0x90
> [ 4833.637559] [] ? consume_skb+0x34/0x90
> [ 4833.637564] [] ? packet_rcv+0x4c/0x3e0
> [ 4833.637570] [] dev_hard_start_xmit+0x246/0x3b0
> [ 4833.637574] [] __dev_queue_xmit+0x519/0x650
> [ 4833.637580] [] ? try_to_wake_up+0x190/0x390
> [ 4833.637585] [] dev_queue_xmit+0x10/0x20
> [ 4833.637592] [] ovs_vport_send+0xa6/0x180 [openvswitch]
> [ 4833.637599] [] do_output+0x4e/0xd0 [openvswitch]
> [ 4833.637604] [] do_execute_actions+0xa29/0xa40 [openvswitch]
> [ 4833.637610] [] ? __wake_up_common+0x82/0x120
> [ 4833.637615] [] ovs_execute_actions+0x4c/0x140 [openvswitch]
> [ 4833.637621] [] ovs_dp_process_packet+0x84/0x120 [openvswitch]
> [ 4833.637627] [] ? ovs_ct_update_key+0xc4/0x150 [openvswitch]
> [ 4833.637633] [] ovs_vport_receive+0x73/0xd0 [openvswitch]
> [ 4833.637638]
Re: [ovs-discuss] OVS deleting flows from the datapath on exit
On Fri, 1 Nov 2019 13:35:07 -0700 Ben Pfaff wrote:

> OVS currently can gracefully exit in two ways: either with or without
> deleting the datapath. But, either way, it deletes all of the flows
> from the datapath before it exits. That is due to commit e96a5c24e853
> ("upcall: Remove datapath flows when setting n-threads."), which was
> first released in OVS 2.1 back in 2014.
>
> This isn't usually a big deal. However, some controller folks I'm
> talking to are concerned about upgrade. If the datapath flows
> persisted after OVS exits, then existing network connections (and
> perhaps some that are "similar" to them because they match the same
> megaflows) could carry on while the upgrade was in progress.
>
> I am surprised that I have not heard complaints about this in the 5
> years that the behavior has been this way. Does anyone have any
> stories to report about it now that I bring it up? Contrariwise, if
> we changed OVS so that it did not delete datapath flows on exit, can
> anyone suggest what problems that might cause?

Well, I have heard complaints about updating the OvS package causing a long downtime in OSP environments, mainly because all the flows needed to be rebuilt on the OSP side, which was a slow process. When a service is restarted, it is expected to come up with a "clean and fresh state", and so far flows were seen as "temporary" data.

In order to provide the option to restore the flows, the following commit was introduced to create a "reload" service:

commit ea36b04688f37cf45b7c2304ce31f0d29f212d54
Author: Timothy Redaelli
Date: Fri Nov 3 21:39:17 2017 +0100

    rhel: Add support for "systemctl reload openvswitch"

Now the openvswitch service can be restarted with flows persisting. There was also an investigation into preserving the kernel datapath cache across the reload, to make it as unobtrusive as possible. However, after the above commit I never heard about package update issues again, so we dropped the kernel datapath persistence effort.
fbl
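For anyone hitting the same upgrade downtime, the reload path added by the commit above can be exercised roughly as follows. This is a sketch assuming the RHEL packaging with the "reload" support referenced above; the flow dumps are only there to let you verify persistence yourself:

```shell
# Sketch: upgrade with flow preservation (assumes RHEL packaging with the
# "systemctl reload openvswitch" support from commit ea36b04688f3).
# "reload" restarts the daemons while restoring OpenFlow state, unlike a
# plain "restart", which comes up with a clean slate.
ovs-ofctl dump-flows br-int > /tmp/flows.before   # snapshot for comparison
systemctl reload openvswitch                      # instead of: systemctl restart
ovs-ofctl dump-flows br-int > /tmp/flows.after
diff /tmp/flows.before /tmp/flows.after           # rules should survive
                                                  # (counters/durations differ)
```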
[ovs-discuss] the network performance is not normal when using openvswitch.ko built from the ovs tree
Hi:

I built RPM packages for OVS and OVN following this document: http://docs.openvswitch.org/en/latest/intro/install/fedora/ . To use the kernel module from the OVS tree, I configured with the command: ./configure --with-linux=/lib/modules/$(uname -r)/build . Then I installed the RPM packages. When finished, I checked the openvswitch.ko:

# lsmod | grep openvswitch
openvswitch 291276 0
tunnel6 3115 1 openvswitch
nf_defrag_ipv6 25957 2 nf_conntrack_ipv6,openvswitch
nf_nat_ipv6 6459 2 openvswitch,ip6table_nat
nf_nat_ipv4 6187 2 openvswitch,iptable_nat
nf_nat 18080 5 xt_nat,openvswitch,nf_nat_ipv6,nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack 102766 10 ip_vs,nf_conntrack_ipv6,openvswitch,nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_ipv6,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
libcrc32c 1388 3 ip_vs,openvswitch,xfs
ipv6 400397 92 ip_vs,nf_conntrack_ipv6,openvswitch,nf_defrag_ipv6,nf_nat_ipv6,bridge

# modinfo openvswitch
filename: /lib/modules/4.9.18-19080201/extra/openvswitch/openvswitch.ko
alias: net-pf-16-proto-16-family-ovs_ct_limit
alias: net-pf-16-proto-16-family-ovs_meter
alias: net-pf-16-proto-16-family-ovs_packet
alias: net-pf-16-proto-16-family-ovs_flow
alias: net-pf-16-proto-16-family-ovs_vport
alias: net-pf-16-proto-16-family-ovs_datapath
version: 2.11.2
license: GPL
description: Open vSwitch switching datapath
srcversion: 9DDA327F9DD46B9813628A4
depends: nf_conntrack,tunnel6,ipv6,nf_nat,nf_defrag_ipv6,libcrc32c,nf_nat_ipv6,nf_nat_ipv4
vermagic: 4.9.18-19080201 SMP mod_unload modversions
parm: udp_port:Destination UDP port (ushort)

# rpm -qf /lib/modules/4.9.18-19080201/extra/openvswitch/openvswitch.ko
openvswitch-kmod-2.11.2-1.el7.x86_64

Then I started to build my network topology. I have two nodes, with network namespace vm1 on node1 and network namespace vm2 on node2. vm1's veth peer veth-vm1 is on node1's br-int, and vm2's veth peer veth-vm2 is on node2's br-int.
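One quick sanity check that the tree-built module (rather than the distribution's in-tree one) is actually loaded: the kmod RPM installs under extra/, while the kernel's own module lives under kernel/net/openvswitch/. A minimal sketch, matching against the filename shown in the modinfo output above:

```shell
# Distinguish the out-of-tree module (installed under extra/ by the kmod RPM)
# from the kernel's in-tree one (under kernel/net/openvswitch/). On a live
# system you would obtain the path with: modinfo -F filename openvswitch
ko_path='/lib/modules/4.9.18-19080201/extra/openvswitch/openvswitch.ko'
case "$ko_path" in
  */extra/*) echo "out-of-tree openvswitch.ko" ;;
  *)         echo "in-tree openvswitch.ko" ;;
esac
```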
In the logical layer, there is one logical switch, test-subnet, with two logical switch ports, node1 and node2, on it:

# ovn-nbctl show
switch 70585c0e-3cd9-459e-9448-3c13f3c0bfa3 (test-subnet)
    port node2
        addresses: ["00:00:00:00:00:02 192.168.100.20"]
    port node1
        addresses: ["00:00:00:00:00:01 192.168.100.10"]

On node1:

# ovs-vsctl show
5180f74a-1379-49af-b265-4403bd0d82d8
    Bridge br-int
        fail_mode: secure
        Port "ovn-431b9e-0"
            Interface "ovn-431b9e-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.18.124.2"}
        Port br-int
            Interface br-int
                type: internal
        Port "veth-vm1"
            Interface "veth-vm1"
    ovs_version: "2.11.2"

# ip netns exec vm1 ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
14: ovs-gretap0@NONE: mtu 1462 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
15: erspan0@NONE: mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether 22:02:1b:08:ec:53 brd ff:ff:ff:ff:ff:ff
16: ovs-ip6gre0@NONE: mtu 1448 qdisc noop state DOWN group default qlen 1
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
17: ovs-ip6tnl0@NONE: mtu 1452 qdisc noop state DOWN group default qlen 1
    link/tunnel6 :: brd ::
18: vm1-eth0@if17: mtu 1400 qdisc noqueue state UP group default qlen 1000
    link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.100.10/24 scope global vm1-eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::200:ff:fe00:1/64 scope link
       valid_lft forever preferred_lft forever

On node2:

# ovs-vsctl show
011332d0-78bc-47f7-be3c-fab0beb08e28
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-c655f8-0"
            Interface "ovn-c655f8-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.18.124.1"}
        Port "veth-vm2"
            Interface "veth-vm2"
    ovs_version: "2.11.2"

# ip netns exec vm2 ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
10: ovs-gretap0@NONE: mtu 1462 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
11: erspan0@NONE: mtu