Hi Pravin, Thanks. Does this mean it is a confirmed bug?
How would I be able to get the patch and install it into our environment?

Thanks,
Uri

On Sat, Jan 7, 2017 at 1:01 PM, Pravin Shelar <pshe...@ovn.org> wrote:

> Thanks for all investigation.
>
> On Sat, Jan 7, 2017 at 12:57 AM, Joe Stringer <j...@ovn.org> wrote:
> >
> > On 5 January 2017 at 19:24, Uri Foox <u...@zoey.com> wrote:
> >>
> >> Hey Joe,
> >>
> >> Thank you so much for responding! After 10 days of trying to figure this
> >> out I'm at a loss.
> >>
> >> root@node-8:~# modinfo openvswitch
> >> filename:
> >> /lib/modules/3.13.0-106-generic/kernel/net/openvswitch/openvswitch.ko
> >> license: GPL
> >> description: Open vSwitch switching datapath
> >> srcversion: 94294A72258BA583D666607
> >> depends: libcrc32c,vxlan,gre
> >> intree: Y
> >
> > ^ intree - that is, the version that comes with this kernel.
> >
> >> vermagic: 3.13.0-106-generic SMP mod_unload modversions
> >>
> >> Everything you've mentioned is what I've understood so far, including the
> >> line of code that's triggered. That is what led me to upgrade the kernel
> >> to 3.13.0-106, because it claims that the CHECKSUM problems are fixed,
> >> which I thought this might be related to; guess not.
> >
> > I forgot to actually look through those before, but the call chain looks a
> > bit different there so I thought it may be a different issue altogether.
> >
> >> You're saying that skb_headlen is too short for the ethernet header. Do
> >> you know what would cause this? This hardware configuration has been
> >> running for 400+ days of uptime with no errors or problems and this
> >> suddenly started to happen, and no matter how many times we reboot things
> >> it doesn't go away. I assume given your interpretation we should try to
> >> restart the switches connected to the servers. Is there any way to log
> >> what packet is causing this issue? Perhaps that would provide more
> >> insight?
> >
> > One thing is that it depends on the packets and how they arrive.
> > I'm not too familiar with this code, but I could imagine a situation
> > where the IP+GRE packet gets fragmented, causing a single inner frame to
> > be split across multiple GRE packets. Then, when Linux receives the two
> > separate packets, there would be some point in the stack responsible for
> > stitching these packets back together; but it may not put them into a
> > single contiguous buffer. If this is subsequently decapped for local
> > delivery of the inner frame, then perhaps there is less than an ethernet
> > header's worth of packet in the first of these buffers. It seems unlikely
> > that packets would be deliberately fragmented like this, but if anyone
> > had access to your underlying network then they could throw any kind of
> > packet they want to your server.
> >
> > There may be another, more likely, explanation - CC Pravin in case he has
> > any ideas.
> >
> >> As far as 4.4/newer kernel - I wish. I tried to go that far up but Ubuntu
> >> wouldn't even boot. The best I could do is 3.13.0-106. I'll try to report
> >> it over there as well.
> >
> > That's too bad.
> >
> > FWIW, I see a check for pskb_may_pull() in the outer gre_rcv function,
> > which would check on the whole GRE packet.. this is then passed to
> > gre_cisco_rcv() which does the decap and calls through to the OVS
> > gre_rcv() function. At a glance, following the OVS' gre_rcv() I didn't see
> > another pskb_may_pull() check for the inner packet. By the time it gets to
> > ovs_flow_extract(), there's an expectation that this call was made but I'm
> > really not sure who was supposed to make that check. Also, it should be
> > ETH_HLEN, which is 14, not 12..
> >
> Right. OVS does expect the header to already be in skb linear data. It is
> done in iptunnel_pull_header() for tunnel packets. This function is called
> for all packets received in GRE module.
>
> http://lxr.free-electrons.com/source/net/ipv4/ip_tunnel_core.c?v=3.13#L96
>
> But the skb eth-header is only pulled for GRE-TAP packets, not for
> IP-GRE. The change in network could have introduced these IP-GRE
> packets that caused the crash.
>
> This bug does not exist in the out-of-tree kernel module that comes with
> OVS 2.5 and newer. So upgrading the OVS kernel module to 2.5 should solve
> the problem.
>
> I will send out a patch for the older OVS kernel module.
>
> > Outer gre_rcv():
> > http://lxr.free-electrons.com/source/net/ipv4/gre_demux.c?v=3.13#L270
> >
> > Inner gre_rcv():
> > http://lxr.free-electrons.com/source/net/openvswitch/vport-gre.c?v=3.13#L92

-- 
Uri Foox | Zoey | Founder
http://www.zoey.com
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
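
For readers following the thread, here is a rough userspace sketch of the kind of check discussed above: confirm that a full Ethernet header (ETH_HLEN, 14 bytes) sits in the linear part of the buffer before parsing it, pulling bytes in from a fragment when the linear part is too short. This is not the kernel code and not the patch Pravin mentions. struct fake_skb, fake_may_pull() and fake_parse_ethertype() are hypothetical names made up for this illustration; they only model the idea behind pskb_may_pull(), and the single-fragment layout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_HLEN 14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define FAKE_BUF_CAP 64  /* hypothetical capacity of the linear area */

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct fake_skb {
    unsigned char head[FAKE_BUF_CAP]; /* linear ("head") data */
    size_t head_len;                  /* valid bytes in head[] */
    const unsigned char *frag;        /* one non-linear fragment, may be NULL */
    size_t frag_len;                  /* bytes left in that fragment */
};

/*
 * Models the idea behind pskb_may_pull(): ensure at least `len` bytes are
 * available in the linear area, copying from the fragment if necessary.
 * Returns false when the packet as a whole is too short.
 */
bool fake_may_pull(struct fake_skb *skb, size_t len)
{
    assert(skb->head_len <= FAKE_BUF_CAP);

    if (len <= skb->head_len)
        return true;                  /* already linear, nothing to do */
    if (len > FAKE_BUF_CAP || len > skb->head_len + skb->frag_len)
        return false;                 /* not enough data anywhere */

    size_t need = len - skb->head_len;
    memcpy(skb->head + skb->head_len, skb->frag, need);
    skb->head_len += need;
    skb->frag += need;
    skb->frag_len -= need;
    return true;
}

/*
 * Returns the EtherType in host byte order, or -1 if the frame cannot
 * supply a complete Ethernet header. The header is only read after the
 * pull check has succeeded.
 */
int fake_parse_ethertype(struct fake_skb *skb)
{
    if (!fake_may_pull(skb, ETH_HLEN))
        return -1;
    return (skb->head[12] << 8) | skb->head[13];
}
```

With 10 bytes in the linear area and the remaining 4 header bytes in the fragment, fake_parse_ethertype() first pulls the missing bytes and then reads the EtherType; with only 10 bytes in total it returns -1 instead of reading past the valid data.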