Hi Pravin,

Thanks. Does this mean it is a confirmed bug?

How would I be able to get the patch and install it into our environment?

Thanks,
Uri


On Sat, Jan 7, 2017 at 1:01 PM, Pravin Shelar <pshe...@ovn.org> wrote:

> Thanks for all the investigation.
>
> On Sat, Jan 7, 2017 at 12:57 AM, Joe Stringer <j...@ovn.org> wrote:
> >
> >
> > On 5 January 2017 at 19:24, Uri Foox <u...@zoey.com> wrote:
> >>
> >> Hey Joe,
> >>
> >> Thank you so much for responding! After 10 days of trying to figure this
> >> out I'm at a loss.
> >>
> >> root@node-8:~# modinfo openvswitch
> >> filename:       /lib/modules/3.13.0-106-generic/kernel/net/openvswitch/openvswitch.ko
> >> license:        GPL
> >> description:    Open vSwitch switching datapath
> >> srcversion:     94294A72258BA583D666607
> >> depends:        libcrc32c,vxlan,gre
> >> intree:         Y
> >
> >
> > ^ intree - that is, the version that comes with this kernel.
> >
> >>
> >> vermagic:       3.13.0-106-generic SMP mod_unload modversions
> >>
> >>
> >> Everything you've mentioned is what I've understood so far, including
> >> the line of code that's triggered. That is what led me to upgrade the
> >> kernel to 3.13.0-106, because it claims that the CHECKSUM problems are
> >> fixed, which I thought might be related; guess not.
> >
> >
> > I forgot to actually look through those before, but the call chain looks
> > a bit different there so I thought it may be a different issue altogether.
> >
> >>
> >> You're saying that skb_headlen is too short for the ethernet header. Do
> >> you know what would cause this? This hardware configuration has been
> >> running for 400+ days of uptime with no errors or problems and this
> >> suddenly started to happen, and no matter how many times we reboot
> >> things it doesn't go away. I assume given your interpretation we should
> >> try to restart the switches connected to the servers. Is there any way
> >> to log what packet is causing this issue? Perhaps that would provide
> >> more insight?
> >
> >
> > One thing is that it depends on the packets and how they arrive. I'm not
> > too familiar with this code, but I could imagine a situation where the
> > IP+GRE packet gets fragmented, causing a single inner frame to be split
> > across multiple GRE packets. Then, when Linux receives the two separate
> > packets, there would be some point in the stack responsible for stitching
> > these packets back together; but it may not put them into a single
> > contiguous buffer. If this is subsequently decapped for local delivery of
> > the inner frame, then perhaps there is less than an ethernet header's
> > worth of packet in the first of these buffers. It seems unlikely that
> > packets would be deliberately fragmented like this, but if anyone had
> > access to your underlying network then they could throw any kind of
> > packet they want to your server.
> >
> > There may be another, more likely, explanation - CC Pravin in case he has
> > any ideas.
> >
> >>
> >> As far as 4.4/newer kernel - I wish. I tried to go that far up but
> >> Ubuntu wouldn't even boot. The best I could do is 3.13.0-106. I'll try
> >> to report it over there as well.
> >
> >
> > That's too bad.
> >
> > FWIW, I see a check for pskb_may_pull() in the outer gre_rcv function,
> > which would check on the whole GRE packet. This is then passed to
> > gre_cisco_rcv(), which does the decap and calls through to the OVS
> > gre_rcv() function. At a glance, following the OVS gre_rcv(), I didn't
> > see another pskb_may_pull() check for the inner packet. By the time it
> > gets to ovs_flow_extract(), there's an expectation that this call was
> > made, but I'm really not sure who was supposed to make that check. Also,
> > it should be ETH_HLEN, which is 14, not 12.
> >
> Right. OVS does expect the eth-header to already be in the skb linear
> data. For tunnel packets this is done in iptunnel_pull_header(), which is
> called for all packets received in the GRE module.
>
> http://lxr.free-electrons.com/source/net/ipv4/ip_tunnel_core.c?v=3.13#L96
>
> But the skb eth-header is only pulled for GRE-TAP packets, not for
> IP-GRE. The change in the network could have introduced these IP-GRE
> packets that caused the crash.
>
> This bug does not exist in the out-of-tree kernel module that comes with
> OVS 2.5 and newer. So upgrading the OVS kernel module to 2.5 should solve
> the problem.
>
> I will send out a patch for the older OVS kernel module.
>
> > Outer gre_rcv():
> > http://lxr.free-electrons.com/source/net/ipv4/gre_demux.c?v=3.13#L270
> >
> > Inner gre_rcv():
> > http://lxr.free-electrons.com/source/net/openvswitch/vport-gre.c?v=3.13#L92
>
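
As a rough illustration of the check discussed in the quoted thread (this is
not the kernel code, and not the patch Pravin mentions): before the Ethernet
header is read, at least ETH_HLEN (14) bytes must be in the linear part of
the buffer. The userspace sketch below models that with a hypothetical
toy_buf struct. toy_may_pull() and toy_extract_eth() are invented names that
only mimic the idea behind pskb_may_pull(), and the "pull" is simulated by
bumping a counter rather than copying fragment data.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_HLEN 14  /* Ethernet header: 6 dst + 6 src + 2 ethertype */

/* Hypothetical stand-in for a socket buffer: some bytes sit in the
 * linear area, the rest live in fragments. Not the kernel's sk_buff. */
struct toy_buf {
    size_t len;      /* total bytes in the packet */
    size_t headlen;  /* bytes directly readable in the linear area */
};

/* Returns 1 if at least n bytes are linear or can be made linear,
 * 0 if the packet is shorter than n bytes overall.
 * The pull itself is only simulated by growing headlen. */
int toy_may_pull(struct toy_buf *b, size_t n)
{
    if (n <= b->headlen)
        return 1;        /* already linear, nothing to do */
    if (n > b->len)
        return 0;        /* packet is too short: caller must bail out */
    b->headlen = n;      /* assumption: fragment data was copied in */
    return 1;
}

/* Returns 0 if the Ethernet header may be read safely, -1 otherwise. */
int toy_extract_eth(struct toy_buf *b)
{
    if (!toy_may_pull(b, TOY_ETH_HLEN))
        return -1;
    /* At this point the first 14 bytes are guaranteed to be linear. */
    return 0;
}
```

With a 60-byte packet that has only 12 linear bytes, toy_extract_eth()
succeeds after the simulated pull; with a 10-byte packet it returns -1
instead of reading past the available data.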



-- 
Uri Foox | Zoey | Founder
http://www.zoey.com
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
