We have a lot of tracing tools, at multiple levels.  I wonder whether
they are documented adequately.  I tried to document them by example in
the OVN OpenStack tutorial
(e.g. http://docs.openvswitch.org/en/latest/tutorials/ovn-openstack/),
but it is not exactly a how-to guide.  Maybe someone could find the time
to write a high-level guide to how to trace a packet through the system.
With plenty of examples, it could be a great resource.
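
As one possible seed for such a guide, here is a minimal sketch of the
datapath-inspection step from the anecdote quoted below: scanning
'ovs-dpctl dump-flows' style output for flows that were hit and whose
action is "drop".  The sample text is invented for illustration, and the
function name and regular expression are assumptions based on the general
"match, packets:N, bytes:N, used:..., actions:..." shape of the output,
not an official parser.

```python
import re

# Hypothetical sample in the general shape of `ovs-dpctl dump-flows` output.
# The flows themselves are invented for illustration only.
SAMPLE = """\
recirc_id(0),in_port(2),eth_type(0x0800),ipv4(frag=no), packets:10, bytes:980, used:0.5s, actions:3
recirc_id(0x1),in_port(3),ct_state(+inv+trk),eth_type(0x0800),ipv4(frag=no), packets:4, bytes:240, used:1.2s, actions:drop
recirc_id(0),in_port(3),eth_type(0x0806), packets:0, bytes:0, used:never, actions:drop
"""

# Assumed line layout: "<match>, packets:N, bytes:N, ... actions:<actions>".
FLOW_RE = re.compile(
    r"^(?P<match>.*?), packets:(?P<packets>\d+), bytes:\d+,.*?actions:(?P<actions>.*)$"
)


def dropped_flows(dump_text):
    """Return (match, packets) for flows that dropped at least one packet."""
    result = []
    for line in dump_text.splitlines():
        m = FLOW_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not look like a flow entry
        packets = int(m.group("packets"))
        if m.group("actions").strip() == "drop" and packets > 0:
            result.append((m.group("match"), packets))
    return result


if __name__ == "__main__":
    for match, packets in dropped_flows(SAMPLE):
        print(f"{packets} packets dropped by: {match}")
```

In practice the text would come from running 'ovs-dpctl dump-flows' on the
hypervisor instead of from SAMPLE.  A drop flow with a nonzero packet
count only says where the datapath discarded the packet, not why, so it
is a starting point for the deeper tracing tools and not a replacement
for them.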

On Fri, Mar 15, 2019 at 05:25:09PM +0200, Daniel Alvarez Sanchez wrote:
> Sounds like a great plan, Ben! Thanks for that. It'd be great if
> people could chime in this thread to help identify those gaps.
> 
> As for the anecdotes, we had just been involved in a case where OVN
> was used and packets were dropped at conntrack:
> 
> Two VMs on different Logical Switches (externally routed), running on
> the same hypervisor, were communicating with each other and packet
> loss was observed, but only on small (<64B) packets. These packets
> were padded by the NIC before being put on the wire and, when they
> came back, the ACLs caused them to be sent to conntrack, where they
> were dropped. We determined this by inspecting DP flows via
> 'ovs-dpctl dump-flows'; we then enabled logging in netfilter, which
> showed an error in the checksum calculation. It turned out to be a
> bug on the OVS kernel side that was already fixed in newer kernels,
> but it took quite a while, and a good understanding of what was going
> on, to figure out. In this scenario, traffic worked if the OVN ACLs
> were removed, so OVN was the first to be blamed. And sometimes the
> OVN user/engineer is not enough of an OVS expert to tell effectively
> what happened to a packet.
> 
> Maybe the example is not the best, as it was resolved using just the
> 'ovs-dpctl' tool and some logging, but support engineers may loop in
> OVN engineers, who may loop in OVS engineers, who may loop in kernel
> engineers. It'd be great to improve the experience somehow so that the
> initial assessment doesn't always have to go all the way down.
> 
> I'm also curious about the experiences of other folks here who have
> more pure OVS experience.
> 
> Thanks a lot!
> Daniel
> 
> On Thu, Mar 14, 2019 at 5:55 PM Ben Pfaff <b...@ovn.org> wrote:
> >
> > On Thu, Mar 14, 2019 at 04:55:56PM +0100, Daniel Alvarez Sanchez wrote:
> > > Hi folks,
> > >
> > > Lately I'm getting the question in the subject line more and more
> > > frequently and facing it myself, especially in the context of
> > > OpenStack.
> > >
> > > The shift to OVN in OpenStack involves a totally different approach
> > > when it comes to tracing packet drops. Before OVN, there were a bunch
> > > of network namespaces and devices where you could attach tcpdump
> > > and inspect the traffic. People are used to those troubleshooting
> > > techniques, and OVS was merely used for switches with the normal
> > > action.
> > >
> > > It's clear that there are tools and techniques to analyze this (trace
> > > tool, port mirroring, etc.), but using them often requires fairly
> > > deep knowledge and understanding of the pipeline and of OVS itself to
> > > effectively trace where a packet got dropped. Furthermore, there could
> > > be some scenarios where the packet is silently dropped.
> > >
> > > I came across this patch [0] and a presentation about it [1], which
> > > aim to partly tackle the problem described here (focusing on the DPDK
> > > datapath).
> > >
> > > The intent of this email is to gather some feedback on how to provide
> > > efficient tools and techniques to troubleshoot OVS/OVN issues, and on
> > > what you think is immediately missing in this context.
> >
> > I guess that there are multiple things to do here:
> >
> > - Better document the tools that are available.
> >
> > - Implement improvements, especially UX-wise, to the existing tools.
> >
> > - Identify gaps in the available tools (and then fill them).
> >
> > Do you have any good anecdotes about user/admin frustration?  They might
> > be helpful for figuring out how to help.  A lot of us here designed and
> > built this stuff and so the gaps are not always obvious to us.