Today (*), when a packet's journey through the data path is disrupted and it ends up dropped, we have OVS counters that detect this and report it on request via user-space commands. Some categories of drops are related to interfaces and can be queried from the OVS DB table for that interface [2], while others are available in real time from the data path through the respective OVS commands (e.g. ovs-appctl coverage/show as in [3] and ovs-appctl dpctl/show as in [4]). It is perhaps unavoidable that drop statistics are split across multiple sources, but at the end of the day the user has to run several different queries to figure out: (1) that there is a packet drop, (2) the reason for the drop, and meanwhile (3) may miss a precious opportunity to correct the available resources in the data path and prevent further drops.
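To illustrate how scattered today's picture is, here is a minimal sketch of the kind of glue a monitor has to write: a parser for "ovs-appctl coverage/show"-style output that keeps only drop-related counters. The helper name, the sample counter names, and the assumed line shape ("<name> ... total: <count>") are illustrative assumptions, not a stable OVS interface.

```python
import re

# Assumed line shape of "ovs-appctl coverage/show" output:
#   <counter_name>  <rates...>  total: <count>
COUNTER_RE = re.compile(r'^(?P<name>\S+)\s+.*total:\s+(?P<total>\d+)\s*$')

def drop_counters(coverage_output, keyword="drop"):
    """Return {counter_name: total} for counters whose name mentions `keyword`."""
    result = {}
    for line in coverage_output.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m and keyword in m.group("name"):
            result[m.group("name")] = int(m.group("total"))
    return result

# Hypothetical sample output (counter names are made up for illustration).
sample = """\
Event coverage, avg rate over last: 5 seconds, last minute, last hour:
datapath_drop_upcall_error  0.0/sec  0.000/sec  0.0000/sec  total: 7
netlink_received            1.2/sec  0.900/sec  0.8000/sec  total: 5021
drop_action_of_reason       0.0/sec  0.000/sec  0.0000/sec  total: 3
"""
print(drop_counters(sample))
# → {'datapath_drop_upcall_error': 7, 'drop_action_of_reason': 3}
```

And this covers only one of the sources; the Interface table and dpctl/show each need their own query and parsing, which is exactly the fragmentation described above.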
To ease the difficulty of monitoring these data, we already have tools such as collectd [1] to record the events, but IMHO there is a slight mismatch between what we have today and what we develop upstream: collectd can learn about packet drops only in the context of the Interface table. The other categories of drops (related to QoS, metering, tunnels, upcalls, recirculation, MTU mismatch, even invalid packets, etc.) cannot be monitored by collectd, because today there is neither an association with the Interface table nor a separate table for them. There is only an indirect association: for example, Flow_Table represents all the packet flow rules, and when a packet is dropped one can check Flow_Table for any drop action, but that is not a unified way to quickly detect the problem and correct resources.

Thanks to our developers, these drops are recorded somewhere now, but in the field the time to recover from them easily elapses, given that these stats first have to be collected, then analysed by experts, and only then can a recovery action be applied. There can also be a pressing need to keep packet drops to a very small number per million (ppm). Hence, I would like to request suggestions from experts on how we can handle this situation through OVS; my humble ideas are below.

(1) Unify the data collection in a common place: we could have a separate Datapath table to record the necessary context of a dropped packet (drop reason and its count, to start with). This would require only minimal changes for the ecosystem (e.g. collectd) to sync with. A workaround until then is to continue using the existing tables wherever possible, adding a statistics row where one does not exist.

(2) Notify drops soon, or never! Instead of detecting DB record updates (even after (1) above), with the latency of DB transactions keeping them in sync with real-time data, why not have OVS generate events to the consuming ecosystem proactively? I can think of D-Bus, for instance, to broadcast packet drop notifications.
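The "follow guidelines set by the user" part of idea (2) could be as simple as a ppm budget: stay silent per packet, and only emit a notification once the running drop rate exceeds the allowed drops per million. A minimal sketch (class and method names are hypothetical, and the actual D-Bus emission is left as a callback):

```python
class DropNotifier:
    """Suppress per-packet broadcasts; notify only past a ppm budget."""

    def __init__(self, allowed_ppm, emit):
        self.allowed_ppm = allowed_ppm   # user-configured drop budget (ppm)
        self.emit = emit                 # e.g. a D-Bus signal sender
        self.packets = 0
        self.drops = 0

    def record(self, packets, drops):
        """Account a batch of forwarded/dropped packets; notify only when
        the cumulative drop rate crosses the allowed ppm budget."""
        self.packets += packets
        self.drops += drops
        if self.packets == 0:
            return
        ppm = self.drops * 1_000_000 // self.packets
        if ppm > self.allowed_ppm:
            self.emit(ppm)

events = []
notifier = DropNotifier(allowed_ppm=100, emit=events.append)
notifier.record(packets=1_000_000, drops=50)    # 50 ppm: within budget, quiet
notifier.record(packets=1_000_000, drops=250)   # cumulative 150 ppm: notify
print(events)
# → [150]
```

This keeps the notification chain quiet under normal load while still giving the monitoring agent a prompt signal once the budget is exceeded.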
As a disclaimer, I'm no D-Bus expert :) but it is just an idea to brainstorm. An analogy in terms of the CLI follows (though using the library would be better). Broadcasting an event for every packet may exhaust resources in the notification chain; instead, OVS could follow guidelines set by the user, e.g. notify only above an allowable drop ppm, or even wait for a signal from a registered monitoring agent on D-Bus before enabling the broadcast.

OVS:
  dbus-send --system --dest=net.ovsmon /net/ovsmon/Datapath \
      net.ovsmon.Datapath.SetProperty string:Qfull \
      variant:string:<port_name_that_packet_arrived>

Monitor:
  dbus-monitor --system "type='signal',interface='net.ovsmon.Datapath'"

  signal sender=net.ovsmon.Datapath -> dest=:1.102 path=/net/ovsmon/Datapath;
  interface=net.ovsmon.Datapath; member=Qfull
     string "vhost-port-1"

Monitor (corrective action):
  dbus-send --system --dest=net.ovsmon /net/ovsmon/Interface \
      net.ovsmon.Interface.SetProperty string:<port_name> \
      variant:string:"queue_size=<new value>"

OVS: <monitors and applies the corrective action>

If this sounds good, I can think further about prototyping it for a better demonstration; if not, please suggest any better approach as well.

(*) The patches below are in upstream, accepted or under review at present:
[1] https://wiki.opnfv.org/display/fastpath/Open+vSwitch+plugins+High+Level+Design
[2] https://patchwork.ozlabs.org/patch/1123287/
[3] https://patchwork.ozlabs.org/patch/1111568/
[4] https://patchwork.ozlabs.org/patch/1115978/
[5] http://www.openvswitch.org//ovs-vswitchd.conf.db.5.pdf

The respective developers from the above mail chains are CC'd; others are also most welcome. Also, I think ovs-dev is the appropriate ML for this discussion.

Kind regards,
Gowrishankar M