Today (*), when a packet's journey through the data path is disrupted and
it ends up being dropped, we have OVS counters that detect this and report
it on request via user space commands. Some categories of drops relate to
interfaces and can be queried from the OVS DB table for that interface [2],
while others are available in real time in the data path through the
respective OVS commands (e.g. ovs-appctl coverage/show as in [3] and
ovs-appctl dpctl/show as in [4]). It may be unavoidable that the drop
stats are split across multiple sources, but at the end of the day the
user has to query them in different ways to figure out:
  (1) that there is a packet drop, and
  (2) the reason for the drop,
and in the meantime may miss a precious opportunity to adjust the
available resources in the data path and prevent further drops.
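
For illustration, locating a single drop today can mean combining queries
such as the following (the port name is a placeholder and exact options
may vary by version):

  ovs-vsctl get Interface dpdk0 statistics   # per-interface drop counters [2]
  ovs-appctl coverage/show | grep -i drop    # datapath coverage counters [3]
  ovs-appctl dpctl/show -s                   # datapath/port statistics [4]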

To ease the difficulty of monitoring these data, we already have tools
such as collectd [1] to record the events, but IMHO there is a slight
mismatch between what we have today and what we develop upstream:
collectd can learn about packet drops only in the context of the
Interface table. The other categories of drops (related to QoS, metering,
tunnels, upcalls, recirculation, MTU mismatch and even invalid packets,
etc.) cannot be monitored by collectd, because today they have neither an
association with the Interface table nor a table of their own.

There is at best an indirect association: for example, Flow_Table
represents all the packet flow rules, so when a packet is dropped one can
check Flow_Table for a drop action, but that is not a unified way to
quickly detect the drop and correct the resources involved. Thanks to our
developers, these drops are at least recorded somewhere now, but in the
field the time to recover from them easily slips away, given that the
stats first have to be collected, then analysed by experts, and only then
can a recovery action be applied. There can also be a pressing need to
keep packet drops down to a very small number of parts per million (ppm).

Hence, I would like to request suggestions from the experts on how we can
handle this situation in OVS; my humble ideas are below.

(1) Unify the data collection in a common place:
  We could think of having a separate Datapath table to record the
  necessary context of a dropped packet (drop reason and its count, to
  start with). This would require only minimal changes for the ecosystem
  (e.g. collectd) to stay in sync; see the rough sketch below. The
  workaround until then is to continue using existing tables wherever
  possible, adding a statistics row where one does not exist.
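
  As a very rough sketch (table and column names are only placeholders
  to brainstorm on, not a concrete schema proposal), such a table could
  reuse the map-of-counters pattern of the existing Interface statistics
  column in vswitchd/vswitch.ovsschema:

    "Datapath": {
        "columns": {
            "name": {"type": "string"},
            "drop_statistics": {
                "type": {"key": "string", "value": "integer",
                         "min": 0, "max": "unlimited"},
                "ephemeral": true}},
        "indexes": [["name"]],
        "isRoot": true}

  which collectd (or any other OVSDB client) could then poll just like
  it polls the Interface statistics today.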

(2) Notify drops very soon, or never!
  Instead of detecting DB record updates (even after (1) above), which
  lag behind real time by the latency of DB transactions, why not have
  OVS generate events to the consuming ecosystem pro-actively? I can
  think of D-Bus, for instance, to broadcast packet drop notifications.
  As a disclaimer, I'm not a D-Bus expert :) but it is just an idea to
  brainstorm on.

  An analogy in terms of the CLI (though in practice its library would
  be used):

  (Broadcasting an event for every packet may exhaust resources in the
  notification chain; instead, OVS could follow guidelines set by the
  user, e.g. broadcast only above an allowable drop ppm (see the knob
  sketched after the analogy), or even wait for a registered monitoring
  agent on D-Bus to signal that broadcasting should be enabled.)

  OVS:     dbus-send --system --type=signal /net/ovsmon/Datapath \
               net.ovsmon.Datapath.Qfull \
               string:<port_name_that_packet_arrived>

  Monitor: dbus-monitor --system "type='signal',interface='net.ovsmon.Datapath'"
           signal sender=net.ovsmon.Datapath -> dest=:1.102
               path=/net/ovsmon/Datapath; interface=net.ovsmon.Datapath;
               member=Qfull
               string "vhost-port-1"

  Monitor: dbus-send --system --dest=net.ovsmon /net/ovsmon/Interface \
               net.ovsmon.Interface.SetProperty string:<port_name> \
               variant:string:"queue_size=<new value>"

  OVS:     <monitors the call and applies the corrective action>
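
  For the user-set guideline mentioned above, one option could be a knob
  in the Open_vSwitch table's other_config column (the key name here is
  purely hypothetical):

    ovs-vsctl set Open_vSwitch . other_config:drop-notify-threshold-ppm=100

  so that OVS would start broadcasting only once the measured drop rate
  exceeds the configured ppm.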

If this sounds good, I can think further about prototyping it for a
better demonstration; otherwise, please suggest a better approach as
well.

* References below; the patches ([2]-[4]) are in upstream, accepted or
  under review at present:
[1] https://wiki.opnfv.org/display/fastpath/Open+vSwitch+plugins+High+Level+Design
[2] https://patchwork.ozlabs.org/patch/1123287/
[3] https://patchwork.ozlabs.org/patch/1111568/
[4] https://patchwork.ozlabs.org/patch/1115978/
[5] http://www.openvswitch.org//ovs-vswitchd.conf.db.5.pdf

The respective developers from the above mail threads are CC'd; others
are of course also welcome. Also, I think ovs-dev is the appropriate ML
for this discussion.

Kind regards,
Gowrishankar M