On 22/10/2019 11:31, Kevin Traynor wrote: > On 22/10/2019 09:25, Sriram Vatala wrote: >> Hi Ilya & Kevin, >> Thanks for your suggestions. I am summarizing our discussion, please feel >> free >> to correct me, if I am wrong. >> > > Hi Sriram, > > Will share my thought, Ilya may have different view. > >> 1) All custom stats calculated in OVS will have the prefix , so that these >> stats will not intersect with the names of other stats (driver/HW etc). > > yes > >> 2) The prefix used for these custom stats is "cstat_". > > I think with 3) (note to explain the prefix) then 'sw_' or 'ovs_' would > look neater in the logs but i'm ok with any of them. > >> 3) Instead of documenting all the stats, I will add a generic note about the >> prefix used for custom stats in "Documentation/topics/dpdk/vhost-user.rst >> where "tx_retries" is documented. >> > > yes, but these stats are not vhost specific so I think they should go > into topics/dpdk/bridge.rst in the 'Extended & Custom Statistics' section. >
crossed mails, but the same comment :-) > Kevin. > >> Thanks & Regards, >> Sriram. >> >> -----Original Message----- >> From: Kevin Traynor <ktray...@redhat.com> >> Sent: 21 October 2019 20:22 >> To: Ilya Maximets <i.maxim...@ovn.org>; Sriram Vatala >> <srira...@altencalsoftlabs.com>; ovs-dev@openvswitch.org >> Cc: Stokes, Ian <ian.sto...@intel.com> >> Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics >> >> On 18/10/2019 13:59, Ilya Maximets wrote: >>> On 16.10.2019 19:48, Kevin Traynor wrote: >>>> On 16/10/2019 13:16, Ilya Maximets wrote: >>>>> Hi Kevin, >>>>> >>>>> Thanks for review. Some comments inline. >>>>> >>>>> On 16.10.2019 12:15, Kevin Traynor wrote: >>>>>> Hi Sriram, >>>>>> >>>>>> Thanks for working on making more fine grained stats for debugging. >>>>>> As mentioned yesterday, this patch needs rebase so I just reviewed >>>>>> docs and netdev-dpdk.c which applied. Comments below. >>>>>> >>>>>> On 21/09/2019 03:40, Sriram Vatala wrote: >>>>>>> OVS may be unable to transmit packets for multiple reasons and >>>>>>> today there is a single counter to track packets dropped due to >>>>>>> any of those reasons. The most common reason is that a VM is >>>>>>> unable to read packets fast enough causing the vhostuser port >>>>>>> transmit queue on the OVS side to become full. This manifests as a >>>>>>> problem with VNFs not receiving all packets. Having a separate >>>>>>> drop counter to track packets dropped because the transmit queue >>>>>>> is full will clearly indicate that the problem is on the VM side >>>>>>> and not in OVS. Similarly maintaining separate >>>>>> >>>>>> It reads like a bit of a contradiction. On one hand the "The *most >>>>>> common* reason is that a VM is unable to read packets fast enough". >>>>>> While having a stat "*will clearly indicate* that the problem is on >>>>>> the VM side". >>>>>> >>>>>>> counters for all possible drops helps in indicating sensible cause >>>>>>> for packet drops. >>>>>>> >>>>>>> This patch adds custom software stats counters to track packets >>>>>>> dropped at port level and these counters are displayed along with >>>>>>> other stats in "ovs-vsctl get interface <iface> statistics" >>>>>>> command. The detailed stats will be available for both dpdk and >>>>>>> vhostuser ports. >>>>>>> >>>>>> >>>>>> I think the commit msg could be a bit clearer, suggest something >>>>>> like below - feel free to modify (or reject): >>>>>> >>>>>> OVS may be unable to transmit packets for multiple reasons on the >>>>>> DPDK datapath and today there is a single counter to track packets >>>>>> dropped due to any of those reasons. >>>>>> >>>>>> This patch adds custom software stats for the different reasons >>>>>> packets may be dropped during tx on the DPDK datapath in OVS. >>>>>> >>>>>> - MTU drops : drops that occur due to a too large packet size >>>>>> - Qos drops: drops that occur due to egress QOS >>>>>> - Tx failures: drops as returned by the DPDK PMD send function >>>>>> >>>>>> Note that the reason for tx failures is not specificied in OVS. In >>>>>> practice for vhost ports it is most common that tx failures are >>>>>> because there are not enough available descriptors, which is >>>>>> usually caused by misconfiguration of the guest queues and/or >>>>>> because the guest is not consuming packets fast enough from the queues. >>>>>> >>>>>> These counters are displayed along with other stats in "ovs-vsctl >>>>>> get interface <iface> statistics" command and are available for >>>>>> dpdk and vhostuser/vhostuserclient ports. >>>>>> >>>>>> Signed-off-by: Sriram Vatala <srira...@altencalsoftlabs.com> >>>>>> >>>>>> --- >>>>>> >>>>>> v9:... >>>>>> v8:... >>>>>> >>>>>> >>>>>> Note that signed-off, --- and version info should be like this ^^^. >>>>>> otherwise the review version comments will be in the git commit log >>>>>> when it is applied. >>>>>> >>>>>>> -- >>>>>>> Changes since v8: >>>>>>> Addressed comments given by Ilya. >>>>>>> >>>>>>> Signed-off-by: Sriram Vatala <srira...@altencalsoftlabs.com> >>>>>>> --- >>>>>>> Documentation/topics/dpdk/vhost-user.rst | 13 ++- >>>>>>> lib/netdev-dpdk.c | 81 >>>>>>> +++++++++++++++---- >>>>>>> utilities/bugtool/automake.mk | 3 +- >>>>>>> utilities/bugtool/ovs-bugtool-get-port-stats | 25 ++++++ >>>>>>> .../plugins/network-status/openvswitch.xml | 1 + >>>>>>> 5 files changed, 105 insertions(+), 18 deletions(-) >>>>>>> create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats >>>>>>> >>>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst >>>>>>> b/Documentation/topics/dpdk/vhost-user.rst >>>>>>> index 724aa62f6..89388a2bf 100644 >>>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst >>>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst >>>>>>> @@ -551,7 +551,18 @@ processing packets at the required rate. >>>>>>> The amount of Tx retries on a vhost-user or vhost-user-client >>>>>>> interface can be >>>>>>> shown with:: >>>>>>> >>>>>>> - $ ovs-vsctl get Interface dpdkvhostclient0 >>>>>>> statistics:tx_retries >>>>>>> + $ ovs-vsctl get Interface dpdkvhostclient0 >>>>>>> + statistics:netdev_dpdk_tx_retries >>>>>> >>>>>> I think the "netdev_dpdk_" prefixes should be removed for a few >>>>>> reasons, >>>>>> >>>>>> - These are custom stats that will only be displayed for the dpdk >>>>>> ports so they don't need any extra prefix to say they are for dpdk >>>>>> ports >>>>>> >>>>>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change >>>>>> its name to a different one in OVS 2.13 unless there is a very good >>>>>> reason >>>>>> >>>>>> - The existing stats don't have this type of prefixes (granted most >>>>>> of them are general stats): >>>>> >>>>> The main reason for the prefix for me is to distinguish those stats >>>>> from the stats that comest from the driver/HW. Our new stats has >>>>> very generic names that could be named a lot like or even equal to >>>>> HW stats. This might be not an issue for vhostuser, but for HW NICs >>>>> this may be really confusing for users. >>>>> >>>>> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way >>>>> to clearly distinguish those stats. We may use different prefixes >>>>> like 'sw_' or just 'ovs_'. >>>>> >>>> >>>> ok, I can see that we might want to distinguish custom stats to >>>> indicate that they are specific for that device but that also has the >>>> side effect of making them less self-descriptive and the user will >>>> have to be told somewhere what the prefix means. >>>> >>>> 'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be >>>> confusing too, as for example in DPDK case, Tx drops are calculated >>>> in sw/ovs the same as qos/mtu/txfails but won't have that prefix. If >>>> we really need a prefix 'cstat_' is the best I can think of. >>> >>> What we're trying to solve here is to distinguish stats that counted >>> internally in OVS from the stats provided by the third parties. >>> Just to not have same/similar names >>> >>> Usual stats from 'netdev_stats' are combined from various sources. >>> For example, 'rx_dropped' in netdev-dpdk is a combination of our QoS >>> dorps from OVS and HW counters like rte_stats.rx_nombuf. >>> For me it looks like for these stats we have kind of best-effort >>> policy to provide as full info as possible. Trying to describe how >>> each of these counters calculated doesn't make sense because we do not >>> know the origin of many of the components that comes from DPDK or HW. >>> >>>> >>>>>> # ovs-vsctl get Interface dpdkvhost0 statistics >>>>>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0, >>>>>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176, >>>>>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0, >>>>>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0, >>>>>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, >>>>>> tx_dropped=0, tx_packets=63830432, tx_retries=0} >>>>>> >>>>>> Also, just to note that if there are changes to existing >>>>>> interfaces/behaviour it should really mention that in the commit >>>>>> message so it is highlighted. >>>>>> >>>>>>> + >>>>>>> +When the guest is not able to consume the packets fast enough, >>>>>>> +the transmit queue of the port gets filled up i.e queue runs out of >>>>>>> free descriptors. >>>>>>> +This is the most likely reason why dpdk transmit API will fail to >>>>>>> +send packets besides other reasons. >>>>>>> + >>>>>>> +The amount of tx failure drops on a dpdk vhost/physical interface >>>>>>> +can be shown with:: >>>>>>> + >>>>>>> + $ ovs-vsctl get Interface dpdkvhostclient0 \ >>>>>>> + >>>>>>> + statistics:netdev_dpdk_tx_failure_drops >>>>>>> >>>>>> >>>>>> The commit msg/docs are only focusing on one stat for vhost ports, >>>>>> but there are other stats and dpdk ports, so they should all get some >>>>>> mention. >>>>> >>>>> I don't feel comfortable for documenting custom stats. This is just >>>>> because we can't describe them all. These are things to be changed over >>>>> time. >>>>> And we don't really know what some types of stats means and if they >>>>> means the same for different types of HW NICs (they are definitely >>>>> not) or OVS netdevs (even within netdev-dpdk). >>>>> And that is one more reason to have a prefix for OVS internal >>>>> statistics on which we have at least partial control. >>>>> >>>>> I think, that user documentation could mention some special >>>>> statistics while describing the troubleshooting for some hard >>>>> special cases, but this should not be a glossary of all the possible >>>>> custom stats. From my point of view, names of the stats should be >>>>> as possible self-descriptive and should not require additional >>>>> documentation. >>>>> >>>> >>>> For sure any prefix would need to be explained because that part >>>> would not be intuitive. >>> >>> We could add a note to the common DPDK guide about that in a single >>> sentence: >>> "There are custom statistics that OVS accumulates itself and these >>> stats has 'your_prefix_here_' as a prefix." >>> >> >> yes, that is what I had in mind. It clarifies that it is a custom stat which >> explains why something like vhost tx_dropped does not have this prefix even >> though it is also counted in ovs/sw/netdev_dpdk. >> >> Anyway, I don't want Sriram to be held up over one line of documentation, so >> i'm ok with whatever is decided. >> >> thanks, >> Kevin. >> >>> 'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' looks >>> self-descriptive enough to not do that. >>> >>>> >>>> thanks, >>>> Kevin. >>>> >>>>> BTW, you will not find any description for statistics provided by >>>>> the linux or DPDK drivers. You could only look at the driver code or >>>>> device spec. >>>>> >>>>> Best regards, Ilya Maximets. >>>>> >>>> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev