> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson > Sent: Monday, April 20, 2020 3:19 PM > > On Fri, Apr 10, 2020 at 08:06:23PM +0200, Morten Brørup wrote: > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wiles, Keith > > > Sent: Friday, April 10, 2020 4:22 PM > > > > > > > On Apr 10, 2020, at 5:49 AM, Morten Brørup > <m...@smartsharesystems.com> > > > wrote: > > > > > > > >> From: Ciara Power [mailto:ciara.po...@intel.com] > > > >> Sent: Wednesday, April 8, 2020 6:50 PM > > > >> > > > >> This patchset extensively reworks the telemetry library adding > new > > > >> functionality and simplifying much of the existing code, while > > > >> maintaining backward compatibility. > > > >> > > > >> This work is based on the previously sent RFC for a "process > info" > > > >> library: > https://patchwork.dpdk.org/project/dpdk/list/?series=7741 > > > >> However, rather than creating a new library, this patchset takes > > > >> that work and merges it into the existing telemetry library, as > > > >> mentioned above. > > > >> > > > >> The telemetry library as shipped in 19.11 is based upon the > metrics > > > >> library and outputs all statistics based on that as a source. > However, > > > >> this limits the telemetry output to only port-level statistics > > > >> information, rather than allowing it to be used as a general > scheme for > > > >> telemetry information across all DPDK libraries. > > > >> > > > >> With this patchset applied, rather than the telemetry library > being > > > >> responsible for pulling ethdev stats and pushing them into the > metrics > > > >> library for retrieval later, each library e.g. ethdev, rawdev, > and even > > > >> the metrics library itself (for backwards compatiblity) now > handle > > > >> their > > > >> own stats. Any library or app can register a callback function > with > > > >> telemetry, which will be called if requested by the client > connected > > > >> via > > > >> the telemetry socket. > > > > > > > > Great. Standardization across libraries is a good improvement. > > > > > > > >> The callback function in the library/app then > > > >> formats its stats, or other data, into a JSON string, and > returns it to > > > >> telemetry to be sent to the client. > > > > > > > > I am strongly opposed to using JSON as the standard format in > DPDK, and > > > instead prefer a binary format with zero (or minimal) type > conversion. > > > > > > > > Here is one reason why I dislike JSON for this: A part of our > application > > > samples 100k+ counters every second to be able to provide drill- > down > > > statistics through the GUI. Converting these counters from uint64_t > to JSON > > > and back to uint64_t for data processing is not going to fly. And I > assume > > > that we are not the only company developing equipment with drill- > down > > > statistics. > > > > > > > > I am aware that there is a difference between statistics for > *drill-down > > > and data processing* purposes and statistics for *telemetry eyeball > viewing > > > only* purposes, but the line is blurry, and I see a big risk of > setting a > > > path that leads to JSON being used in places where it shouldn't. > > > > > > > > Here is another reason why I dislike JSON for this: JSON is fine > for the > > > LAMP stack with REST protocols. But other systems use other > protocols with > > > other formats, e.g. the TICK stack uses an even simpler text based > format. > > > So DPDK based systems supporting the TICK stack will need to > convert to > > > first JSON format (in the DPDK libraries), and then from JSON > format to > > > InfluxDB format (in the DPDK application). > > > > > > > > I think that type conversion does not belong inside deep inside > the DPDK > > > libraries, but is a job for the DPDK application. However, DPDK > could > > > provide libraries for efficient bulk conversion to popular formats > like > > > JSON. And other formats, if they are relevant, e.g. ASN.1 used by > old > > > school SNMP. > > > > > > I believe JSON has it place in this library and in DPDK as it is a > good > > > conversion tool and easy to utilize with a huge number of > tools/languages. > > > > JSON is extremely heavy compared to a raw binary format. > > > > It makes sense for low volume, hierarchical structured data, but not > for large tables or arrays of counters. > > > > > Binary output gets into endianness issues and a number of other > problems, > > > so I would not want all of the data exported from DPDK to be in > binary > > > format. > > > > Endianness considerations are only relevant for data exchanged across > the network; not data exchanged across processes inside the same > machine. > > > > And if you are exchanging data across the network, you would usually > implement one or more well known protocols for that, e.g. JSON over > HTTPS, or ASN.1 over SNMP, or InfluxDB over UDP. This means that the > application needs to implement a protocol handler, which - in my > opinion - should handle the relevant data type conversions from the raw > format provided by DPDK. > > > > I think it would be silly for DPDK core libraries to provide counters > in JSON format, so an SNMP Agent would need to convert them from JSON > back to binary and then to ASN.1. > > > > > If the layout of the structure changes then the code would need to > > > know that on both side to be able to convert the data into the > correct > > > values. > > > > I may be exaggerating here, but trying to prove a point: This is what > we have ABI stability for. Structures should be designed cleverly and > future proof, e.g. like the ethdev xstats. Using text based APIs is a > circumvention of ABI stability. > > > > > > > > With that stated, the new telemetry code allows the application to > add new > > > commands and with that you can create a binary set of commands > along side > > > the JSON or any other output format. With the new register command > we can > > > create say a ‘/ethdevraw/stats,X’ set of commands that can emit > binary > > > format. > > > > That would be silly. The protocol handler should make the protocol > specific conversion, not the driver! Again, going to the extreme to > prove a point: If I understand you correctly, this would mean that PMDs > would have provide counters in ASN.1 format for SNMP. > > > > Our application provides a HTTPS/REST based communication interface > for multiple purposes, e.g. getting tables of data. And if you want to > get a table of some data via this interface, you can specify the output > format in the request, so you can get it in e.g. TSV format (tabulator > separated with a headline) for scripts, HTML format for human eyeballs. > This data conversion happens at a common location, so we can easily add > other output formats. You don't want to push this all the way down to > the originator of the data. > > > > > > > > Using this method we get the best of both worlds and when using > languages > > > like Go or Python to collect these stats we have a standard format > for > > > conversion. In Go it is pretty hard to do binary conversion and > JSON > > > conversion is just a few lines. > > > > I don't think DPDK should provide preferential treatment to Go or > Python. DPDK is based on C, and should mainly cater for C. > > > > > JSON may not be the fastest, but if you are > > > requesting stats faster than a second then use the raw commands to > get the > > > data, which anyone can add to its application or we can add them to > DPDK as > > > a standard command set. > > > > APIs in the libraries are currently available to get data in raw > format. My main concern is that libraries in the future will not > provide functions to get raw data, like they do now, but only JSON > formatted data for the telemetry library. This is what I want to avoid. > > > > Hi Morten, > > thanks for all the feedback. > > Firstly on the performance side, we did some basic benchmarking of the > output capabilities of the current proposal. Using a dummy client that > had > two queries constantly outstanding (to avoid dead time between one > response > and next request), we measured the number of replies per second from > the > ethdev xstats call. > * with a 2.3 GHz Xeon system, we got 3,500 responses per second, with > 146 > stats per response, giving a total of >500k stats per second encoded > and > transmitted. > * for those 500k stats, looking at the cpu time on the core doing the > telemetry work, ~80% of CPU time was spent inside the ixgbe driver > getting the stats from the NIC card itself. > * replacing the ixgbe NIC with a ring vdev increased the responses per > second to >100k, though with only 13 stats per response this time, > giving > 1.3M stats per second. > * Splitting the existing xstats call in this RFC into separate xstat > names and xstat values calls (something I think is a good idea to do > generally), and only calling the xstat values call each time, gives > further perf increases to 163k responses per second. > > Overall so, at least on the json generation side, it appears we can do > things rather quickly, though I admit that we did not look into the > parsing > cost on the other side. > > All that being said, however, I do understand your point about having > everything work internally in json - something that ties in with Thomas > concern about having flexibility. Therefore, while we will upstream a > v3 > very shortly with the few incremental comments on v2, we can thereafter > look to switch the internal communication between callbacks and > telemetry > library to use a regular data structure, and then leave the json > encoding > to the library itself just before output. That would: > a) allow the addition of other output types in the future without > needing > new callbacks. > b) allow more flexibility for integrating with a future one-IPC > mechanism > for DPDK. > > I hope this option will resolve most concerns and allow for a 20.05 > integration. > > Regards. > /Bruce
Yes. Thank you. Med venlig hilsen / kind regards - Morten Brørup