Hi folks, > -----Original Message----- > From: Morten Brørup <m...@smartsharesystems.com> > Sent: Saturday 18 June 2022 10:59 > To: fengchengwen <fengcheng...@huawei.com>; Stephen Hemminger > <step...@networkplumber.org>; Richardson, Bruce > <bruce.richard...@intel.com> > Cc: tho...@monjalon.net; ferruh.yi...@xilinx.com; Laatz, Kevin > <kevin.la...@intel.com>; andrew.rybche...@oktetlabs.ru; > jer...@marvell.com; sachin.sax...@oss.nxp.com; > hemant.agra...@nxp.com; dev@dpdk.org; Power, Ciara > <ciara.po...@intel.com> > Subject: RE: [PATCH v2 1/5] telemetry: escape special char when tel string > > +CC: Ciara Power, Telemetry library maintainer > > > From: fengchengwen [mailto:fengcheng...@huawei.com] > > Sent: Saturday, 18 June 2022 05.52 > > > > On 2022/6/18 1:05, Stephen Hemminger wrote: > > > On Fri, 17 Jun 2022 12:25:04 +0100 > > > Bruce Richardson <bruce.richard...@intel.com> wrote: > > > > > >> On Fri, Jun 17, 2022 at 01:16:08PM +0200, Morten Brørup wrote: > > >>>> From: Chengwen Feng [mailto:fengcheng...@huawei.com] > > >>>> Sent: Friday, 17 June 2022 11.46 > > >>>> > > >>>> This patch supports escape special characters (including: > > \",\\,/,\b, > > >>>> /f,/n,/r,/t) when telemetry string. > > >>>> This patch is used to support telemetry xxx-dump commands which > > the > > >>>> string may include special characters. > > >>>> > > >>>> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com> > > >>>> --- > > >>>> lib/telemetry/telemetry.c | 96 > > +++++++++++++++++++++++++++++++++++++-- > > >>>> 1 file changed, 93 insertions(+), 3 deletions(-) > > >>>> > > >>>> diff --git a/lib/telemetry/telemetry.c > > >>>> b/lib/telemetry/telemetry.c index c6fd03a5ab..0f762f633e 100644 > > >>>> --- a/lib/telemetry/telemetry.c > > >>>> +++ b/lib/telemetry/telemetry.c > > >>>> @@ -215,6 +215,94 @@ container_to_json(const struct rte_tel_data > > *d, > > >>>> char *out_buf, size_t buf_len) > > >>>> return used; > > >>>> } > > >>>> > > >>>> +static bool > > >>>> +json_is_special_char(char ch) > > >>>> +{ > > >>>> + static unsigned char is_spec[256] = { 0 }; > > >>>> + static bool init_once; > > >>>> + > > >>>> + if (!init_once) { > > >>>> + is_spec['\"'] = 1; > > >>>> + is_spec['\\'] = 1; > > >>>> + is_spec['/'] = 1; > > >>>> + is_spec['\b'] = 1; > > >>>> + is_spec['\f'] = 1; > > >>>> + is_spec['\n'] = 1; > > >>>> + is_spec['\r'] = 1; > > >>>> + is_spec['\t'] = 1; > > >>>> + init_once = true; > > >>>> + } > > >>>> + > > >>>> + return (bool)is_spec[(unsigned char)ch]; } > > >> > > >> According to the json spec at [1], the characters that need to be > > escaped > > >> are: > > >> a) any characters <0x20 > > >> b) inverted commas/quote character \" > > >> c) the "reverse solidus character", better known to you and I as > > >> the back-slash. > > >> > > >> Therefore, I think this table generation could be simplified, but > > also > > >> expanded using this. For completeness we should also see about > > handling all > > >> control characters if they are encountered. > > >> > > >> [1] https://www.rfc-editor.org/rfc/rfc8259.txt > > >> > > >> /Bruce > > > > > > Since it is trivial could be initializer? > > > > > > static const uint8_t is_spec[256] = { > > > [0 ... 0x20] = 1, > > > ['\"' ] = 1, > > > ['\\' ] = 1, > > > ['/'] = 1, > > > > > > etc > > > > > > Or we could change the telemetry API to disallow control characters? > > > > I was thinking about converting 0~0x20, but I don't think there's a > > scenario. > > > > I prefer change the telemetry API to disallow control characters, and > > this may not be a violation of the ABI, if yes, the dpdk-telemetry.py > > will returns an error. > > I agree with Chengwen Feng. The telemetry data type is STRING, not BLOB. > > So we need to define exactly what the STRING type contains. > > I hope we can all agree that control characters should be disallowed. > > The more complicated question is: Do we want to use the ASCII character set > only, or do we want to use UTF-8 encoded Unicode? > > Personally, think UTF-8 encoded Unicode is more future proof, and would > vote for that. > > But I wouldn't reject limiting it to ASCII, and perhaps in the future > introduce > another data type for UTF-8 strings. > > UTF-8 is the modern choice, but it is incompatible with old stuff, e.g. many > SNMP MIBs. > [CP]
Just from looking at the spec [1] , I would say UTF-8, as it seems to suggest its use for JSON (section 8.1). [1] https://www.rfc-editor.org/rfc/rfc8259.txt > > > > So I think we could add declaring and checking functions to make sure > > telemetry string do not allow control characters. [CP] I am not sure why we don't want these at all - I thought we do want some of them, like tab (\u0009) for example. <snip> In general, I think Bruce's suggestion of using a customised printf function instead of snprintf would be a good way forward, to scan the chars as they are being copied in. Thanks, Ciara