+CC: Ciara Power, Telemetry library maintainer

> From: fengchengwen [mailto:fengcheng...@huawei.com]
> Sent: Saturday, 18 June 2022 05.52
> 
> On 2022/6/18 1:05, Stephen Hemminger wrote:
> > On Fri, 17 Jun 2022 12:25:04 +0100
> > Bruce Richardson <bruce.richard...@intel.com> wrote:
> >
> >> On Fri, Jun 17, 2022 at 01:16:08PM +0200, Morten Brørup wrote:
> >>>> From: Chengwen Feng [mailto:fengcheng...@huawei.com]
> >>>> Sent: Friday, 17 June 2022 11.46
> >>>>
> >>>> This patch supports escape special characters (including:
> \",\\,/,\b,
> >>>> /f,/n,/r,/t) when telemetry string.
> >>>> This patch is used to support telemetry xxx-dump commands which
> the
> >>>> string may include special characters.
> >>>>
> >>>> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com>
> >>>> ---
> >>>>  lib/telemetry/telemetry.c | 96
> +++++++++++++++++++++++++++++++++++++--
> >>>>  1 file changed, 93 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
> >>>> index c6fd03a5ab..0f762f633e 100644
> >>>> --- a/lib/telemetry/telemetry.c
> >>>> +++ b/lib/telemetry/telemetry.c
> >>>> @@ -215,6 +215,94 @@ container_to_json(const struct rte_tel_data
> *d,
> >>>> char *out_buf, size_t buf_len)
> >>>>          return used;
> >>>>  }
> >>>>
> >>>> +static bool
> >>>> +json_is_special_char(char ch)
> >>>> +{
> >>>> +        static unsigned char is_spec[256] = { 0 };
> >>>> +        static bool init_once;
> >>>> +
> >>>> +        if (!init_once) {
> >>>> +                is_spec['\"'] = 1;
> >>>> +                is_spec['\\'] = 1;
> >>>> +                is_spec['/'] = 1;
> >>>> +                is_spec['\b'] = 1;
> >>>> +                is_spec['\f'] = 1;
> >>>> +                is_spec['\n'] = 1;
> >>>> +                is_spec['\r'] = 1;
> >>>> +                is_spec['\t'] = 1;
> >>>> +                init_once = true;
> >>>> +        }
> >>>> +
> >>>> +        return (bool)is_spec[(unsigned char)ch];
> >>>> +}
> >>
> >> According to the json spec at [1], the characters that need to be
> escaped
> >> are:
> >> a) any characters <0x20
> >> b) inverted commas/quote character \"
> >> c) the "reverse solidus character", better known to you and I as the
> >> back-slash.
> >>
> >> Therefore, I think this table generation could be simplified, but
> also
> >> expanded using this. For completeness we should also see about
> handling all
> >> control characters if they are encountered.
> >>
> >> [1] https://www.rfc-editor.org/rfc/rfc8259.txt
> >>
> >> /Bruce
> >
> > Since it is trivial could be initializer?
> >
> > static const uint8_t is_spec[256] = {
> >    [0 ... 0x20] = 1,
> >    ['\"' ] = 1,
> >    ['\\' ] = 1,
> >    ['/'] = 1,
> >
> > etc
> >
> > Or we could change the telemetry API to disallow control characters?
> 
> I was thinking about converting 0~0x20, but I don't think there's a
> scenario.
> 
> I prefer change the telemetry API to disallow control characters, and
> this may not
> be a violation of the ABI, if yes, the dpdk-telemetry.py will returns
> an error.

I agree with Chengwen Feng. The telemetry data type is STRING, not BLOB.

So we need to define exactly what the STRING type contains.

I hope we can all agree that control characters should be disallowed.

The more complicated question is: Do we want to use the ASCII character set 
only, or do we want to use UTF-8 encoded Unicode?

Personally, think UTF-8 encoded Unicode is more future proof, and would vote 
for that.

But I wouldn't reject limiting it to ASCII, and perhaps in the future introduce 
another data type for UTF-8 strings.

UTF-8 is the modern choice, but it is incompatible with old stuff, e.g. many 
SNMP MIBs.

> 
> So I think we could add declaring and checking functions to make sure
> telemetry string
> do not allow control characters.

Input validation (when storing a string in the telemetry database) has a 
performance cost, so it could be a compile time debug option, like the memory 
cookies and mbuf integrity checks. Just a thought.

Reply via email to