On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <j...@redhat.com> wrote:
> Implement basic character sets for the '%[]' conversion specifier. > > The '%[]' conversion specifier matches a nonempty sequence of characters > from the specified set of accepted (or with '^', rejected) characters > between the brackets. The substring matched is to be made up of characters > in (or not in) the set. This implementation differs from its glibc > counterpart in that it does not support character ranges (e.g., 'a-z' or > '0-9'), the hyphen '-' is *not* a special character, and the brackets > themselves cannot be matched. > > Signed-off-by: Jessica Yu <j...@redhat.com> > --- > > This patch adds support for the '%[' conversion specifier for sscanf(). > This is useful in cases where we'd like to match substrings delimited by > something other than spaces. The original motivation for this patch > actually came from a livepatch discussion (See: > https://lkml.org/lkml/2016/2/8/790), > where we were trying to come up with a clean way to parse symbol names with > substrings delimited by periods and commas. It would be better to include the justification right here in the changelog please. Not via some link-to-discussion and definitely not below the ^--- marker! It's very important. The deviation from the glibc behaviour is a bit of a worry, particularly as it is done in a non-back-compat manner: code which assumes "-" is non-magic might break if someone later adds range support. Presumably we can live with that - there won't be many callsites and they can be grepped for. But please, let's get a description of all these considerations into the code as a comment. Probably it would be helpful to include a little usage example in that comment. > --- a/lib/vsprintf.c > +++ b/lib/vsprintf.c > @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list > args) > num++; > } > continue; > + case '[': > + { > + char *s = (char *)va_arg(args, char *); > + char *set; > + size_t (*op)(const char *str, const char *set); > + size_t len = 0; > + bool negate = (*(fmt) == '^'); > + > + if (field_width == -1) > + field_width = SHRT_MAX; > + > + op = negate ? &strcspn : &strspn; > + if (negate) > + fmt++; > + > + len = strcspn(fmt, "]"); > + /* invalid format; stop here */ > + if (!len) > + return num; > + > + set = kstrndup(fmt, len, GFP_KERNEL); Embedding a GFP_KERNEL allocation into vsscanf is problematic - it limits the situations in which this functionality can be used. afaict the allocation is there merely so we can null-terminate the string so we can use existing library functions (strcspn, strspn). Is that compromise really worth it? We could pretty easily convert strcspn() into strcnspn(const char *s, const char *reject, size_t len) and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN) In fact I think we could still use strspn() and strcspn() on `fmt' directly? We just need to check for the return value exceeding `len' and if so, treat that as a no-match? > + if (!set) > + return num; > + > + /* advance fmt past ']' */ > + fmt += len + 1; > + > + len = op(str, set); > + /* no matches */ > + if (!len) { > + kfree(set); > + return num; > + } > + > + while (len-- && field_width--) > + *s++ = *str++; > + *s = '\0'; > + kfree(set); > + num++; > + } > + continue; > case 'o': > base = 8; > break;