Eduardo Habkost <ehabk...@redhat.com> writes: > There are lots of duplicate parsing code using strto*() in QEMU, and > most of that code is broken in one way or another. Even the visitors > code have duplicate integer parsing code[1]. This introduces functions > to help parsing unsigned int values: parse_uint() and parse_uint_full(). > > Parsing functions for signed ints and floats will be submitted later. > > parse_uint_full() has all the checks made by opts_type_uint64() at > opts-visitor.c: > > - Check for NULL (returns -EINVAL) > - Check for negative numbers (returns -ERANGE) > - Check for empty string (returns -EINVAL) > - Check for overflow or other errno values set by strtoll() (returns > -errno) > - Check for end of string (reject invalid characters after number) > (returns -EINVAL) > > parse_uint() does everything above except checking for the end of the > string, so callers can continue parsing the remainder of string after > the number. > > Unit tests included. > > [1] string-input-visitor.c:parse_int() could use the same parsing code > used by opts-visitor.c:opts_type_int(), instead of duplicating that > logic. [...] > diff --git a/util/cutils.c b/util/cutils.c > index 80bb1dc..7f09251 100644 > --- a/util/cutils.c > +++ b/util/cutils.c > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end) > return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB); > } > > +/* Try to parse an unsigned integer > + * > + * Error checks done by the function: > + * - NULL pointer will return -EINVAL. > + * - Empty strings will return -EINVAL.
Same for strings containing only whitespace. > + * - Overflow errors or other errno values set by strtoull() will Extra space before 'set'. > + * return -errno (-ERANGE in case of overflow). > + * - Differently from strtoull(), values starting with a minus sign are > + * rejected (returning -ERANGE). > + * > + * Sets endptr to point to the first invalid character. Mention that unlike strtol() & friends, endptr must not be null? Probably not necessary. The two ERANGE cases treat endptr differently: it's either set to point just behind the out-of-range number, or to point to the beginning of the out-of-range number. Ugh. > + Callers may rely > + * on *value and *endptr to be always set by the function, even in case of > + * errors. You neglect to specify what gets assigned to *value in error cases. Your list of error checks isn't quite complete. Here's my try: /** * Parse unsigned integer * * @s: String to parse * @value: Destination for parsed integer value * @endptr: Destination for pointer to first character not consumed * @base: integer base, between 2 and 36 inclusive, or 0 * * Parsed syntax is just like strol()'s: arbitrary whitespace, a single * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or * more digits. * * If @s is null, or @base is invalid, or @s doesn't start with an * integer in the syntax above, set *@value to 0, *@endptr to @s, and * return -EINVAL. * * Set @endptr to point right beyond the parsed integer. * * If the integer is negative, set *@value to 0, and return -ERANGE. * If the integer overflows unsigned long long, set *@value to * ULLONG_MAX, and return -ERANGE. * Else, set *@value to the parsed integer, and return 0. */ > + * > + * The 'base' parameter has the same meaning of 'base' on strtoull(). > + * > + * Returns 0 on success, negative errno value on error. > + */ > +int parse_uint(const char *s, unsigned long long *value, char **endptr, > + int base) > +{ > + int r = 0; > + char *endp = (char *)s; > + unsigned long long val = 0; > + > + if (!s) { > + r = -EINVAL; > + goto out; > + } > + > + /* make sure we reject negative numbers: */ > + while (isspace((unsigned char)*s)) { > + ++s; endp++; Mixing pre- and post-increment like that is ugly. Recommend to stick to post-increment. I'd set endp after the loop. Right before goto. Or simply change the specification to set *endptr to point beyond the integer instead of to the '-'. I took the liberty to do exactly that in my proposed function comment. > + } > + if (*s == '-') { > + r = -ERANGE; > + goto out; > + } This creates the special case that made me go "ugh" above. Suggest to drop the if here, and check right after strtoull() instead, as shown below. That way, we get the same endptr behavior for all out-of-range numbers. > + > + errno = 0; > + val = strtoull(s, &endp, base); if (*s = '-') { r = -ERANGE; val = 0; goto out; } > + if (errno) { > + r = -errno; > + goto out; > + } > + > + if (endp == s) { > + r = -EINVAL; > + goto out; > + } > + > +out: > + *value = val; > + *endptr = endp; > + return r; > +} > + > +/* Try to parse an unsigned integer, making sure the whole string is parsed > + * > + * Have the same behavior of parse_uint(), but with an additional check > + * for additional data after the parsed number (in that case, the function > + * will return -EINVAL). What's assigned to *value then? > + */ Alternatively, you could make into parse_uint() do that check when endptr is null, and drop this function. Matter of taste. > +int parse_uint_full(const char *s, unsigned long long *value, int base) > +{ > + char *endp; > + int r; > + > + r = parse_uint(s, value, &endp, base); > + if (r < 0) { > + return r; > + } > + if (*endp) { Answering my own question: the parsed integer is assigned to *value then. > + return -EINVAL; > + } > + > + return 0; > +} > + > int qemu_parse_fd(const char *param) > { > int fd;