On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote: > Eduardo Habkost <ehabk...@redhat.com> writes: > > > There are lots of duplicate parsing code using strto*() in QEMU, and > > most of that code is broken in one way or another. Even the visitors > > code have duplicate integer parsing code[1]. This introduces functions > > to help parsing unsigned int values: parse_uint() and parse_uint_full(). > > > > Parsing functions for signed ints and floats will be submitted later. > > > > parse_uint_full() has all the checks made by opts_type_uint64() at > > opts-visitor.c: > > > > - Check for NULL (returns -EINVAL) > > - Check for negative numbers (returns -ERANGE) > > - Check for empty string (returns -EINVAL) > > - Check for overflow or other errno values set by strtoll() (returns > > -errno) > > - Check for end of string (reject invalid characters after number) > > (returns -EINVAL) > > > > parse_uint() does everything above except checking for the end of the > > string, so callers can continue parsing the remainder of string after > > the number. > > > > Unit tests included. > > > > [1] string-input-visitor.c:parse_int() could use the same parsing code > > used by opts-visitor.c:opts_type_int(), instead of duplicating that > > logic. > [...] > > diff --git a/util/cutils.c b/util/cutils.c > > index 80bb1dc..7f09251 100644 > > --- a/util/cutils.c > > +++ b/util/cutils.c > > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end) > > return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB); > > } > > > > +/* Try to parse an unsigned integer > > + * > > + * Error checks done by the function: > > + * - NULL pointer will return -EINVAL. > > + * - Empty strings will return -EINVAL. > > Same for strings containing only whitespace.
Oh, you're right. > > > + * - Overflow errors or other errno values set by strtoull() will > > Extra space before 'set'. Oops. > > > + * return -errno (-ERANGE in case of overflow). > > + * - Differently from strtoull(), values starting with a minus sign are > > + * rejected (returning -ERANGE). > > + * > > + * Sets endptr to point to the first invalid character. > > Mention that unlike strtol() & friends, endptr must not be null? > Probably not necessary. I believe it is implicit if I document it as "Sets *endptr". But worth documenting as it is different from strtoull() behavior. > > The two ERANGE cases treat endptr differently: it's either set to point > just behind the out-of-range number, or to point to the beginning of the > out-of-range number. Ugh. This should be fixed in the newer version I sent. > > > + Callers may rely > > + * on *value and *endptr to be always set by the function, even in case of > > + * errors. > > You neglect to specify what gets assigned to *value in error cases. > Undefined. :-) Anyway, I agree it not very useful to document that "*value and *endptr will be always set" if it is undefined. I will try to reword it. > Your list of error checks isn't quite complete. Here's my try: > > /** > * Parse unsigned integer > * > * @s: String to parse > * @value: Destination for parsed integer value > * @endptr: Destination for pointer to first character not consumed > * @base: integer base, between 2 and 36 inclusive, or 0 > * > * Parsed syntax is just like strol()'s: arbitrary whitespace, a single > * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or > * more digits. The newer version has '-' as _not_ part of the syntax. > * > * If @s is null, or @base is invalid, or @s doesn't start with an > * integer in the syntax above, set *@value to 0, *@endptr to @s, and > * return -EINVAL. This matches the behavior of the newer version. But I would like to keep *value documented as undefined in case of errors. > * > * Set @endptr to point right beyond the parsed integer. > * > * If the integer is negative, set *@value to 0, and return -ERANGE. This isn't the case in the newer version. > * If the integer overflows unsigned long long, set *@value to > * ULLONG_MAX, and return -ERANGE. > * Else, set *@value to the parsed integer, and return 0. > */ Thanks for taking the time to write this. I hate having to look up what's the right syntax to use in docstrings, so I often just write plain comments before functions. :-) I have a minor problem with the documentation above: as a developer, the most important question I have is: what's the difference between using parse_uint() and using strtoull() directly? But maybe it is a good thing: instead of referencing the strtoull() specification (and an implementation detail), now we have a well-defined and well-documented behavior. > > > + * > > + * The 'base' parameter has the same meaning of 'base' on strtoull(). > > + * > > + * Returns 0 on success, negative errno value on error. > > + */ > > +int parse_uint(const char *s, unsigned long long *value, char **endptr, > > + int base) > > +{ > > + int r = 0; > > + char *endp = (char *)s; > > + unsigned long long val = 0; > > + > > + if (!s) { > > + r = -EINVAL; > > + goto out; > > + } > > + > > + /* make sure we reject negative numbers: */ > > + while (isspace((unsigned char)*s)) { > > + ++s; endp++; > > Mixing pre- and post-increment like that is ugly. Recommend to stick to > post-increment. Agreed. I blame the copy & paste I made from opts-visitor. Later I added the postfix increment without noticing how ugly it looked like Anyway, this was changed in the newer patch version. > > I'd set endp after the loop. Right before goto. Fixed in the newer version. > > Or simply change the specification to set *endptr to point beyond the > integer instead of to the '-'. I took the liberty to do exactly that in > my proposed function comment. The newer version was changed to return -EINVAL and set endptr to the beginning of the string. > > > + } > > + if (*s == '-') { > > + r = -ERANGE; > > + goto out; > > + } > > This creates the special case that made me go "ugh" above. Suggest to > drop the if here, and check right after strtoull() instead, as shown > below. That way, we get the same endptr behavior for all out-of-range > numbers. Is returning -EINVAL acceptable for you, as well? In your proposal we consider "-1234" part of the language but out-of-range. Returning -EINVAL, we consider "-1234" not part of the language, and invalid input. The newer version returns -EINVAL. > > > + > > + errno = 0; > > + val = strtoull(s, &endp, base); > > if (*s = '-') { > r = -ERANGE; > val = 0; > goto out; > } This has another bug: makes endptr point to the middle of the string in case we are parsing " xxx" or " -1234". > > > + if (errno) { > > + r = -errno; > > + goto out; > > + } > > + > > + if (endp == s) { > > + r = -EINVAL; > > + goto out; > > + } > > + > > +out: > > + *value = val; > > + *endptr = endp; > > + return r; > > +} > > + > > +/* Try to parse an unsigned integer, making sure the whole string is parsed > > + * > > + * Have the same behavior of parse_uint(), but with an additional check > > + * for additional data after the parsed number (in that case, the function > > + * will return -EINVAL). > > What's assigned to *value then? Undefined. :-) > > > + */ > > Alternatively, you could make into parse_uint() do that check when > endptr is null, and drop this function. Matter of taste. I considered doing that, but I believe it would be too subtle. > > > +int parse_uint_full(const char *s, unsigned long long *value, int base) > > +{ > > + char *endp; > > + int r; > > + > > + r = parse_uint(s, value, &endp, base); > > + if (r < 0) { > > + return r; > > + } > > + if (*endp) { > > Answering my own question: the parsed integer is assigned to *value then. I prefer to document it as undefined. If you want to use the parsed integer even if there's additional data after it, you should use parse_int() instead. > > > + return -EINVAL; > > + } > > + > > + return 0; > > +} > > + > > int qemu_parse_fd(const char *param) > > { > > int fd; -- Eduardo