Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions

Eduardo Habkost Fri, 18 Jan 2013 05:25:43 -0800

On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote:
> Eduardo Habkost <ehabk...@redhat.com> writes:
> 
> > There are lots of duplicate parsing code using strto*() in QEMU, and
> > most of that code is broken in one way or another. Even the visitors
> > code have duplicate integer parsing code[1]. This introduces functions
> > to help parsing unsigned int values: parse_uint() and parse_uint_full().
> >
> > Parsing functions for signed ints and floats will be submitted later.
> >
> > parse_uint_full() has all the checks made by opts_type_uint64() at
> > opts-visitor.c:
> >
> >  - Check for NULL (returns -EINVAL)
> >  - Check for negative numbers (returns -ERANGE)
> >  - Check for empty string (returns -EINVAL)
> >  - Check for overflow or other errno values set by strtoll() (returns
> >    -errno)
> >  - Check for end of string (reject invalid characters after number)
> >    (returns -EINVAL)
> >
> > parse_uint() does everything above except checking for the end of the
> > string, so callers can continue parsing the remainder of string after
> > the number.
> >
> > Unit tests included.
> >
> > [1] string-input-visitor.c:parse_int() could use the same parsing code
> >     used by opts-visitor.c:opts_type_int(), instead of duplicating that
> >     logic.
> [...]
> > diff --git a/util/cutils.c b/util/cutils.c
> > index 80bb1dc..7f09251 100644
> > --- a/util/cutils.c
> > +++ b/util/cutils.c
> > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end)
> >      return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
> >  }
> >  
> > +/* Try to parse an unsigned integer
> > + *
> > + * Error checks done by the function:
> > + * - NULL pointer will return -EINVAL.
> > + * - Empty strings will return -EINVAL.
> 
> Same for strings containing only whitespace.


Oh, you're right.

> 
> > + * - Overflow errors or other errno values  set by strtoull() will
> 
> Extra space before 'set'.

Oops.

> 
> > + *   return -errno (-ERANGE in case of overflow).
> > + * - Differently from strtoull(), values starting with a minus sign are
> > + *   rejected (returning -ERANGE).
> > + *
> > + * Sets endptr to point to the first invalid character.
> 
> Mention that unlike strtol() & friends, endptr must not be null?
> Probably not necessary.

I believe it is implicit if I document it as "Sets *endptr". But worth
documenting as it is different from strtoull() behavior.

> 
> The two ERANGE cases treat endptr differently: it's either set to point
> just behind the out-of-range number, or to point to the beginning of the
> out-of-range number.  Ugh.

This should be fixed in the newer version I sent.

> 
> > +                                                        Callers may rely
> > + * on *value and *endptr to be always set by the function, even in case of
> > + * errors.
> 
> You neglect to specify what gets assigned to *value in error cases.
> 

Undefined.  :-)

Anyway, I agree it not very useful to document that "*value and *endptr
will be always set" if it is undefined. I will try to reword it.


> Your list of error checks isn't quite complete.  Here's my try:
> 
> /**
>  * Parse unsigned integer
>  *
>  * @s: String to parse
>  * @value: Destination for parsed integer value
>  * @endptr: Destination for pointer to first character not consumed
>  * @base: integer base, between 2 and 36 inclusive, or 0
>  *
>  * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
>  * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
>  * more digits.

The newer version has '-' as _not_ part of the syntax.

>  *
>  * If @s is null, or @base is invalid, or @s doesn't start with an
>  * integer in the syntax above, set *@value to 0, *@endptr to @s, and
>  * return -EINVAL.

This matches the behavior of the newer version. But I would like to keep
*value documented as undefined in case of errors.

>  *
>  * Set @endptr to point right beyond the parsed integer.
>  *
>  * If the integer is negative, set *@value to 0, and return -ERANGE.

This isn't the case in the newer version.

>  * If the integer overflows unsigned long long, set *@value to
>  * ULLONG_MAX, and return -ERANGE.
>  * Else, set *@value to the parsed integer, and return 0.
>  */

Thanks for taking the time to write this. I hate having to look up
what's the right syntax to use in docstrings, so I often just write
plain comments before functions.  :-)

I have a minor problem with the documentation above: as a developer, the
most important question I have is: what's the difference between using
parse_uint() and using strtoull() directly? But maybe it is a good
thing: instead of referencing the strtoull() specification (and an
implementation detail), now we have a well-defined and well-documented
behavior.

> 
> > + *
> > + * The 'base' parameter has the same meaning of 'base' on strtoull().
> > + *
> > + * Returns 0 on success, negative errno value on error.
> > + */
> > +int parse_uint(const char *s, unsigned long long *value, char **endptr,
> > +               int base)
> > +{
> > +    int r = 0;
> > +    char *endp = (char *)s;
> > +    unsigned long long val = 0;
> > +
> > +    if (!s) {
> > +        r = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    /* make sure we reject negative numbers: */
> > +    while (isspace((unsigned char)*s)) {
> > +        ++s; endp++;
> 
> Mixing pre- and post-increment like that is ugly.  Recommend to stick to
> post-increment.

Agreed. I blame the copy & paste I made from opts-visitor. Later I added
the postfix increment without noticing how ugly it looked like

Anyway, this was changed in the newer patch version.

> 
> I'd set endp after the loop.  Right before goto.

Fixed in the newer version.

> 
> Or simply change the specification to set *endptr to point beyond the
> integer instead of to the '-'.  I took the liberty to do exactly that in
> my proposed function comment.

The newer version was changed to return -EINVAL and set endptr to the
beginning of the string.

> 
> > +    }
> > +    if (*s == '-') {
> > +        r = -ERANGE;
> > +        goto out;
> > +    }
> 
> This creates the special case that made me go "ugh" above.  Suggest to
> drop the if here, and check right after strtoull() instead, as shown
> below.  That way, we get the same endptr behavior for all out-of-range
> numbers.

Is returning -EINVAL acceptable for you, as well? In your proposal we
consider "-1234" part of the language but out-of-range. Returning
-EINVAL, we consider "-1234" not part of the language, and invalid
input. The newer version returns -EINVAL.

> 
> > +
> > +    errno = 0;
> > +    val = strtoull(s, &endp, base);
> 
>        if (*s = '-') {
>            r = -ERANGE;
>            val = 0;
>            goto out;
>        }

This has another bug: makes endptr point to the middle of the string in
case we are parsing "   xxx" or "   -1234".

> 
> > +    if (errno) {
> > +        r = -errno;
> > +        goto out;
> > +    }
> > +
> > +    if (endp == s) {
> > +        r = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +out:
> > +    *value = val;
> > +    *endptr = endp;
> > +    return r;
> > +}
> > +
> > +/* Try to parse an unsigned integer, making sure the whole string is parsed
> > + *
> > + * Have the same behavior of parse_uint(), but with an additional check
> > + * for additional data after the parsed number (in that case, the function
> > + * will return -EINVAL).
> 
> What's assigned to *value then?

Undefined.  :-)


> 
> > + */
> 
> Alternatively, you could make into parse_uint() do that check when
> endptr is null, and drop this function.  Matter of taste.

I considered doing that, but I believe it would be too subtle.

> 
> > +int parse_uint_full(const char *s, unsigned long long *value, int base)
> > +{
> > +    char *endp;
> > +    int r;
> > +
> > +    r = parse_uint(s, value, &endp, base);
> > +    if (r < 0) {
> > +        return r;
> > +    }
> > +    if (*endp) {
> 
> Answering my own question: the parsed integer is assigned to *value then.

I prefer to document it as undefined. If you want to use the parsed
integer even if there's additional data after it, you should use
parse_int() instead.

> 
> > +        return -EINVAL;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  int qemu_parse_fd(const char *param)
> >  {
> >      int fd;

-- 
Eduardo

Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions

Reply via email to