Re: Using libutf8proc for svn_utf_cstring_utf8_width()

Timofei Zhakov Tue, 19 May 2026 13:20:22 -0700

On Tue, May 19, 2026 at 9:42 PM Evgeny Kotkov
<[email protected]> wrote:
>
> Timofei Zhakov <[email protected]> writes:
>
> > Committed the test in r1934405 and the rest of the patch in r1934407.
> > Removed svn_utf__cstring_is_valid() check in r1934408.
>
> Thanks for the change!  I like the idea, and the implementation looks
> good to me overall.
>
> I'm a bit late to the party, but here are a few minor comments and
> suggestions regarding the code:
>
> +  /* Convert the UTF-8 string to UTF-32 (UCS4) which is the format
> +   * utf8proc_charwidth() expects, and get the width of each character.
> +   * We don't need much error checking since the input is valid UTF-8. */
> +  while (*cstr)
>
> After r1934408, the comment stating that "the input is valid UTF-8" seems
> outdated, because we no longer check for valid UTF-8 in advance.


Oh, this's a good catch, thanks. I'll fix the comment.

> +      int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs);
>
> Since `cstr` is a `const char *`, I think this cast unnecessarily discards
> constness.
>
> Also, because utf8proc_iterate() expects a `const utf8proc_uint8_t *`,
> I'd say that casting directly to that target type is safer than using
> `apr_byte_t *`, because the latter could theoretically denote a
> different underlying type.
>
> A couple of similar observations:
>
> - utf8proc_iterate() returns a utf8proc_ssize_t.  Assigning this to
>   `int nbytes` risks truncation in environments where int is smaller
>   than utf8proc_ssize_t:
>
> +      int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs);
>
> - utf8proc_iterate() puts the codepoint result in a utf8proc_int32_t,
>   whereas we currently type it as an apr_int32_t:
>
> +      apr_int32_t ucs;
>
>   While those types are almost certainly compatible, I think that changing
>   it to `utf8proc_int32_t` would guarantee that we use a correct type.
>   It would also ensure that `utf8proc_charwidth(utf8proc_int32_t c)` gets
>   called with an argument of the exactly expected type.

I'm pretty sure I copied this code from another function in the same
file, so I guess this is an API misuse that needs to be fixed in all
those places (svn_utf__fuzzy_escape is the only other place).

-- 
Timofei Zhakov

Re: Using libutf8proc for svn_utf_cstring_utf8_width()

Reply via email to