On Tue, May 19, 2026 at 9:42 PM Evgeny Kotkov <[email protected]> wrote: > > Timofei Zhakov <[email protected]> writes: > > > Committed the test in r1934405 and the rest of the patch in r1934407. > > Removed svn_utf__cstring_is_valid() check in r1934408. > > Thanks for the change! I like the idea, and the implementation looks > good to me overall. > > I'm a bit late to the party, but here are a few minor comments and > suggestions regarding the code: > > + /* Convert the UTF-8 string to UTF-32 (UCS4) which is the format > + * utf8proc_charwidth() expects, and get the width of each character. > + * We don't need much error checking since the input is valid UTF-8. */ > + while (*cstr) > > After r1934408, the comment stating that "the input is valid UTF-8" seems > outdated, because we no longer check for valid UTF-8 in advance.
Oh, this's a good catch, thanks. I'll fix the comment. > + int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs); > > Since `cstr` is a `const char *`, I think this cast unnecessarily discards > constness. > > Also, because utf8proc_iterate() expects a `const utf8proc_uint8_t *`, > I'd say that casting directly to that target type is safer than using > `apr_byte_t *`, because the latter could theoretically denote a > different underlying type. > > A couple of similar observations: > > - utf8proc_iterate() returns a utf8proc_ssize_t. Assigning this to > `int nbytes` risks truncation in environments where int is smaller > than utf8proc_ssize_t: > > + int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs); > > - utf8proc_iterate() puts the codepoint result in a utf8proc_int32_t, > whereas we currently type it as an apr_int32_t: > > + apr_int32_t ucs; > > While those types are almost certainly compatible, I think that changing > it to `utf8proc_int32_t` would guarantee that we use a correct type. > It would also ensure that `utf8proc_charwidth(utf8proc_int32_t c)` gets > called with an argument of the exactly expected type. I'm pretty sure I copied this code from another function in the same file, so I guess this is an API misuse that needs to be fixed in all those places (svn_utf__fuzzy_escape is the only other place). -- Timofei Zhakov

