Christian Franke wrote:
> Corinna Vinschen wrote:
>> On Jun 29 19:13, Christian Franke wrote:
>>> Fixes the CESU-8 value, but not the missing encoding if the high
>>> surrogate is at the very end of the string.
>> Are you going to provide a patch for that issue?
> Not very soon as this possibly requires non-trivial rework including
> comprehensive testing.

The function behind __WCTOMB() must also be called with the final
L'\0' as input, so that a pending high surrogate can be flushed. This
is not always the case. For example, in _wcstombs_r() only the second
__WCTOMB() call is reached with L'\0'. The (s == NULL) part implicitly
assumes that converting L'\0' would only append '\0' and return 1.

newlib/libc/stdlib/wcstombs_r.c:
size_t
_wcstombs_r (...)
{
  ...
  if (s == NULL)
    {
      ...
      while (*pwcs != 0)
        {
          bytes = __WCTOMB (r, buff, *pwcs++, state);
          ...
          num_bytes += bytes;
        }
      return num_bytes;
    }
  else
    {
      while (n > 0)
        {
          bytes = __WCTOMB (r, buff, *pwcs, state);
          ...
          if (*pwcs == 0x00)
            return ptr - s - (n >= bytes);
          ...
        }
      ...
    }
}

Proposed fix for the above function only:
https://sourceware.org/pipermail/newlib/2025/021937.html
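
To illustrate why the counting path needs the terminator, here is a
minimal self-contained sketch. It is not newlib code: toy_state,
toy_put_utf8(), toy_wctomb() and the two count functions are
hypothetical names, and the converter only assumes that an unpaired
high surrogate is buffered until the next call and then flushed in its
3-byte form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: a buffered high surrogate, or 0. */
typedef struct { uint16_t pending_hi; } toy_state;

/* Encode cp as UTF-8; surrogate values get the plain 3-byte form. */
static int
toy_put_utf8 (char *buf, uint32_t cp)
{
  if (cp < 0x80)
    {
      buf[0] = (char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (char) (0xc0 | (cp >> 6));
      buf[1] = (char) (0x80 | (cp & 0x3f));
      return 2;
    }
  if (cp < 0x10000)
    {
      buf[0] = (char) (0xe0 | (cp >> 12));
      buf[1] = (char) (0x80 | ((cp >> 6) & 0x3f));
      buf[2] = (char) (0x80 | (cp & 0x3f));
      return 3;
    }
  buf[0] = (char) (0xf0 | (cp >> 18));
  buf[1] = (char) (0x80 | ((cp >> 12) & 0x3f));
  buf[2] = (char) (0x80 | ((cp >> 6) & 0x3f));
  buf[3] = (char) (0x80 | (cp & 0x3f));
  return 4;
}

/* Toy stand-in for the function behind __WCTOMB(); buf needs 8 bytes. */
static int
toy_wctomb (char *buf, uint16_t wc, toy_state *st)
{
  int n = 0;

  if (st->pending_hi)
    {
      uint16_t hi = st->pending_hi;
      st->pending_hi = 0;
      if (wc >= 0xdc00 && wc <= 0xdfff)
        return toy_put_utf8 (buf, 0x10000
                                  + ((uint32_t) (hi - 0xd800) << 10)
                                  + (uint32_t) (wc - 0xdc00));
      /* Unpaired high surrogate: flush it before handling wc. */
      n = toy_put_utf8 (buf, hi);
    }
  if (wc >= 0xd800 && wc <= 0xdbff)
    {
      st->pending_hi = wc;      /* wait for a low surrogate */
      return n;
    }
  return n + toy_put_utf8 (buf + n, wc);
}

/* Mirrors the (s == NULL) loop: the terminator is never converted. */
static size_t
count_without_nul (const uint16_t *pwcs)
{
  toy_state st = { 0 };
  char buf[8];
  size_t num_bytes = 0;

  while (*pwcs != 0)
    num_bytes += (size_t) toy_wctomb (buf, *pwcs++, &st);
  return num_bytes;
}

/* Variant that also converts the terminator, then drops the '\0' byte. */
static size_t
count_with_nul (const uint16_t *pwcs)
{
  toy_state st = { 0 };
  char buf[8];
  size_t num_bytes = 0;
  uint16_t wc;

  do
    {
      wc = *pwcs++;
      num_bytes += (size_t) toy_wctomb (buf, wc, &st);
    }
  while (wc != 0);
  return num_bytes - 1;
}
```

For the input { 'a', 0xd83d, 0 }, count_without_nul() yields 1 because
the buffered surrogate is never flushed, while count_with_nul() yields
4. For a string with a complete surrogate pair both variants agree.
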

Unfortunately, my first attempt to fix Cygwin's own sys_wcstombs() did
not have the desired effect:

--- b/winsup/cygwin/strfuncs.cc
+++ a/winsup/cygwin/strfuncs.cc
@@ -1012,9 +1012,14 @@ _sys_wcstombs (char *dst, size_t len, const wchar_t *src, size_t nwc,
           for (int i = 0; i < bytes; ++i)
             *ptr++ = buf[i];
         }
-      if (*pwcs++ == 0x00)
-        break;
       n += bytes;
+      if (*pwcs++ == 0x00)
+        {
+          /* n is the size without trailing NUL. */
+          if (n > 0)
+            --n;
+          break;
+        }

--
Regards,
Christian