Christian Franke wrote:
Corinna Vinschen wrote:
> On Jun 29 19:13, Christian Franke wrote:
>> Fixes the CESU-8 value, but not the missing encoding if the high surrogate
>> is at the very end of the string.
> Are you going to provide a patch for that issue?

Not very soon, as this possibly requires non-trivial rework, including comprehensive testing.

The function behind __WCTOMB() must also be called with the final L'\0' as input, so that a high surrogate still pending at the very end of the string gets encoded. This is currently not the case: in _wcstombs_r(), only the second __WCTOMB() call is ever called with L'\0'. The (s == NULL) branch stops before the terminator and implicitly assumes that the call would only append '\0' and return 1.

newlib/libc/stdlib/wcstombs_r.c:

size_t
_wcstombs_r (...)
{
  ...
  if (s == NULL)
    {
      ...
      while (*pwcs != 0)
        {
          bytes = __WCTOMB (r, buff, *pwcs++, state);
          ...
          num_bytes += bytes;
        }
      return num_bytes;
    }
  else
    {
      while (n > 0)
        {
          bytes = __WCTOMB (r, buff, *pwcs, state);
          ...
          if (*pwcs == 0x00)
            return ptr - s - (n >= bytes);
          ...
        }
      ...
    }
}
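
To illustrate why the counting loop has to see the terminator, here is a minimal, self-contained sketch. It is not newlib code: toy_state, put_bmp(), toy_wctomb() and the two count_*() functions are hypothetical names, and the converter is a simplified stand-in for whatever sits behind __WCTOMB() with 16-bit wide characters. It assumes a lone high surrogate is flushed as a 3-byte CESU-8 sequence once the next input unit arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: a high surrogate waiting for its partner. */
typedef struct { uint16_t pending_high; } toy_state;

/* Encode one 16-bit unit as 1 to 3 bytes; returns the byte count. */
static int
put_bmp (unsigned char *buf, uint16_t wc)
{
  if (wc < 0x80)
    {
      buf[0] = (unsigned char) wc;
      return 1;
    }
  if (wc < 0x800)
    {
      buf[0] = (unsigned char) (0xc0 | (wc >> 6));
      buf[1] = (unsigned char) (0x80 | (wc & 0x3f));
      return 2;
    }
  buf[0] = (unsigned char) (0xe0 | (wc >> 12));
  buf[1] = (unsigned char) (0x80 | ((wc >> 6) & 0x3f));
  buf[2] = (unsigned char) (0x80 | (wc & 0x3f));
  return 3;
}

/* Toy stand-in for the function behind __WCTOMB().  A high surrogate is
   held back (0 bytes) until the next unit arrives.  buf needs 6 bytes. */
static int
toy_wctomb (unsigned char *buf, uint16_t wc, toy_state *st)
{
  int n = 0;

  assert (buf != NULL && st != NULL);
  if (st->pending_high != 0)
    {
      uint16_t hi = st->pending_high;
      st->pending_high = 0;
      if (wc >= 0xdc00 && wc <= 0xdfff)
        {
          /* Valid surrogate pair: emit one 4-byte UTF-8 sequence. */
          uint32_t cp = 0x10000 + ((uint32_t) (hi - 0xd800) << 10)
                        + (uint32_t) (wc - 0xdc00);
          buf[0] = (unsigned char) (0xf0 | (cp >> 18));
          buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3f));
          buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
          buf[3] = (unsigned char) (0x80 | (cp & 0x3f));
          return 4;
        }
      /* Lone high surrogate: flush it as 3 bytes, then encode wc below. */
      n = put_bmp (buf, hi);
    }
  if (wc >= 0xd800 && wc <= 0xdbff)
    {
      st->pending_high = wc;
      return n;
    }
  return n + put_bmp (buf + n, wc);
}

/* Counting loop that stops before the terminator, like the (s == NULL)
   branch quoted above. */
size_t
count_without_nul (const uint16_t *ws)
{
  toy_state st = { 0 };
  unsigned char buf[8];
  size_t total = 0;

  while (*ws != 0)
    total += (size_t) toy_wctomb (buf, *ws++, &st);
  return total;
}

/* Counting loop that also feeds the terminator, then drops its one byte. */
size_t
count_with_nul (const uint16_t *ws)
{
  toy_state st = { 0 };
  unsigned char buf[8];
  size_t total = 0;

  for (;;)
    {
      uint16_t wc = *ws++;
      total += (size_t) toy_wctomb (buf, wc, &st);
      if (wc == 0)
        return total - 1;
    }
}
```

In this toy model, the input { 'a', 0xd83d, 0 } counts as 1 byte with count_without_nul() but 4 bytes with count_with_nul(), because only the second loop gives the converter a chance to flush the pending surrogate. For a complete pair such as { 'a', 0xd83d, 0xde00, 0 }, both loops agree on 5 bytes.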


Proposed fix for the above function only: https://sourceware.org/pipermail/newlib/2025/021937.html

Unfortunately, my first attempt to fix Cygwin's own sys_wcstombs() did not have the desired effect:

--- b/winsup/cygwin/strfuncs.cc
+++ a/winsup/cygwin/strfuncs.cc
@@ -1012,9 +1012,14 @@ _sys_wcstombs (char *dst, size_t len, const wchar_t *src, size_t nwc,
              for (int i = 0; i < bytes; ++i)
                *ptr++ = buf[i];
            }
-         if (*pwcs++ == 0x00)
-           break;
          n += bytes;
+         if (*pwcs++ == 0x00)
+           {
+             /* n is the size without trailing NUL. */
+             if (n > 0)
+               --n;
+             break;
+           }


--
Regards,
Christian
