Re: [PATCH] wcrtomb: fix CESU-8 value of leftover lone high surrogate

Corinna Vinschen Mon, 30 Jun 2025 03:27:51 -0700

On Jun 29 19:13, Christian Franke wrote:
> Fixes the CESU-8 value, but not the missing encoding if the high surrogate
> is at the very end of the string.


Are you going to provide a patch for that issue?
> 
> -- 
> Regards,
> Christian
> 

> From 96f23496f249558949923e60270b9568956912bf Mon Sep 17 00:00:00 2001
> From: Christian Franke <[email protected]>
> Date: Sun, 29 Jun 2025 19:03:36 +0200
> Subject: [PATCH] wcrtomb: fix CESU-8 value of leftover lone high surrogate
> 
> Addresses: https://cygwin.com/pipermail/cygwin/2025-June/258378.html
> Fixes: 6ff28fc3b121 ("Allow CESU-8 surrogate value encoding")
> Signed-off-by: Christian Franke <[email protected]>
> ---
>  newlib/libc/stdlib/wctomb_r.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/newlib/libc/stdlib/wctomb_r.c b/newlib/libc/stdlib/wctomb_r.c
> index 5ea1e13e4..ec6adfa49 100644
> --- a/newlib/libc/stdlib/wctomb_r.c
> +++ b/newlib/libc/stdlib/wctomb_r.c
> @@ -62,8 +62,8 @@ __utf8_wctomb (struct _reent *r,
>        of the surrogate and proceed to convert the given character.  Note
>        to return extra 3 bytes. */
>        wchar_t tmp;
> -      tmp = (state->__value.__wchb[0] << 16 | state->__value.__wchb[1] << 8)
> -         - (0x10000 >> 10 | 0xd80d);

What a weird typo.  I wonder how I fat-fingered that 'd' into the code
/*facepalm*/

> +      tmp = (((state->__value.__wchb[0] << 16 | state->__value.__wchb[1] << 8)
> +         - 0x10000) >> 10) | 0xd800;
>        *s++ = 0xe0 | ((tmp & 0xf000) >> 12);
>        *s++ = 0x80 | ((tmp &  0xfc0) >> 6);
>        *s++ = 0x80 |  (tmp &   0x3f);
> -- 
> 2.45.1
> 

LGTM, please push.
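
For anyone following along, here is a minimal standalone sketch of the
corrected arithmetic. It is not the newlib code: the helper names are
made up for illustration, and the layout of the two saved state bytes
(bits 23..16 and 15..8 of the partially built code point) is an
assumption inferred from the expression in the patch.

```c
#include <stdint.h>

/* Hypothetical helper.  Assumption: when a high surrogate is pending,
   the state keeps two bytes of ((hi & 0x3ff) << 10) + 0x10000. */
static void save_high(uint16_t hi, unsigned char st[2])
{
  uint32_t tmp = ((uint32_t)(hi & 0x3ff) << 10) + 0x10000;
  st[0] = (tmp >> 16) & 0xff;   /* bits 23..16 */
  st[1] = (tmp >> 8) & 0xff;    /* bits 15..8  */
}

/* Same arithmetic as the patched expression: undo the 0x10000 offset,
   shift the 10 payload bits back down, restore the 0xd800 prefix. */
static uint16_t restore_high(const unsigned char st[2])
{
  uint32_t v = (uint32_t)st[0] << 16 | (uint32_t)st[1] << 8;
  return (uint16_t)(((v - 0x10000) >> 10) | 0xd800);
}

/* Encode one UTF-16 code unit as a 3-byte sequence, which is how
   CESU-8 represents a surrogate. */
static void cesu8_encode(uint16_t u, unsigned char out[3])
{
  out[0] = 0xe0 | ((u & 0xf000) >> 12);
  out[1] = 0x80 | ((u & 0x0fc0) >> 6);
  out[2] = 0x80 | (u & 0x003f);
}
```

Under that assumption, saving 0xd83d and restoring it round-trips to
0xd83d, and encoding the result gives the bytes ed a0 bd.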

Thanks,
Corinna
