This seems difficult to test. Any ideas?
From 8f9a8e56cc3e8496fdfed3f889cff9fca19b3663 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximede...@telenet.be>
Date: Sun, 6 Mar 2022 12:51:33 +0000
Subject: [PATCH 2/2] Deprecate non-functional bind-textdomain-codeset.

TODO: this only deprecated it in the documentation, it needs to be
deprecated elsewhere as well.

* doc/ref/api-i18n.texi (bind-textdomain-codeset): Update documentation.
* doc/ref/guile.texi: Update copyright information.
---
 doc/ref/api-i18n.texi | 36 ++++++++++++------------------------
 doc/ref/guile.texi    |  1 +
 2 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/doc/ref/api-i18n.texi b/doc/ref/api-i18n.texi
index 7c49b0a23..c06b75996 100644
--- a/doc/ref/api-i18n.texi
+++ b/doc/ref/api-i18n.texi
@@ -2,6 +2,7 @@
 @c This is part of the GNU Guile Reference Manual.
 @c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007,
 @c 2009, 2010, 2017 Free Software Foundation, Inc.
+@c Copyright (C) 2022 Maxime Devos
 @c See the file guile.texi for copying conditions.
 
 @node Internationalization
@@ -599,33 +600,20 @@ non-standard location.
 
 @deffn {Scheme Procedure} bind-textdomain-codeset domain [encoding]
 @deffnx {C Function} scm_bind_textdomain_codeset (domain, encoding)
-Get or set the text encoding to be used by @code{gettext} for messages
-from @var{domain}. @var{encoding} is a string, the name of a coding
-system, for instance @nicode{"8859_1"}. (On a Unix/POSIX system the
-@command{iconv} program can list all available encodings.)
 
-When called without an @var{encoding} the current setting is returned,
-or @code{#f} if none yet set. When called with an @var{encoding}, it
-is set for @var{domain} and that new setting returned. For example,
+This is a historical procedure, used for getting and setting the text
+encoding used by @code{gettext} for messages from @var{domain},
+preserved for compatibility.
 
-@example
-(bind-textdomain-codeset "myprog")
-@result{} #f
-(bind-textdomain-codeset "myprog" "latin-9")
-@result{} "latin-9"
-@end example
+This procedure became useless since Guile's string began consisting of
+characters instead of individual bytes, especially since the
+@code{gettext} procedure always used the locale encoding instead
+of the encoding of the text domain.
 
-The encoding requested can be different from the translated data file,
-messages will be recoded as necessary. But note that when there is no
-translation, @code{gettext} returns its @var{msg} unchanged, ie.@:
-without any recoding. For that reason source message strings are best
-as plain ASCII.
-
-Currently Guile has no understanding of multi-byte characters, and
-string functions won't recognise character boundaries in multi-byte
-strings. An application will at least be able to pass such strings
-through to some output though. Perhaps this will change in the
-future.
+If you use @code{gettext} both in C and Guile code, be aware that Guile
+always assumes the UTF-8 encoding and sets this encoding when Guile's
+@code{bindtextdomain} is called. If the C code expects a different
+encoding, then it needs to operate on a separate domain.
 @end deffn
 
 @c Local Variables:
diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
index 660b1ae90..5b56145ca 100644
--- a/doc/ref/guile.texi
+++ b/doc/ref/guile.texi
@@ -15,6 +15,7 @@ This manual documents Guile version @value{VERSION}.
 
 Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
 Inc.
+Copyright (C) 2022 Maxime Devos
 
 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
-- 
2.30.2

From 1477c30cdf251863ed8eb3e1f1136262a9814130 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximede...@telenet.be>
Date: Sun, 6 Mar 2022 12:30:17 +0000
Subject: [PATCH 1/2] Avoid producing ? in locales with too few characters.

Previously, if the locale used a character encoding without all
characters, then 'gettext' could produce '?' characters. Avoid
character encoding concerns by always using UTF-8.

* libguile/gettext.c
  (scm_gettext): Use scm_to_utf8_string and scm_from_utf8_string for
  msgids.
  (scm_ngettext): Likewise.
  (scm_bindtextdomain): Set the character encoding to UTF-8.
---
 libguile/gettext.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/libguile/gettext.c b/libguile/gettext.c
index b9af4d313..bf54def7f 100644
--- a/libguile/gettext.c
+++ b/libguile/gettext.c
@@ -1,5 +1,7 @@
 /* Copyright 2004,2006,2018
      Free Software Foundation, Inc.
+   Copyright 2022
+     Maxime Devos
 
    This file is part of Guile.
 
@@ -100,7 +102,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
 
   scm_dynwind_begin (0);
 
-  c_msgid = scm_to_locale_string (msgid);
+  c_msgid = scm_to_utf8_string (msgid);
   scm_dynwind_free (c_msgid);
 
   if (SCM_UNBNDP (domain))
@@ -133,7 +135,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
   if (c_result == c_msgid)
     result = msgid;
   else
-    result = scm_from_locale_string (c_result);
+    result = scm_from_utf8_string (c_result);
 
   scm_dynwind_end ();
   return result;
@@ -158,10 +160,10 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
 
   scm_dynwind_begin (0);
 
-  c_msgid = scm_to_locale_string (msgid);
+  c_msgid = scm_to_utf8_string (msgid);
   scm_dynwind_free (c_msgid);
 
-  c_msgid_plural = scm_to_locale_string (msgid_plural);
+  c_msgid_plural = scm_to_utf8_string (msgid_plural);
   scm_dynwind_free (c_msgid_plural);
 
   c_n = scm_to_ulong (n);
@@ -199,7 +201,7 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
   else if (c_result == c_msgid_plural)
     result = msgid_plural;
   else
-    result = scm_from_locale_string (c_result);
+    result = scm_from_utf8_string (c_result);
 
   scm_dynwind_end ();
   return result;
@@ -272,6 +274,10 @@ SCM_DEFINE (scm_bindtextdomain, "bindtextdomain", 1, 1, 0,
   else
     result = SCM_BOOL_F;
 
+  c_result = bind_textdomain_codeset (c_domain, "UTF-8");
+  if (c_result == NULL)
+    SCM_SYSERROR;
+
   scm_dynwind_end ();
   return result;
 }

base-commit: 24b30130ca75653bdbacea84ce0443608379d630
-- 
2.30.2

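One piece of this that might be checkable in isolation is the new
bind_textdomain_codeset call in scm_bindtextdomain. The following is
only a rough sketch, assuming GNU libintl as shipped with glibc (other
libintl implementations may behave differently); the helper name
domain_uses_utf8 is hypothetical and not part of Guile. It does not
exercise the msgid conversion, which would still need a real message
catalog and a locale with a limited character set.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Guile: request UTF-8 output for
   DOMAIN, as the patched scm_bindtextdomain does, then ask libintl
   which codeset it recorded.  Return 1 if that codeset is "UTF-8",
   0 otherwise.  */
int
domain_uses_utf8 (const char *domain)
{
  /* A NULL return means the codeset could not be set.  */
  const char *codeset = bind_textdomain_codeset (domain, "UTF-8");
  if (codeset == NULL)
    return 0;

  /* Passing NULL as the codeset queries the current setting without
     changing it.  */
  codeset = bind_textdomain_codeset (domain, NULL);
  return codeset != NULL && strcmp (codeset, "UTF-8") == 0;
}
```

A caller would pass any non-empty domain name, for instance
domain_uses_utf8 ("myprog"), and treat a zero result as a failure.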