This seems difficult to test. Any ideas?
From 8f9a8e56cc3e8496fdfed3f889cff9fca19b3663 Mon Sep 17 00:00:00 2001
From: Maxime Devos <[email protected]>
Date: Sun, 6 Mar 2022 12:51:33 +0000
Subject: [PATCH 2/2] Deprecate non-functional bind-textdomain-codeset.

TODO: this only deprecates it in the documentation; it also needs
to be deprecated elsewhere.
* doc/ref/api-i18n.texi (bind-textdomain-codeset): Update documentation.
* doc/ref/guile.texi: Update copyright information.
---
doc/ref/api-i18n.texi | 36 ++++++++++++------------------------
doc/ref/guile.texi | 1 +
2 files changed, 13 insertions(+), 24 deletions(-)
diff --git a/doc/ref/api-i18n.texi b/doc/ref/api-i18n.texi
index 7c49b0a23..c06b75996 100644
--- a/doc/ref/api-i18n.texi
+++ b/doc/ref/api-i18n.texi
@@ -2,6 +2,7 @@
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007,
@c 2009, 2010, 2017 Free Software Foundation, Inc.
+@c Copyright (C) 2022 Maxime Devos
@c See the file guile.texi for copying conditions.
@node Internationalization
@@ -599,33 +600,20 @@ non-standard location.
@deffn {Scheme Procedure} bind-textdomain-codeset domain [encoding]
@deffnx {C Function} scm_bind_textdomain_codeset (domain, encoding)
-Get or set the text encoding to be used by @code{gettext} for messages
-from @var{domain}. @var{encoding} is a string, the name of a coding
-system, for instance @nicode{"8859_1"}. (On a Unix/POSIX system the
-@command{iconv} program can list all available encodings.)
-When called without an @var{encoding} the current setting is returned,
-or @code{#f} if none yet set. When called with an @var{encoding}, it
-is set for @var{domain} and that new setting returned. For example,
+This procedure is deprecated and is kept only for compatibility.  It
+used to get or set the text encoding used by @code{gettext} for
+messages from @var{domain}.
-@example
-(bind-textdomain-codeset "myprog")
-@result{} #f
-(bind-textdomain-codeset "myprog" "latin-9")
-@result{} "latin-9"
-@end example
+This procedure has been useless since Guile strings came to consist of
+characters rather than individual bytes, and especially because the
+@code{gettext} procedure always used the locale encoding, not the
+encoding set for the text domain.
-The encoding requested can be different from the translated data file,
-messages will be recoded as necessary. But note that when there is no
-translation, @code{gettext} returns its @var{msg} unchanged, ie.@:
-without any recoding. For that reason source message strings are best
-as plain ASCII.
-
-Currently Guile has no understanding of multi-byte characters, and
-string functions won't recognise character boundaries in multi-byte
-strings. An application will at least be able to pass such strings
-through to some output though. Perhaps this will change in the
-future.
+If you use @code{gettext} in both C and Guile code, be aware that Guile
+always assumes UTF-8 and sets it as the codeset of a domain whenever
+Guile's @code{bindtextdomain} is called on that domain.  If the C code
+expects a different encoding, it needs to use a separate domain.
@end deffn
@c Local Variables:
diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
index 660b1ae90..5b56145ca 100644
--- a/doc/ref/guile.texi
+++ b/doc/ref/guile.texi
@@ -15,6 +15,7 @@ This manual documents Guile version @value{VERSION}.
Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
Inc.
+Copyright (C) 2022 Maxime Devos
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
--
2.30.2
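The updated documentation in the patch above says that C code expecting a different encoding should use a separate domain. A minimal sketch of what that could look like on the C side, assuming glibc's `<libintl.h>`; the domain names `myprog` and `myprog-c`, the choice of ISO-8859-1, and the helper functions are all hypothetical and not part of Guile:

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Hypothetical domains: Guile code translates with GUILE_DOMAIN, whose
   codeset Guile's bindtextdomain forces to UTF-8, while C code that
   wants another encoding uses its own C_DOMAIN.  */
#define GUILE_DOMAIN "myprog"
#define C_DOMAIN "myprog-c"

/* Bind the C-only domain to LOCALEDIR and request ISO-8859-1 output.
   Returns 0 on success and -1 if either libintl call fails.  */
int
setup_c_domain (const char *localedir)
{
  if (bindtextdomain (C_DOMAIN, localedir) == NULL)
    return -1;
  if (bind_textdomain_codeset (C_DOMAIN, "ISO-8859-1") == NULL)
    return -1;
  return 0;
}

/* Return 1 if the C-only domain currently has CODESET bound, else 0.
   Passing NULL to bind_textdomain_codeset only queries the setting.  */
int
c_domain_has_codeset (const char *codeset)
{
  const char *current = bind_textdomain_codeset (C_DOMAIN, NULL);
  return current != NULL && strcmp (current, codeset) == 0;
}

/* Translate MSGID through the C-only domain, so the UTF-8 codeset that
   Guile sets on GUILE_DOMAIN never affects these strings.  */
const char *
c_translate (const char *msgid)
{
  assert (msgid != NULL);
  return dgettext (C_DOMAIN, msgid);
}
```

Since libintl stores the codeset per domain, the C side's ISO-8859-1 and Guile's UTF-8 should not interfere as long as the two never share a domain name.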
From 1477c30cdf251863ed8eb3e1f1136262a9814130 Mon Sep 17 00:00:00 2001
From: Maxime Devos <[email protected]>
Date: Sun, 6 Mar 2022 12:30:17 +0000
Subject: [PATCH 1/2] Avoid producing ? in locales with too few characters.

Previously, if the locale used a character encoding without all
characters, then 'gettext' could produce '?' characters.  Avoid
character encoding concerns by always using UTF-8.

* libguile/gettext.c (scm_gettext): Use scm_to_utf8_string and
scm_from_utf8_string for msgids.
(scm_ngettext): Likewise.
(scm_bindtextdomain): Set the character encoding to UTF-8.
---
 libguile/gettext.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/libguile/gettext.c b/libguile/gettext.c
index b9af4d313..bf54def7f 100644
--- a/libguile/gettext.c
+++ b/libguile/gettext.c
@@ -1,5 +1,7 @@
 /* Copyright 2004,2006,2018
      Free Software Foundation, Inc.
+   Copyright 2022
+     Maxime Devos
 
    This file is part of Guile.
 
@@ -100,7 +102,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
 
   scm_dynwind_begin (0);
 
-  c_msgid = scm_to_locale_string (msgid);
+  c_msgid = scm_to_utf8_string (msgid);
   scm_dynwind_free (c_msgid);
 
   if (SCM_UNBNDP (domain))
@@ -133,7 +135,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
   if (c_result == c_msgid)
     result = msgid;
   else
-    result = scm_from_locale_string (c_result);
+    result = scm_from_utf8_string (c_result);
 
   scm_dynwind_end ();
   return result;
@@ -158,10 +160,10 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
 
   scm_dynwind_begin (0);
 
-  c_msgid = scm_to_locale_string (msgid);
+  c_msgid = scm_to_utf8_string (msgid);
   scm_dynwind_free (c_msgid);
 
-  c_msgid_plural = scm_to_locale_string (msgid_plural);
+  c_msgid_plural = scm_to_utf8_string (msgid_plural);
   scm_dynwind_free (c_msgid_plural);
 
   c_n = scm_to_ulong (n);
@@ -199,7 +201,7 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
   else if (c_result == c_msgid_plural)
     result = msgid_plural;
   else
-    result = scm_from_locale_string (c_result);
+    result = scm_from_utf8_string (c_result);
 
   scm_dynwind_end ();
   return result;
@@ -272,6 +274,10 @@ SCM_DEFINE (scm_bindtextdomain, "bindtextdomain", 1, 1, 0,
   else
     result = SCM_BOOL_F;
 
+  c_result = bind_textdomain_codeset (c_domain, "UTF-8");
+  if (c_result == NULL)
+    SCM_SYSERROR;
+
   scm_dynwind_end ();
   return result;
 }

base-commit: 24b30130ca75653bdbacea84ce0443608379d630
-- 
2.30.2
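On the testing question: two properties of this patch can probably be checked without installing a message catalog. First, after Guile's bindtextdomain the domain's codeset should read back as UTF-8. Second, a message with no translation comes back from libintl as the very pointer that was passed in, which is what the `c_result == c_msgid` comparison in scm_gettext relies on. A minimal C sketch of both checks, assuming a glibc system; the helper names and the domain `guile-test` are hypothetical:

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Roughly what the patched scm_bindtextdomain does in C: bind DOMAIN
   to DIRNAME, then force its output codeset to UTF-8.  Returns 0 on
   success and -1 on failure (Guile raises SCM_SYSERROR instead).  */
int
bind_domain_utf8 (const char *domain, const char *dirname)
{
  if (bindtextdomain (domain, dirname) == NULL)
    return -1;
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    return -1;
  return 0;
}

/* Return 1 if DOMAIN's codeset currently reads back as "UTF-8".  */
int
domain_uses_utf8 (const char *domain)
{
  const char *codeset = bind_textdomain_codeset (domain, NULL);
  return codeset != NULL && strcmp (codeset, "UTF-8") == 0;
}

/* Return 1 if dgettext handed back MSGID itself, i.e. no catalog entry
   was used.  scm_gettext makes the same kind of pointer comparison so
   that it can return the original Scheme string unchanged.  */
int
is_untranslated (const char *domain, const char *msgid)
{
  assert (msgid != NULL);
  return dgettext (domain, msgid) == msgid;
}
```

This does not exercise the '?' replacement itself; that would presumably still need a compiled catalog with non-ASCII translations and a locale whose charset cannot represent them.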
