Hello all, GoKhlaYeh on #guile reported that although (system "echo hé") works properly, (system* "/bin/sh" "-c" "echo hé") fails. Upon investigation, I found that although Guile uses the locale encoding almost everywhere when interfacing with C, several POSIX functions use Latin-1 in a few places.
In particular: `system*', `execl', `execlp', `execle', `environ', and `dynamic-args-call' use scm_i_allocate_string_pointers to convert a list of SCM strings into an argv-style array of C strings. That function does a simple memcpy for narrow strings, and throws an exception for wide strings (via scm_i_string_chars). `environ' was particularly broken, in that it would use Latin-1 when setting the environment, and the locale encoding when reading the environment. The `exec' functions would use the locale encoding to encode the program name, and Latin-1 for the arguments and environment. `system*' would use Latin-1 for everything. This patch fixes all of these inconsistencies, by modifying scm_i_allocate_string_pointers to use the locale encoding instead of Latin-1. Any objections to pushing this to stable-2.0? Mark
>From 00691082eb781ce9cad8da156ad170164b2cf5fc Mon Sep 17 00:00:00 2001 From: Mark H Weaver <m...@netris.org> Date: Sun, 1 May 2011 20:12:35 -0400 Subject: [PATCH] Fix several POSIX functions to use the locale encoding * libguile/strings.c (scm_i_allocate_string_pointers): Encode strings using the current locale. Previously, Latin-1 was used. Indirectly, this affects the encoding of strings in `system*', `execl', `execlp', `execle', `environ', and `dynamic-args-call'. (scm_makfromstrs): In header comment, clarify that the C strings are interpreted according to the current locale encoding. * NEWS: Add NEWS entry. --- NEWS | 12 ++++++++++++ libguile/strings.c | 33 +++++++++++++++++++-------------- 2 files changed, 31 insertions(+), 14 deletions(-) diff --git a/NEWS b/NEWS index df6de65..e2495da 100644 --- a/NEWS +++ b/NEWS @@ -4,7 +4,19 @@ See the end for copying conditions. Please send Guile bug reports to bug-gu...@gnu.org. +Changes in 2.0.2 (since 2.0.1): +* Bugs fixed + +** Fixed several POSIX functions to use the locale encoding + +Fixed `system*', `execl', `execlp', `execle', `environ', and +`dynamic-args-call' to encode all strings using the current locale +encoding. Previously, Latin-1 was used to encode strings in several +places, regardless of the locale. + + + Changes in 2.0.1 (since 2.0.0): * Notable changes diff --git a/libguile/strings.c b/libguile/strings.c index bf63704..a70cb8d 100644 --- a/libguile/strings.c +++ b/libguile/strings.c @@ -2052,8 +2052,9 @@ SCM_DEFINE (scm_string_normalize_nfkd, "string-normalize-nfkd", 1, 0, 0, } #undef FUNC_NAME -/* converts C scm_array of strings to SCM scm_list of strings. */ -/* If argc < 0, a null terminated scm_array is assumed. */ +/* converts C scm_array of strings to SCM scm_list of strings. + If argc < 0, a null terminated scm_array is assumed. + The current locale encoding is assumed */ SCM scm_makfromstrs (int argc, char **argv) { @@ -2067,37 +2068,41 @@ scm_makfromstrs (int argc, char **argv) } /* Return a newly allocated array of char pointers to each of the strings - in args, with a terminating NULL pointer. */ - + in args, with a terminating NULL pointer. The strings are encoded using + the current locale. */ char ** scm_i_allocate_string_pointers (SCM list) #define FUNC_NAME "scm_i_allocate_string_pointers" { char **result; - int len = scm_ilength (list); + int list_len = scm_ilength (list); int i; - if (len < 0) + if (list_len < 0) scm_wrong_type_arg_msg (NULL, 0, list, "proper list"); - result = scm_gc_malloc ((len + 1) * sizeof (char *), + result = scm_gc_malloc ((list_len + 1) * sizeof (char *), "string pointers"); - result[len] = NULL; + result[list_len] = NULL; /* The list might be have been modified in another thread, so we check LIST before each access. */ - for (i = 0; i < len && scm_is_pair (list); i++) + for (i = 0; i < list_len && scm_is_pair (list); i++) { - SCM str; + SCM str = SCM_CAR (list); size_t len; + char *c_str = scm_to_locale_stringn (str, &len); - str = SCM_CAR (list); - len = scm_c_string_length (str); - + /* OPTIMIZE-ME: Right now, scm_to_locale_stringn always uses + scm_malloc to allocate the returned string, which must be + explicitly deallocated. This forces us to copy the string a + second time into a new buffer. Ideally there would be variants + of scm_to_*_stringn that can return garbage-collected buffers. */ result[i] = scm_gc_malloc_pointerless (len + 1, "string pointers"); - memcpy (result[i], scm_i_string_chars (str), len); + memcpy (result[i], c_str, len); result[i][len] = '\0'; + free (c_str); list = SCM_CDR (list); } -- 1.7.1