> From: Andy Wingo <[email protected]>
[...]
> The solution is to use functions that specify the locale. We don't have
> those yet, but we do have the capability to write them
> now. Specifically:
>
> scm_from_utf8_string
> scm_from_utf8_symbol
> scm_from_utf8_keyword
>
> scm_from_latin1_string
> scm_from_latin1_symbol
> scm_from_latin1_keyword
>
> We probably also need the "n" variants.
>
[...]
> So then we need, I think:
>
> scm_to_utf8_string
> scm_to_utf16_string
> scm_to_utf32_string
>
> We need the "n" variants here too (perhaps more).
Some of this already exists in the bytevectors module, though
perhaps not in a form that is convenient to call from C.
It would be easy enough to do, but there is a failure case to
consider for scm_from_utf8_string: the C UTF-8 string could
contain incorrectly encoded data.
You could throw an encoding error, or you could replace the
bad UTF-8 with U+FFFD or a question mark.
The bytevector procedure utf8->string always throws
encoding-error in that case. Maybe that's good enough.
Otherwise, perhaps something like
scm_from_utf8_stringn (str, len, error_or_replace_strategy)
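
To make the strategy argument concrete, here is a rough standalone
sketch in plain C. It does not use libguile at all: utf8_decode,
utf8_error_strategy and the two enum values are names made up for
illustration, and a real scm_from_utf8_stringn would build an SCM
string rather than fill an array of code points. On a bad sequence
it consumes one byte and resynchronizes, which is only one of
several reasonable replacement policies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strategy flag; not part of libguile. */
typedef enum {
  UTF8_ON_ERROR_FAIL,     /* give up at the first bad sequence */
  UTF8_ON_ERROR_REPLACE   /* substitute U+FFFD and keep going */
} utf8_error_strategy;

/* Decode LEN bytes of UTF-8 from STR into OUT, which has room for CAP
   code points.  Returns the number of code points written, or -1 if
   OUT is too small or if the input is malformed and STRATEGY is
   UTF8_ON_ERROR_FAIL.  */
long
utf8_decode (const char *str, size_t len, uint32_t *out, size_t cap,
             utf8_error_strategy strategy)
{
  size_t i = 0, n = 0;

  assert (str != NULL || len == 0);
  assert (out != NULL || cap == 0);

  while (i < len)
    {
      unsigned char b0 = (unsigned char) str[i];
      uint32_t cp = 0, min = 0;
      size_t need = 0, k;
      int ok = 1;

      /* Classify the lead byte: how many continuation bytes follow,
         and the smallest code point that length may encode.  */
      if (b0 < 0x80)
        cp = b0;
      else if ((b0 & 0xE0) == 0xC0)
        { cp = b0 & 0x1F; need = 1; min = 0x80; }
      else if ((b0 & 0xF0) == 0xE0)
        { cp = b0 & 0x0F; need = 2; min = 0x800; }
      else if ((b0 & 0xF8) == 0xF0)
        { cp = b0 & 0x07; need = 3; min = 0x10000; }
      else
        ok = 0;               /* stray continuation or invalid lead */

      for (k = 1; ok && k <= need; k++)
        {
          if (i + k >= len
              || ((unsigned char) str[i + k] & 0xC0) != 0x80)
            ok = 0;           /* truncated or bad continuation byte */
          else
            cp = (cp << 6) | ((unsigned char) str[i + k] & 0x3F);
        }

      /* Reject overlong forms, surrogates and out-of-range values.  */
      if (ok && (cp < min || cp > 0x10FFFF
                 || (cp >= 0xD800 && cp <= 0xDFFF)))
        ok = 0;

      if (ok)
        i += need + 1;
      else if (strategy == UTF8_ON_ERROR_FAIL)
        return -1;
      else
        {
          cp = 0xFFFD;        /* replacement character */
          i += 1;             /* skip one byte and resynchronize */
        }

      if (n >= cap)
        return -1;
      out[n++] = cp;
    }

  return (long) n;
}
```

With this sketch, the three bytes 'a', 0xFF, 'b' are rejected under
UTF8_ON_ERROR_FAIL and decode to 'a', U+FFFD, 'b' under
UTF8_ON_ERROR_REPLACE.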
If you don't mind the overhead of calling the somewhat
heavyweight scm_{to,from}_stringn, these could all be macros
or inline functions that wrap them.
-Mike