Hi :)

On Mon 27 Feb 2017 17:07, Eli Zaretskii <e...@gnu.org> writes:

>> From: Andy Wingo <wi...@pobox.com>
>> Date: Sun, 26 Feb 2017 22:20:31 +0100
>> 
>> In Scheme, strings are sequences of characters.  Encoding and decoding
>> is only needed when going to and from bytes.  Guile supports a finite
>> number of encodings, so in general some encoding/decoding will always be
>> needed.  The specific encoding may change over time.
>
> The lesson of Emacs development is that there's a need for
> "characters" that represent raw bytes which cannot be decoded into the
> internal representation, for whatever reasons.  These special
> "characters" need to be representable in strings, among "normal"
> recognizable characters (and thus distinguishable from the latter
> kind), and they need to be converted back to their single-byte form
> when the string is output to the external world.  An implementation of
> text that doesn't include these features will always fail to support
> some important use cases.

Thanks for this note (and the ones upthread).  I didn't know Emacs settled on
this strategy.  It could fit in as a new "conversion strategy" (see
Encoding in the manual).

I think this feature will probably miss 2.2.0 for lack of time,
though.  When someone does go to look at it, this thread is a useful
resource, or parts of it anyway :) I especially appreciated the
tradeoffs between surrogates and strange UTF-8 hacks.

Andy