Antonio Ospite <a...@ao2.it> writes:

> AFAICS in guile-2.0 the difference between characters and bytes is
> taken very seriously.

The problem is that lily/parser.yy and particularly lily/lexer.ll implement robust and fast recognition and interpretation of UTF-8. They transparently map it to C++ strings encoded in UTF-8.

Guile-2.0 has _no_ UTF-8 encoded strings. Its strings are _either_ encoded in Latin-1 or in UTF-32. Its string _ports_ are exclusively encoded in UTF-8, and that also includes any file offsets in the string ports. As a result, its string port offsets are _useless_ for indexing into strings.

If you want to get a UTF-8 string into Guile, it will get decoded into UTF-32, only to be reencoded into UTF-8 when moved through a string port (like when using the Scheme reader on it). Each character is then redecoded into UTF-32, which gets reencoded into UTF-8 when getting it back into C++. Guile-2.0 cannot work efficiently with string ports internally since it constantly needs to recode stuff.

Its UTF-8 encoding/decoding (unlike that of Emacs) cannot represent anything not in proper UTF-8: it either produces stuff that does not encode into the original, or errors out without remedy or useful offsets. As a consequence, pinpointing the problem in the original string or byte sequence is unreliable.

The UTF-8 libraries Guile employs are not internal to Guile (though partly distributed as part of Guile rather than as an external dependency). Very little active work on them has been done in recent years.

The Guile developers will be in total denial that anything is amiss with the current situation, and that there is anything wrong with the inability of Guile to read and write UTF-8 strings without involving a non-information-preserving conversion to UTF-32 or Latin-1 and back, and with having its string ports work in an encoding that its strings cannot represent.

LilyPond uses Guile as a very tightly integrated extension language, so it constantly passes strings into Guile and back and reads from string ports.

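To make the offset and round-trip problems concrete, here is a minimal sketch. It is written in Python purely for illustration and is not Guile or LilyPond code; the sample strings and variable names are made up. It shows the two effects described above: a byte offset taken from a UTF-8 stream does not index the decoded string, and a lossy decode of invalid UTF-8 does not encode back into the original bytes.

```python
# Illustration only: Python stands in for the general behaviour of
# UTF-8 byte offsets versus decoded character indices.

# Hypothetical input: e-acute (U+00E9), musical flat sign (U+266D), "c".
text = "\u00e9\u266dc"          # 3 characters
raw = text.encode("utf-8")      # 2 + 3 + 1 bytes

# A byte-oriented reader that has consumed the first two characters
# reports its position as a byte offset.
byte_offset = len(text[:2].encode("utf-8"))

# The matching character index has to be recovered by decoding the
# prefix again; the byte offset itself points past the end of the string.
char_index = len(raw[:byte_offset].decode("utf-8"))
rest_by_byte_offset = text[byte_offset:]
rest_by_char_index = text[char_index:]

# Invalid UTF-8 does not survive a lossy decode/encode round trip:
# the stray 0xFF byte is replaced by U+FFFD and the original is lost.
bad = b"a\xffb"
lossy_roundtrip = bad.decode("utf-8", errors="replace").encode("utf-8")
```

Here the reader's offset and the string index already disagree after two characters, and the replacement character leaves no way to recover the offending byte from the decoded string.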
Actual byte streams seem like they could help keep some of this insanity in check, in particular if you can let the Scheme reader treat them as if they were in UTF-8.

Now in Guile-1.8, we did a lot of the UTF-8 work seamlessly and manually. There are a few rough corners with that in the context of Scheme identifiers and strings. Doing stuff "the Guile way" instead will be good for a lot of headaches, since Guile's representations are not even compatible within Guile itself, and since any attempt at getting strings into and out of Guile requires a conversion because Guile's internal encodings are not exposed to its API.

-- 
David Kastrup

_______________________________________________
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel