> Cc: "j...@abou-samra.fr" <j...@abou-samra.fr>,
> "r...@defaultvalue.org" <r...@defaultvalue.org>,
> "guile-devel@gnu.org" <guile-devel@gnu.org>
> From: Maxime Devos <maximede...@telenet.be>
> Date: Sun, 7 Jul 2024 16:59:10 +0200
>
> >> >> Guile is a Scheme implementation, bound by Scheme standards and
> >> >> compatibility
> >> >> with other Scheme implementations (and backwards compatibility too).
> >> >
> >> >Yes, I understand that.
> >>
> >> Going by what you are saying below, I think you don’t.
> >
> >Thank you for your vote of confidence.
>
> That was not a vote of confidence, if anything, it’s the contrary.
You don't say!

> > I’m pretty sure that they weren’t intending to get the 0xb5 byte. Rather,
> > they were using the equivalent of ‘string-ref’ (i.e., ‘aref’) and
> > demonstrating that the result is bogus in Scheme. In Scheme, ‘(string-ref
> > ...)’ needs to return a character, and there exists no (Unicode) character
> > with codepoint 4194229, so what Emacs returns here would be bogus for
> > (Guile) Scheme.
>
> >aref in Emacs and string-ref in Guile are not the same, and if Guile
> needs to produce a raw byte in this scenario, it can be easily
> arranged. In Emacs we have other goals.
>
> It is the opposite. In Guile, string-ref does not need to produce bytes, but
> characters – just like aref (modulo difference in how Scheme and Emacs define
> ‘byte’). But raw byte is not a character.
> >IOW, I think this argument is pointless, since it is easy to adapt the
> mechanism to what Guile needs.
>
> No – the argument is about how it is impossible to adapt the mechanism to
> Guile, since bytes aren’t characters in Unicode.

I'm saying that Guile needs to support raw bytes as well, because they
happen in Real Life, including as part of otherwise legible text.

> > >From the Emacs manual:
> >
> > >For example, you can access individual characters in a string using the
> > >function aref (see Functions that Operate on Arrays).
> >
> > Thus, (aref the-string index) is the equivalent of (string-ref the-string
> > index).
>
> >No, because a raw byte is not a character.
>
> Yes, because characters are characters. Both string-ref and aref return
> characters. This is documented in both the Emacs and Guile manual:
>
> Again, from the Emacs manual:
>
> > A string is a fixed sequence of characters. [...] Since strings are arrays,
> > and therefore sequences as well, you can operate on them with the general
> > array and sequence functions documented in Sequences, Arrays, and Vectors.
> > For example, you can access individual characters in a string using the
> > function aref (see Functions that Operate on Arrays).
>
> Hence, (aref the-string index) returns (Emacs) characters.

You missed the description of raw bytes and unibyte strings, I guess.

> >If Guile restricts itself to Unicode characters and only them, it will
> lack important features. So my suggestion is not to have this
> restriction.
>
> Guile restricting strings to Unicode _is_ an important feature (simplicity,
> and compatibility).
>
> Guile extending strings beyond Unicode is a _limitation_ (compatibility and
> other trickiness for applications).
>
> I could imagine in the far future there might be too few codepoints left
> in Unicode, in which case the range of what Guile (and more generally, Scheme
> and Unicode) considers characters needs to be extended (even if that has some
> compatibility implications), but that time hasn’t arrived yet.
>
> The important feature of this thread is supporting file names (and getenv
> stuff, etc.) that don’t fit properly in the ‘string’ model. As mentioned
> earlier (in the initial message, even), there are solutions that do not
> impose the ‘let characters go beyond Unicode’ limitation.
>
> >I think the fact that this discussion is held, and that Rob suggested
> to use Latin-1 for the purpose of supporting raw bytes is a clear
> indication that Guile, too, needs to deal with "character-like" data
> that does not fit the Unicode framework.
>
> True, and I never claimed otherwise.
>
> > So I think saying that strings in Guile can only hold Unicode characters
> > will not give you what this discussion attempts to give.
>
> Sure, and I wasn’t trying to. What I (and IIUC, the other person as well) was
> doing was mentioning how neither the Emacs’s thing is a solution.
> (Whether because of backwards compatibility, or whether because of not
> _wanting_ to conflate bytes with characters (and not wanting to go beyond
> Unicode) with all the consequences this conflation would imply for
> applications.)
>
> > In particular, how will you
> handle the situations described by Rob where a file has a name that is
> not a valid UTF-8 sequence (thus not "characters" as long as you
> interpret text as UTF-8)?
>
> Scheme does not interpret text as UTF-8; that’s an internal implementation
> detail and a matter of things like locales. Instead, to Scheme, text is
> (Unicode) characters.
>
> I have outlined a solution (that does not conflate characters with bytes) in
> another response. IIRC, it was in a response to Rob. I would propose
> actually, you know, reading it. I’m not sure, but IIRC Rob also mentioned
> another solution (i.e., just accept bytevectors in some locations, or do
> Latin-1).
>
> Also, this structure makes no sense. Even if I did not provide an alternative
> solution of my own, that wouldn’t mean Emacs’s thing is the answer.
> (Negative) criticism can be valid without providing alternatives.

That's fine by me. I described what we have done in Emacs because I
think it works and works well. For many years. So I thought describing
it would be useful to Guile and would allow you to consider whether
something like that could solve your problems, which I think are very
similar, if not identical.

It is up to you whether to reject that solution without trying to adapt
it to Guile, and in that case I wish you all the luck in finding your
own solutions.