> Cc: "j...@abou-samra.fr" <j...@abou-samra.fr>, 
>       "r...@defaultvalue.org" <r...@defaultvalue.org>, 
>       "guile-devel@gnu.org" <guile-devel@gnu.org>
> From: Maxime Devos <maximede...@telenet.be>
> Date: Sun, 7 Jul 2024 16:59:10 +0200
> 
> >> >> Guile is a Scheme implementation, bound by Scheme standards and 
> >> >> compatibility
> >> >> with other Scheme implementations (and backwards compatibility too).
> >> >
> >> >Yes, I understand that.
> >> 
> >> Going by what you are saying below, I think you don’t.
> >
> >Thank you for your vote of confidence.
> 
> That was not a vote of confidence; if anything, it’s the contrary.

You don't say!

> > I’m pretty sure that they weren’t intending to get the 0xb5 byte. Rather, 
> > they were using the equivalent of ‘string-ref’ (i.e., ‘aref’) and 
> > demonstrating that the result is bogus in Scheme.  In Scheme, ‘(string-ref 
> > ...)’ needs to return a character, and there exists no (Unicode) character 
> > with codepoint 4194229, so what Emacs returns here would be bogus for 
> > (Guile) Scheme.
> 
> > aref in Emacs and string-ref in Guile are not the same, and if Guile
> > needs to produce a raw byte in this scenario, it can be easily
> > arranged.  In Emacs we have other goals.
> 
> It is the opposite. In Guile, string-ref does not need to produce bytes, but 
> characters – just like aref (modulo difference in how Scheme and Emacs define 
> ‘byte’).

But a raw byte is not a character.
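
To make the number in the quoted example concrete, here is a small sketch
in Python (used only for illustration; it is neither Emacs nor Guile
code).  It assumes the mapping described in the Emacs Lisp manual, where
raw bytes 0x80..0xFF get character codes in the range
#x3FFF80..#x3FFFFF.  The helper name raw_byte_to_emacs_code is
hypothetical.

```python
# Assumption: Emacs maps a raw byte B (0x80..0xFF) to the code 0x3FFF00 + B,
# giving the documented raw-byte range #x3FFF80..#x3FFFFF.
MAX_UNICODE = 0x10FFFF      # highest Unicode code point
RAW_BYTE_BASE = 0x3FFF00    # offset assumed for raw-byte character codes


def raw_byte_to_emacs_code(byte: int) -> int:
    """Return the Emacs-style character code for a raw 8-bit byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF are raw bytes")
    return RAW_BYTE_BASE + byte


code = raw_byte_to_emacs_code(0xB5)
print(code)                 # 4194229, the value quoted in the thread
print(code > MAX_UNICODE)   # True: outside the Unicode code space

# A runtime restricted to Unicode rejects that code as a character.
try:
    chr(code)
    is_unicode_char = True
except ValueError:
    is_unicode_char = False
print(is_unicode_char)      # False
```

The value is a legitimate element of an Emacs string, but it cannot be a
character in a system whose characters are exactly the Unicode code
points.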

> > IOW, I think this argument is pointless, since it is easy to adapt the
> > mechanism to what Guile needs.
> 
> No – the argument is about how it is impossible to adapt the mechanism to 
> Guile, since bytes aren’t characters in Unicode.

I'm saying that Guile needs to support raw bytes as well, because they
happen in Real Life, including as part of otherwise legible text.
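
A minimal sketch of the kind of data in question, again in Python and
only as an illustration: a file name whose bytes are not valid UTF-8.
The name b"caf\xe9.txt" is a made-up example.  The two round trips shown
(Latin-1, and Python's "surrogateescape" error handler) are stand-ins
for the general idea of carrying raw bytes through a string type.  They
are not a description of what Emacs or Guile does.

```python
# Hypothetical file name: "café.txt" encoded in Latin-1, so the byte 0xE9
# is not part of a valid UTF-8 sequence.
raw_name = b"caf\xe9.txt"

# Strict UTF-8 decoding fails on such a name.
try:
    raw_name.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False

# Latin-1 maps every byte to one character, so the round trip is lossless,
# although the resulting text may not be what the user meant.
as_latin1 = raw_name.decode("latin-1")
latin1_roundtrip = as_latin1.encode("latin-1")

# surrogateescape smuggles each undecodable byte through as a lone
# surrogate code point, which is then turned back into the same byte.
escaped = raw_name.decode("utf-8", errors="surrogateescape")
escape_roundtrip = escaped.encode("utf-8", errors="surrogateescape")

print(latin1_roundtrip == raw_name)  # True
print(escape_roundtrip == raw_name)  # True
print(hex(ord(escaped[3])))          # 0xdce9
```

In both cases the bytes survive, while the legible parts of the name
("caf" and ".txt") remain ordinary text.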

> > From the Emacs manual:
> > 
> > >For example, you can access individual characters in a string using the 
> > >function aref (see Functions that Operate on Arrays).
> > 
> > Thus, (aref the-string index) is the equivalent of (string-ref the-string 
> > index).
> 
> > No, because a raw byte is not a character.
> 
> Yes, because characters are characters. Both string-ref and aref return 
> characters. This is documented in both the Emacs and Guile manual:
> 
> Again, from the Emacs manual:
> 
> > A string is a fixed sequence of characters. [...] Since strings are arrays, 
> > and therefore sequences as well, you can operate on them with the general 
> > array and sequence functions documented in Sequences, Arrays, and Vectors. 
> > For example, you can access individual characters in a string using the 
> > function aref (see Functions that Operate on Arrays).
> 
> Hence, (aref the-string index) returns (Emacs) characters.

You missed the description of raw bytes and unibyte strings, I guess.

> > If Guile restricts itself to Unicode characters and only them, it will
> > lack important features.  So my suggestion is not to have this
> > restriction.
> 
> Guile restricting strings to Unicode _is_ an important feature (simplicity, 
> and compatibility).
> 
> Guile extending strings beyond Unicode is a _limitation_ (compatibility and 
> other trickiness for applications).
> 
> I could imagine in the far future there might be too few codepoints left 
> in Unicode, in which case the range of what Guile (and more generally, Scheme 
> and Unicode) considers characters needs to be extended (even if that has some 
> compatibility implications), but that time hasn’t arrived yet.
> 
> The important feature of this thread is supporting file names (and getenv 
> stuff, etc.) that don’t fit properly in the ‘string’ model. As mentioned 
> earlier (in the initial message, even), there are solutions to that which do 
> not impose the ‘let characters go beyond Unicode’ limitation.
> 
> > I think the fact that this discussion is held, and that Rob suggested
> > to use Latin-1 for the purpose of supporting raw bytes is a clear
> > indication that Guile, too, needs to deal with "character-like" data
> > that does not fit the Unicode framework.
> 
> True, and I never claimed otherwise.
> 
> > So I think saying that strings in Guile can only hold Unicode characters 
> > will not give you what this discussion attempts to give.
> 
> Sure, and I wasn’t trying to. What I (and IIUC, the other person as well) was 
> doing was mentioning how the Emacs thing isn’t a solution either. (Whether 
> because of backwards compatibility, or whether because of not _wanting_ to 
> conflate bytes with characters (and not wanting to go beyond Unicode) with 
> all the consequences this conflation would imply for applications.)
> 
> > In particular, how will you
> > handle the situations described by Rob where a file has a name that is
> > not a valid UTF-8 sequence (thus not "characters" as long as you
> > interpret text as UTF-8)?
> 
> Scheme does not interpret text as UTF-8, that’s an internal implementation 
> detail and a matter of things like locales. Instead, to Scheme text is 
> (Unicode) characters.
> 
> I have outlined a solution (that does not conflate characters with bytes) in 
> another response. IIRC, it was in a response to Rob. I would propose 
> actually, you know, reading it. I’m not sure, but IIRC Rob also mentioned 
> another solution (i.e., just accept bytevectors in some locations, or do 
> Latin-1).
> 
> Also, this structure makes no sense. Even if I did not provide an alternative 
> solution of my own, that wouldn’t mean Emacs’s thing is the answer. 
> (Negative) criticism can be valid without providing alternatives.

That's fine by me.  I described what we have done in Emacs because I
think it works and works well.  For many years.  So I thought
describing it will be useful to Guile and will allow you to consider
if something like that could solve your problems, which I think are
very similar if not identical.  It is up to you whether to reject that
solution without trying to adapt it to Guile, and in that case I wish
you all the luck in finding your own solutions.
