l...@gnu.org (Ludovic Courtès) writes:

> This has been addressed in two ways:

No, it hasn't.

> 1. In 2.0, (srfi srfi-6) uses Unicode-capable string ports (commit
> ecb48dc.)

This issue report is not about adding more optional functionality on
top. It is about _removing_ unwarranted redirection and complication
from existing core functionality. The artifacts of routing
with-input-from-string and with-output-to-string through an additional
character->bytevector->character encoding/decoding layer are not
invisible.

> 2. In 2.2, string ports are always Unicode-capable, and
> ‘%default-port-encoding’ is ignored (commit 6dce942.)

String ports should not be "Unicode-capable" but transparent:
characters in, characters out. ftell/fseek should be based on
character positions in strings rather than on offsets into a magically
created byte stream of some particular encoding.

> So for 2.0, the workaround is to either use (srfi srfi-6), or force
> ‘%default-port-encoding’ to "UTF-8".

Which _only_ makes string ports Unicode-capable. It still interprets
set-port-encoding! in terms of a byte stream, and it still calculates
positions in terms of a byte stream unrelated to string positions:

(use-modules (srfi srfi-6))

;; Four characters whose UTF-8 encodings take 1, 2, 2 and 3 bytes.
(define s (list->string (map integer->char '(20 200 2000 20000))))

(let ((port (open-input-string s)))
  (let loop ((ch (read-char port)))
    (if (not (eof-object? ch))
        (begin
          ;; Print the code point and the port position after reading it.
          (format #t "~d, pos=~d\n" (char->integer ch) (ftell port))
          (loop (read-char port))))))

20, pos=1
200, pos=3
2000, pos=5
20000, pos=8

Tying string ports to an artificial bytevector representation that
bleeds through like this makes it impossible to synchronize string
positions and stream positions when parts of the source string are
_not_ processed through the stream.

That is precisely the problem I am currently dealing with while porting
LilyPond: it has its own lexer working on a (UTF-8-encoded) byte stream
which is at the same time available as a string port.

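The positions printed above are cumulative UTF-8 byte counts: U+0014
takes 1 byte, U+00C8 and U+07D0 take 2 bytes each, and U+4E20 takes 3,
giving 1, 3, 5 and 8. The following is a minimal sketch in plain Scheme,
assuming the string port encodes its contents as UTF-8; the helper names
utf8-char-length, utf8-positions and byte-offset->char-index are
hypothetical and not part of Guile. It reproduces that arithmetic and
shows what mapping such an offset back to a string index entails: the
string has to be rescanned from the start.

```scheme
;; Hypothetical helpers, assuming the port encodes its contents as UTF-8.

;; Number of bytes UTF-8 needs to encode CH.
(define (utf8-char-length ch)
  (let ((n (char->integer ch)))
    (cond ((< n #x80) 1)
          ((< n #x800) 2)
          ((< n #x10000) 3)
          (else 4))))

;; Byte offsets after each character of S, i.e. what ftell reports
;; on a UTF-8 string port after each read-char.
(define (utf8-positions s)
  (let loop ((chars (string->list s)) (pos 0) (acc '()))
    (if (null? chars)
        (reverse acc)
        (let ((next (+ pos (utf8-char-length (car chars)))))
          (loop (cdr chars) next (cons next acc))))))

;; Map a UTF-8 byte offset back to a character index by rescanning S
;; from the start; a character-based ftell would need no such step.
(define (byte-offset->char-index s byte-offset)
  (let loop ((i 0) (bytes 0))
    (if (or (>= bytes byte-offset) (= i (string-length s)))
        i
        (loop (+ i 1) (+ bytes (utf8-char-length (string-ref s i)))))))

(define s (list->string (map integer->char '(20 200 2000 20000))))
(display (utf8-positions s))             ; prints (1 3 5 8)
(newline)
(display (byte-offset->char-index s 5))  ; prints 3
(newline)
```

Any code that correlates ftell results with string indices has to carry
a conversion like this, and it only holds as long as the port encoding
is known to be UTF-8.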
Whenever embedded Scheme is interpreted, the string port is moved to
the proper position, GUILE reads an expression and is told what to do
with it, the string port position is read off, and the LilyPond lexer
is moved to the corresponding position to continue. If you take a look
at
<URL:http://git.savannah.gnu.org/cgit/lilypond.git/tree/scm/parser-ly-from-scheme.scm>,
ftell on a string port is used there to correlate the positions of
parsed subexpressions with the original data. Re-encoding strings in
UTF-8 is not going to make this work with string indexing, since ftell
then bears no useful relation to string positions.

The behavior of ftell and port-encoding is perfectly fine for reading
from bytevectors or files, and reading from bytevectors or files also
does not incur an encode-on-open step governed by
%default-port-encoding in GUILE-2.0 and by hardwired UTF-8 in
GUILE-2.2. But strings are already decoded characters. Re-encoding them
makes no sense and detaches things like ftell and fseek from the actual
input into the port.

-- 
David Kastrup