bug#22901: drain-input doesn't decode
Closing this since it's 5 years old and fixed in Guile 2.1 and higher. -- Taylan
bug#22901: drain-input doesn't decode
Are we still maintaining 2.0, or can this issue be closed? -- Taylan
bug#22901: drain-input doesn't decode
> On Feb 26, 2017, at 9:46 AM, Matt Wette wrote:
>
> I put together a test and tried on 2.1.7 - my test fails. See attached.
>
> (pass-if "encoded input"
>   (let ((fn (test-file))
>         (nc "utf-8")
>         (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
>         ;;(st "hello, world\n")
>         )
>     (let ((p1 (open-output-file fn #:encoding nc)))
>       ;;(display st p1)
>       (string-for-each (lambda (ch) (write-char ch p1)) st)
>       (close p1))
>     (let* ((p0 (open-input-file fn #:encoding nc))
>            (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
>       (simple-format #t "~S\n" s0)
>       (equal? s0 st))))

My bad. The failure was on guile-2.0.13. It seems to work on guile-2.1.7:

mwette$ guile-2.1.7-dev3/meta/guile port-di.test
"βαδ ασσ am I."
PASS: drain-input: encoded input
bug#22901: drain-input doesn't decode
I put together a test and tried on 2.1.7 - my test fails. See attached.

(pass-if "encoded input"
  (let ((fn (test-file))
        (nc "utf-8")
        (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
        ;;(st "hello, world\n")
        )
    (let ((p1 (open-output-file fn #:encoding nc)))
      ;;(display st p1)
      (string-for-each (lambda (ch) (write-char ch p1)) st)
      (close p1))
    (let* ((p0 (open-input-file fn #:encoding nc))
           (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
      (simple-format #t "~S\n" s0)
      (equal? s0 st))))

[Attachment: port-di.test]
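[Editorial note: the byte-level effect behind the 2.0.13 failure above can be reproduced outside Guile. The test string is written with #:encoding "utf-8"; if drain-input returns one character per octet (effectively ISO-8859-1) instead of decoding, each Greek letter becomes two mojibake characters. A minimal sketch in Python, used here only as an illustration of the encoding arithmetic, not of Guile's API:]

```python
# The test string Matt writes to the file with #:encoding "utf-8".
st = "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I."  # "βαδ ασσ am I."

# What read-char reconstructs: decode the octets per the port's encoding.
octets = st.encode("utf-8")
decoded = octets.decode("utf-8")

# What a non-decoding drain-input effectively returns: one character per
# octet, i.e. the bytes reinterpreted as ISO-8859-1 (latin-1).
drained = octets.decode("latin-1")

print(decoded == st)          # True: proper decoding round-trips
print(drained == st)          # False: each Greek letter became two chars
print(len(st), len(drained))  # 13 19
```

[Six two-byte Greek letters plus seven ASCII characters make 19 octets, so the octet-preserving result has 19 characters where the original string has 13.]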
bug#22901: drain-input doesn't decode
On Fri 04 Mar 2016 04:09, Zefram writes:

> The documentation for drain-input says that it returns a string of
> characters, implying that the result is equivalent to what you'd get
> from calling read-char some number of times. In fact it differs in a
> significant respect: whereas read-char decodes input octets according to
> the port's selected encoding, drain-input ignores the selected encoding
> and always decodes according to ISO-8859-1 (thus preserving the octet
> values in character form).
>
> $ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
>   (current-input-port) "UCS-2BE") (write (port-encoding
>   (current-input-port))) (newline) (write (map char->integer (let r ((l
>   '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object?
>   c) (reverse l) (r (cons c l))))))) (newline)'
> "UCS-2BE"
> (353 610 867)
> $ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
>   (current-input-port) "UCS-2BE") (write (port-encoding
>   (current-input-port))) (newline) (peek-char (current-input-port))
>   (write (map char->integer (string->list (drain-input
>   (current-input-port))))) (newline)'
> "UCS-2BE"
> (1 97 2 98 3 99)

Thanks for the test case! FWIW, this is fixed in Guile 2.1.3. I am not
sure what we should do about Guile 2.0. I guess we should make it do the
documented thing though!

Andy
bug#22901: drain-input doesn't decode
The documentation for drain-input says that it returns a string of
characters, implying that the result is equivalent to what you'd get from
calling read-char some number of times. In fact it differs in a
significant respect: whereas read-char decodes input octets according to
the port's selected encoding, drain-input ignores the selected encoding
and always decodes according to ISO-8859-1 (thus preserving the octet
values in character form).

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
  (current-input-port) "UCS-2BE") (write (port-encoding
  (current-input-port))) (newline) (write (map char->integer (let r ((l
  '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object?
  c) (reverse l) (r (cons c l))))))) (newline)'
"UCS-2BE"
(353 610 867)
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
  (current-input-port) "UCS-2BE") (write (port-encoding
  (current-input-port))) (newline) (peek-char (current-input-port))
  (write (map char->integer (string->list (drain-input
  (current-input-port))))) (newline)'
"UCS-2BE"
(1 97 2 98 3 99)

The practical upshot is that the input returned by drain-input can't be
used in the same way as regular input from read-char. It can still be
used if the code doing the reading is fully aware of the encoding, so
that it can perform the decoding manually, but this seems a failure of
abstraction. The value returned by drain-input ought to be coherent with
the abstraction level at which it is specified.

I can see that there is a reason for drain-input to avoid performing
decoding: the problem that occurs if the buffer ends in the middle of a
character. If drain-input is to return decoded characters, then
presumably in this case it would have to read further octets beyond the
buffer contents, in an unbuffered manner, until it reaches a character
boundary. If this is too unpalatable, perhaps drain-input should be
permitted only on ports configured for single-octet character encodings.
If, on the other hand, it is decided to endorse the current non-decoding behaviour, then the break of abstraction needs to be documented. -zefram
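[Editorial note: the numbers in Zefram's transcript can be checked outside Guile. The six input octets form three UCS-2BE code units, so decoding per the selected encoding yields the code points (353 610 867), while the octet-preserving ISO-8859-1 behaviour yields (1 97 2 98 3 99). A sketch in Python for illustration, with UTF-16BE standing in for UCS-2BE, which is equivalent for BMP characters:]

```python
# The six octets piped into guile: $'\1a\2b\3c'
data = b"\x01a\x02b\x03c"

# Decoding per the port's selected encoding (UCS-2BE): each pair of
# octets forms one code point, e.g. 0x01 0x61 -> U+0161 (353).
per_encoding = [ord(c) for c in data.decode("utf-16-be")]

# What drain-input did instead: one character per octet (ISO-8859-1),
# preserving the raw byte values.
per_octet = [ord(c) for c in data.decode("latin-1")]

print(per_encoding)  # [353, 610, 867]
print(per_octet)     # [1, 97, 2, 98, 3, 99]
```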