[Scheme-reports] I/O redux

Vincent Manis Mon, 02 May 2011 15:01:48 -0700

This email contains follow-up on some old points on I/O, and a few new ones.


1. [Old] Re my proposal for STANDARD-{INPUT,OUTPUT,ERROR}-PORT. John Cowan (I 
think) felt that these were useless. I'm not a big one for rebinding/mutating 
current I/O streams; I prefer normally to use ports directly, or write small 
blocks of code that use WITH-{INPUT-FROM,OUTPUT-TO}-FILE. However, in a messy 
enough program that's constantly switching current ports all over the place, 
it's convenient to be able to access the standard ports directly. Obviously, 
three lines of code at the beginning of the program will capture them, but I'd 
still like to see them brought into the standard. I don't feel strongly on this 
matter, but thought I'd give it a second kick at the can. 
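For concreteness, the "three lines of code" mentioned above might look like the following. This is only a sketch: it assumes an R7RS-style `(scheme base)` in which the current ports are parameter objects that PARAMETERIZE can rebind, and the STANDARD-*-PORT names are the ones proposed here, not part of any standard. The `demo` procedure is purely illustrative.

```scheme
(import (scheme base))

;; Capture the ports in effect at program start, before anything rebinds them.
;; These names are the proposed ones; no standard defines them.
(define standard-input-port  (current-input-port))
(define standard-output-port (current-output-port))
(define standard-error-port  (current-error-port))

;; Illustrative use: even while the current output port is rebound,
;; the captured port still reaches the original destination.
(define (demo)
  (let ((capture (open-output-string)))
    (parameterize ((current-output-port capture))
      (write-string "captured")                          ; goes to the string port
      (write-string "visible\n" standard-output-port))   ; bypasses the rebinding
    (get-output-string capture)))
```

The captured values are only as good as the moment of capture, which is why having the implementation provide them would be more robust.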

Also, are these ports always defined? Is it possible that CURRENT-INPUT-PORT 
might not be set at all, or might have value #f, in some cases? If I'm not 
mistaken, a Windows GUI executable has no standard input or output. 

I would also put in a weak suggestion for CONSOLE-INPUT-PORT and 
CONSOLE-OUTPUT-PORT, for situations where the I/O has to be from/to the REAL 
terminal, with a proviso that these might not be available under some 
implementations (i.e., their value is #f). I don't feel strongly about this, 
but thought I'd toss it in for consideration. 

2. [Old] The fact that IEEE Scheme is required to be a subset of WG1 is 
sufficient reason to include CHAR-READY? and U8-READY?. However, given the 
difficulty of implementing them correctly in many environments, it's also 
reasonable to discourage programs from using them. A careful reading of the 
CHAR-READY? entry shows that it's possible that CHAR-READY? returns #f when 
there actually is a character available [*], which exactly matches the case 
where you can only find out whether there's any data by attempting to read. 
This is either accidental or a brilliant example of VERY careful language 
lawyering! I would suggest clarifying this point by adding a remark that some 
environments make it extremely difficult to implement CHAR-READY? reliably, so 
it might return #f when a character is available, and adding a similar remark 
to the U8-READY? entry.

[*] Technically, CHAR-READY? is to return #f `otherwise', when no character is 
available. However, nobody can distinguish the case where CHAR-READY? outright 
lies, claiming there's nothing there when there is, from the case where there 
really WAS no character available, and then 1 zeptosecond later one appeared. 
It is not stated that, at the moment a character would be read successfully, 
CHAR-READY?, if called, would have to return #t. 

[I assume that few if any implementations would use non-blocking I/O just so 
they can support CHAR-READY? correctly.]

3. [Old] I had suggested adding a remark that some implementations support 
other kinds of sources and sinks besides files (and devices). John remarked that 
this is addressed in the first paragraph of §6.7. That says that other kinds of 
ports besides binary and character might be provided, which is a different 
point. My remark was aimed at conveying that an implementation might provide 
other kinds of binary/character ports that the procedures in §§6.7.2 and 6.7.3 
will handle. 

4. [Old] I had expressed confusion about the notion that binary ports 
inherently support character operations. This morning I had an epiphany on this 
subject. To me, a `binary port' is a port that is used to read or write 
successive octets, while a `character port' contains additional encoding 
support, even if it's just end-of-line translation. Thus in C-derived I/O 
systems one might do a fopen(filnam, "r") for character reading, and 
fopen(filnam, "rb") for binary reading.

This is NOT how these terms are used in the Report! A binary port is one whose 
backing store (on disk or elsewhere) contains octets, while a character port 
(e.g., a string port) has a backing store containing Scheme characters. The 
term `binary' doesn't refer to reading or writing in binary mode, but to the 
type of backing store the port uses. This is implied, but not stated, by the 
current wording, leaving people like me relatively free to misunderstand the 
point. 

Short of changing the terminology, which may not be practical, perhaps a 
sentence or two defining these terms more precisely could be added. 

5. [New] §6.7.1, bottom of col 2, p. 45. WITH-INPUT-FROM-FILE and 
WITH-OUTPUT-TO-FILE are defined, but should not WITH-ERROR-TO-FILE also be 
added? 
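To show what is being asked for, here is a rough user-level approximation of WITH-ERROR-TO-FILE. It is a sketch under assumptions: an R7RS-style system where CURRENT-ERROR-PORT is a parameter object, and no attempt to match whatever the Report says about escapes from the thunk for the existing WITH-...-FILE procedures.

```scheme
(import (scheme base) (scheme file))

;; Call THUNK with the current error port rebound to a port writing to FILENAME.
;; The file is opened before the call and closed when the thunk returns.
(define (with-error-to-file filename thunk)
  (call-with-output-file filename
    (lambda (port)
      (parameterize ((current-error-port port))
        (thunk)))))
```

That a user can write this in five lines is an argument either way; the case for including it rests mainly on symmetry with the other two procedures.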

6. [New] Most implementations provide a procedure named something like 
READ-LINE that reads the next line from an input port. Processing a file by 
lines is an extremely common paradigm, and should therefore be supported. (I 
can rant at great length about why this should be here, but I'll spare you my 
ranting on this point unless you think it's needed :). 
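As a small illustration of the paradigm, the sketch below counts lines on a port. It assumes a READ-LINE of the shape most implementations already provide: it takes a port and returns either a string (without the line terminator) or an end-of-file object. The name `count-lines` is made up for the example.

```scheme
(import (scheme base))

;; Count the lines available on PORT, assuming (read-line port) returns
;; a string for each line and an end-of-file object once input is exhausted.
(define (count-lines port)
  (let loop ((n 0))
    (let ((line (read-line port)))
      (if (eof-object? line)
          n
          (loop (+ n 1))))))

;; Example call on a string port:
(count-lines (open-input-string "one\ntwo\nthree\n"))
```

Without READ-LINE, every program that does this has to build its own line buffer on top of READ-CHAR and decide for itself what counts as a line terminator.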

7. [New] What happens if both READ-CHAR and READ-U8 are used on the same port? 
I can envision several possible answers. 

  A. legal 
  B. `it is an error'
  C. `an error is signalled' 
  D. implementation-defined, might be an error in some or all cases

The example I've been thinking of is a UTF-8 encoded file in which one reads 
the first octet of a character via READ-U8 and then attempts to do a READ-CHAR. 

If those were the options, I'd vote for D, which allows the implementation to 
provide additional ways of resynchronizing (e.g., by rewinding the file) that 
are outside the scope of WG1. B is also fine. C is implementable: one just 
needs a tri-state variable (neutral/char/u8) in each port, but I'd question the 
point of doing this. I'm not sure that A makes any sense. 

I don't much care which option (or some other one) is selected, but it's 
important to say what happens.

Writing doesn't suffer from this problem; I'm not sure whether symmetry is 
important. 
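The tri-state variable mentioned under option C could be sketched at user level as below. This is only an illustration, not a proposal for how an implementation should do it: the record type and the `tracked-` names are hypothetical, and the wrapped port is assumed to be one on which the implementation accepts the requested read operation.

```scheme
(import (scheme base))

;; A port paired with a mode: 'neutral until first read, then 'char or 'u8.
(define-record-type tracked-port
  (make-tracked-port port mode)
  tracked-port?
  (port tracked-port-port)
  (mode tracked-port-mode set-tracked-port-mode!))

;; Commit the port to WANTED on first use; signal an error on a later mismatch.
(define (claim! tp wanted)
  (let ((mode (tracked-port-mode tp)))
    (cond ((eq? mode 'neutral) (set-tracked-port-mode! tp wanted))
          ((not (eq? mode wanted))
           (error "port already used for the other kind of read" mode wanted)))))

(define (tracked-read-u8 tp)
  (claim! tp 'u8)
  (read-u8 (tracked-port-port tp)))

(define (tracked-read-char tp)
  (claim! tp 'char)
  (read-char (tracked-port-port tp)))

;; The UTF-8 case from above: the two octets 206 187 encode a single character.
(define tp (make-tracked-port (open-input-bytevector (bytevector 206 187)) 'neutral))
(tracked-read-u8 tp)   ; consumes the first octet and commits the port to 'u8
;; (tracked-read-char tp) would now signal an error instead of decoding
;; from the middle of a character.
```

The check happens before the underlying read, so the mismatch is reported without consuming any further input.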

8. [New] §6.7.4: LOAD/INCLUDE. Some implementations use LOAD's argument to name 
a file, others do some kind of path search, or do some other transformation on 
the name. Gambit, for example, uses a prefix of ~~/ to signify looking in the 
Gambit directory. I suggest replacing `_filename_ should be a string naming an 
existing file containing Scheme source code' with `An implementation-dependent 
operation is used to transform _filename_ into the name of an existing file 
containing Scheme source code'. Whether the parameter name should still be 
_filename_ is not for me to say. 

-- vincent