On 29 Sep 2009, at 5:39 pm, John Cowan wrote:

> Alaric Snell-Pym scripsit:
>
>>> Character encoding/decoding needs to be done in big buffers for the
>>> same reason that actual I/O does.
>>
>> Why? Just because of the procedure call overhead of read-char?
>
> Well, yes -- in a compiler. But in an interpreter, you are crossing an
> expensive abstraction barrier between interpreted code and compiled
> code every time you convert a character, which (in file-oriented
> programs) is probably your inner loop. Much better to have that inner
> loop in C, possibly inside libiconv, which knows more about the
> subject than the average Scheme implementer.
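(For concreteness, the kind of bulk loop John is describing looks
roughly like this in C. A sketch only: the buffer sizes, the choice of
UTF-32LE as the decoded form, and the skipped EINVAL handling are all
mine, not anything from libiconv's actual internals.)

    #include <iconv.h>
    #include <stdio.h>

    #define INBUF_SIZE 8192
    /* UTF-8 decodes to at most one codepoint per input byte, and
     * UTF-32 spends 4 bytes per codepoint, so 4x is always enough. */
    #define OUTBUF_SIZE (INBUF_SIZE * 4)

    int main(void)
    {
        iconv_t cd = iconv_open("UTF-32LE", "UTF-8");
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

        char in[INBUF_SIZE], out[OUTBUF_SIZE];
        size_t n;
        while ((n = fread(in, 1, sizeof in, stdin)) > 0) {
            char *ip = in, *op = out;
            size_t ileft = n, oleft = sizeof out;
            /* One call converts up to 8 KB of text: the
             * per-character loop runs inside libiconv, not here. */
            if (iconv(cd, &ip, &ileft, &op, &oleft) == (size_t)-1) {
                /* Real code must handle EINVAL -- a multibyte
                 * sequence split across buffer boundaries -- by
                 * carrying the tail bytes into the next read. */
                perror("iconv");
                break;
            }
            fwrite(out, 1, (size_t)(op - out), stdout);
        }
        iconv_close(cd);
        return 0;
    }

Crossing the barrier once per 8 KB instead of once per character is
the whole win.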
Ok; but then the issue is more one of providing procedures to read
whole strings, rather than char-at-a-time; reading ahead a string to
feed to the user a char at a time is, indeed, a problem if you don't
know when you're going to want to stop reading strings and go back to
reading bytes. So don't do it ;-) Give the user
(read-string-of-fixed-length <bytes> [<port>] [<encoding>]) or
(read-string-up-to-delimiter <codepoint>|(<codepoint>...) [<port>]
[<encoding>]), or some such.

> Similarly, why not just read one byte from the filesystem whenever we
> want binary data? The kernel does plenty of hidden buffering already;
> why bother with stdio and user buffers? Because it is expensive to
> cross the abstraction barrier between userland and the kernel, and
> your performance goes to hell.

Yep, but such readahead can be hidden from the user. You call
getchar() on a FILE *, and without you needing to pay any attention,
you only do a read() syscall every few KB, and everyone's a winner.
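That sort of hidden readahead is only a few lines. A toy version of
what stdio is doing for you (the names byte_reader and reader_getbyte
are invented here purely for illustration, and error handling is
elided):

    #include <unistd.h>

    #define BUF_SIZE 8192

    struct byte_reader {
        int fd;
        unsigned char buf[BUF_SIZE];
        size_t pos, len;   /* start with pos == len == 0 so the
                              first call triggers a refill */
    };

    /* One byte per call for the user; one read() per 8 KB for the
     * kernel. Returns 0-255, or -1 on EOF (or error, which a real
     * version would distinguish). */
    int reader_getbyte(struct byte_reader *r)
    {
        if (r->pos == r->len) {
            ssize_t n = read(r->fd, r->buf, sizeof r->buf);
            if (n <= 0)
                return -1;
            r->len = (size_t)n;
            r->pos = 0;
        }
        return r->buf[r->pos++];
    }

stdio layers ungetc(), line buffering and the rest on top, but the
readahead itself is no more than that.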
ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/archives/author/alaric/