Abdulaziz Ghuloum scripsit:

> I can't find this in the spec.

This is a Unicode question rather than an R6RS question.

> For textual ports obtained using open-file-input-port,
> open-bytevector-input-port, and transcoded-port, is the first call
> to get-char/peek-char supposed to recognize a BOM if it exists in
> the beginning of the port buffer, or should a BOM, if one exists,
> be decoded as a regular character?

If the encoding is UTF-16, the process MUST recognize a BOM if one is present and use it to set the endianness of what follows; the BOM MUST NOT be returned to the caller. If no BOM is present, the process SHOULD use a local convention if there is one (in practice this mostly means that Windows UTF-16 files are typically little-endian), and if there is none, SHOULD assume big-endian. The same is true of the UTF-32 encoding, if you choose to support it.

In the encodings UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE, there are no BOMs: the endianness is specified by the encoding name, and any U+FEFF characters MUST be returned as such.

In the UTF-8 encoding, things are less clear-cut. A process SHOULD discard any BOM that is present. There are no endianness considerations for UTF-8, so the BOM serves primarily as a signature.

(See RFC 2119 for the meanings of MUST, SHOULD, and MAY.)

> A related question is about the endianness of the data read when
> using the (utf-16-codec) in a transcoder that's passed to any of the
> procedures listed above. Should the BOM, if one exists, be used to
> determine the endianness of the data in the port?

Yes.

--
John Cowan  http://www.ccil.org/~cowan
Historians aren't constantly confronted with people who carry on self-confidently about the rule against adultery in the sixth amendment to the Declamation of Independence, as written by Benjamin Hamilton. Computer scientists aren't always having to correct people who make bold assertions about the value of Objectivist Programming, as examplified in the HCNL entities stored in Relaxational Databases.
--Mark Liberman

_______________________________________________
r6rs-discuss mailing list
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
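The BOM rules John Cowan gives for UTF-16, UTF-32, their explicit-endian variants, and UTF-8 can be illustrated with a minimal Python sketch. It works on a complete byte string rather than an R6RS port, and the names `decode_text` and `little_endian_default` are hypothetical, chosen only for this illustration. They are not R6RS procedures, and this is not how any particular Scheme implementation is known to behave. The sketch dispatches on the encoding name explicitly instead of relying on the defaults of Python's built-in `utf-16` codec.

```python
import codecs

# For the "bare" encoding names, the BOMs that may appear and the
# explicit-endian codec used to decode whatever follows each BOM.
_BOMS = {
    "utf-16": [(codecs.BOM_UTF16_BE, "utf-16-be"), (codecs.BOM_UTF16_LE, "utf-16-le")],
    "utf-32": [(codecs.BOM_UTF32_BE, "utf-32-be"), (codecs.BOM_UTF32_LE, "utf-32-le")],
}


def decode_text(data: bytes, encoding: str, little_endian_default: bool = False) -> str:
    """Decode `data` following the BOM rules in the message (a sketch).

    `little_endian_default` is a hypothetical stand-in for a "local
    convention", e.g. little-endian UTF-16 files on Windows.
    """
    enc = encoding.lower()
    if enc in _BOMS:
        for bom, codec in _BOMS[enc]:
            if data.startswith(bom):
                # MUST: use the BOM to set endianness and not return it.
                return data[len(bom):].decode(codec)
        # No BOM: use the local convention if any, otherwise big-endian.
        suffix = "-le" if little_endian_default else "-be"
        return data.decode(enc + suffix)
    if enc == "utf-8":
        # SHOULD: discard a leading BOM; in UTF-8 it is only a signature.
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        return data.decode("utf-8")
    # Explicit-endian names (utf-16-be, utf-32-le, ...): no BOM handling,
    # so a leading U+FEFF is returned as an ordinary character.
    return data.decode(enc)
```

For example, `decode_text(b"\xff\xfeA\x00", "utf-16")` yields `"A"`, while `decode_text(b"\xfe\xff\x00A", "utf-16-be")` yields `"\ufeffA"` because the explicit-endian encoding keeps U+FEFF. For the UTF-8 case alone, Python's standard `utf-8-sig` codec already skips a leading BOM when decoding.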
