Re: [r6rs-discuss] BOM at start of ports

William D Clinger Sat, 08 Dec 2007 12:30:52 -0800

The specifications of utf16->string and utf32->string
are worse than misguided: they are actually incorrect.


The specification of utf16->string says:

    Bytevector is decoded according to UTF-16BE or UTF-16LE:

By the definitions of UTF-16BE and UTF-16LE, that means
that any byte-order mark (BOM) at the beginning of the
bytevector will be decoded as an ordinary character
(U+FEFF ZERO WIDTH NO-BREAK SPACE), not as a BOM.  Yet the
specification goes on to say:

    If endianness-mandatory? is absent or #f, utf16->string
    determines the endianness according to a UTF-16 BOM at
    the beginning of bytevector if a BOM is present; in this
    case, the BOM is not decoded as a character.

That flatly contradicts the previous assertion that the
bytevector is decoded according to UTF-16BE or UTF-16LE.

The specification of utf32->string has exactly the same
problem.
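The difference is easy to observe with Python's standard codecs, which follow the Unicode definitions of these schemes. This is only an illustration in Python, not R6RS Scheme. A leading BOM fed to the "utf-16-be" or "utf-32-be" codec comes back as the character U+FEFF, while the BOM-aware "utf-16" and "utf-32" codecs consume it:

```python
# Bytevectors that begin with a big-endian BOM followed by "A".
utf16_bytes = b"\xfe\xff\x00\x41"
utf32_bytes = b"\x00\x00\xfe\xff\x00\x00\x00\x41"

# UTF-16BE / UTF-32BE: the BOM is decoded as the ordinary character U+FEFF.
utf16be_text = utf16_bytes.decode("utf-16-be")
utf32be_text = utf32_bytes.decode("utf-32-be")

# UTF-16 / UTF-32: a leading BOM selects the byte order and is dropped.
utf16_text = utf16_bytes.decode("utf-16")
utf32_text = utf32_bytes.decode("utf-32")

print(repr(utf16be_text), repr(utf32be_text))  # '\ufeffA' '\ufeffA'
print(repr(utf16_text), repr(utf32_text))      # 'A' 'A'
```

A decoder cannot both follow UTF-16BE (or UTF-32BE) and strip the BOM, which is the contradiction in the quoted text.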

I believe the intended specification was along these lines:

    Bytevector is decoded according to UTF-16, UTF-16BE,
    UTF-16LE, or a fourth encoding scheme that differs from
    all three of those, depending upon the optional arguments
    endianness and endianness-mandatory.  If endianness
    is the symbol big and endianness-mandatory is absent
    or #f, then bytevector is decoded according to
    UTF-16.  If endianness is the symbol big and
    endianness-mandatory is #t, then bytevector is decoded
    according to UTF-16BE.  If endianness is the symbol
    little and endianness-mandatory is #t, then bytevector
    is decoded according to UTF-16LE.  If endianness is
    the symbol little and endianness-mandatory is absent
    or #f, then the bytevector is decoded according to
    UTF-16 if it begins with a BOM but is decoded according
    to UTF-16LE if it does not begin with a BOM; note that
    this fourth decoding does not correspond to any of the
    seven Unicode encoding schemes that are defined by the
    Unicode standard.
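A minimal Python sketch of the four decoding rules proposed above may make the case analysis easier to check. It assumes endianness is passed as the string "big" or "little" in place of a Scheme symbol. The function name utf16_to_string and its parameters are hypothetical stand-ins for utf16->string, not any library's API. The sketch handles the BOM explicitly rather than relying on Python's "utf-16" codec, so that the no-BOM fallback is visible:

```python
def utf16_to_string(bv: bytes, endianness: str,
                    endianness_mandatory: bool = False) -> str:
    """Hypothetical model of the proposed utf16->string semantics."""
    if endianness == "big":
        fixed_codec = "utf-16-be"
    elif endianness == "little":
        fixed_codec = "utf-16-le"
    else:
        raise ValueError("endianness must be 'big' or 'little'")

    if endianness_mandatory:
        # UTF-16BE or UTF-16LE: a leading BOM is decoded as U+FEFF.
        return bv.decode(fixed_codec)

    # Not mandatory: a leading BOM selects the byte order and is dropped.
    if bv[:2] == b"\xfe\xff":
        return bv[2:].decode("utf-16-be")
    if bv[:2] == b"\xff\xfe":
        return bv[2:].decode("utf-16-le")

    # No BOM: "big" gives plain UTF-16 (big-endian by default);
    # "little" gives the fourth scheme (falls back to UTF-16LE).
    return bv.decode(fixed_codec)


print(repr(utf16_to_string(b"\xfe\xff\x00A", "big", True)))  # '\ufeffA'
print(repr(utf16_to_string(b"\xfe\xff\x00A", "little")))     # 'A'
print(repr(utf16_to_string(b"A\x00", "little")))             # 'A'
```

The big, non-mandatory branch is exactly the Unicode UTF-16 scheme, which reads a BOM-less byte sequence as big-endian. Only the little, non-mandatory branch produces the fourth scheme described above.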

Will
