The specifications of utf16->string and utf32->string
are worse than misguided: they are actually incorrect.

The specification of utf16->string says:

    Bytevector is decoded according to UTF-16BE or UTF-16LE:

By the definitions of UTF-16BE and UTF-16LE, that means
that any byte-order mark (BOM) at the beginning of the
bytevector will be decoded as an ordinary character, not
as a BOM. Yet the specification goes on to say:

    If endianness-mandatory? is absent or #f, utf16->string
    determines the endianness according to a UTF-16 BOM at
    the beginning of bytevector if a BOM is present; in this
    case, the BOM is not decoded as a character.

That flatly contradicts the previous assertion that the
bytevector is decoded according to UTF-16BE or UTF-16LE.

The specification of utf32->string has exactly the same
problem.
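
To make the contradiction concrete, here is a small sketch. It
is in Python rather than Scheme, purely because Python's
standard codecs already distinguish the UTF-16BE encoding
scheme from the UTF-16 encoding scheme; the variable names
are mine and nothing here comes from the R6RS documents.

```python
# Bytes: FE FF (a big-endian BOM) followed by 00 41 (the letter "A").
data = b"\xfe\xff\x00A"

# Under UTF-16BE, the leading FE FF is an ordinary character, U+FEFF.
as_utf16be = data.decode("utf-16-be")

# Under UTF-16, the leading FE FF is a BOM and is not part of the text.
as_utf16 = data.decode("utf-16")

print([hex(ord(c)) for c in as_utf16be])
print([hex(ord(c)) for c in as_utf16])
```

The first decoding yields two characters and the second yields
one, so a procedure that drops the BOM is not decoding
according to UTF-16BE.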

I believe the intended specification was along these lines:

    Bytevector is decoded according to UTF-16, UTF-16BE,
    UTF-16LE, or a fourth encoding scheme that differs from
    all three of those, depending upon the optional arguments
    endianness and endianness-mandatory?.  If endianness
    is the symbol big and endianness-mandatory? is absent
    or #f, then bytevector is decoded according to
    UTF-16.  If endianness is the symbol big and
    endianness-mandatory? is #t, then bytevector is decoded
    according to UTF-16BE.  If endianness is the symbol
    little and endianness-mandatory? is #t, then bytevector
    is decoded according to UTF-16LE.  If endianness is
    the symbol little and endianness-mandatory? is absent
    or #f, then bytevector is decoded according to
    UTF-16 if it begins with a BOM but is decoded according
    to UTF-16LE if it does not begin with a BOM; note that
    this fourth decoding does not correspond to any of the
    seven Unicode encoding schemes that are defined by the
    Unicode standard.
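
The four cases can be sketched as follows. This is again
Python, not Scheme, and it is only an illustration of the
wording above under my own assumptions: the function name
utf16_to_string and its parameter names are hypothetical,
endianness is taken as a required argument because the
wording does not say what happens when it is absent, and
the BOM is checked by hand because Python's "utf-16" codec
falls back to the platform's byte order, not to big-endian,
when no BOM is present.

```python
BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian

def utf16_to_string(data, endianness, endianness_mandatory=False):
    """Hypothetical model of the four decodings described above."""
    codecs = {"big": "utf-16-be", "little": "utf-16-le"}
    if endianness not in codecs:
        raise ValueError("endianness must be 'big' or 'little'")

    if not endianness_mandatory:
        # A leading BOM selects the byte order and is not decoded
        # as a character.
        if data.startswith(BOM_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(BOM_LE):
            return data[2:].decode("utf-16-le")

    # Either endianness is mandatory (UTF-16BE or UTF-16LE, where a
    # leading BOM is an ordinary character), or no BOM was found and
    # the endianness argument supplies the byte order.
    return data.decode(codecs[endianness])
```

With big and no BOM this falls back to big-endian, which is
what UTF-16 requires; with little and no BOM it falls back to
little-endian, which is the fourth case that matches no
Unicode encoding scheme.
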
Will