Package: perl
Version: 5.20.2-2

The Encode::Unicode documentation states the following:

  When BE or LE is omitted during decode(), it checks if BOM is at the
  beginning of the string; if one is found, the endianness is set to
  what the BOM says. If no BOM is found, the routine dies.

This behaviour conflicts with both the Unicode Standard and RFC 2781, which specify big-endian as the default when no BOM is present.

To reproduce:

---
use Encode qw/decode/;
decode("utf-16be", "Hello World");       # does not die
decode("utf-16le", "Hello World");       # does not die
decode("utf-16", "\xFE\xFFHello World"); # does not die
decode("utf-16", "Hello World");         # dies with "UTF-16:Unrecognised BOM"
---

Unicode Standard version 8.0:

  The UTF-16 encoding scheme may or may not begin with a BOM. However,
  when there is no BOM, and in the absence of a higher-level protocol,
  the byte order of the UTF-16 encoding scheme is big-endian.

RFC2781:

  If the first two octets of the text is not 0xFE followed by 0xFF, and
  is not 0xFF followed by 0xFE, then the text SHOULD be interpreted as
  being big-endian.

A simple fix is to do nothing in that case instead of croaking:

diff --git a/cpan/Encode/Unicode/Unicode.xs b/cpan/Encode/Unicode/Unicode.xs
index cf42ab8..7caf1c1 100644
--- a/cpan/Encode/Unicode/Unicode.xs
+++ b/cpan/Encode/Unicode/Unicode.xs
@@ -164,9 +164,18 @@ CODE:
                 endian = 'V';
             }
             else {
-                croak("%"SVf":Unrecognised BOM %"UVxf,
-                      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-                      bom);
+                /* No BOM found, use big-endian fallback as specified in
+                 * RFC2781 and the Unicode Standard version 8.0:
+                 *
+                 *  The UTF-16 encoding scheme may or may not begin with
+                 *  a BOM. However, when there is no BOM, and in the
+                 *  absence of a higher-level protocol, the byte order
+                 *  of the UTF-16 encoding scheme is big-endian.
+                 *
+                 *  If the first two octets of the text is not 0xFE
+                 *  followed by 0xFF, and is not 0xFF followed by 0xFE,
+                 *  then the text SHOULD be interpreted as big-endian.
+                 */
             }
         }
 #if 1

CPAN bug report: https://rt.cpan.org/Ticket/Display.html?id=107043
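
For anyone hitting this on an unpatched Encode, here is a rough caller-side workaround, offered as a sketch and not as part of the proposed fix. The helper name decode_utf16_lenient is hypothetical and is not part of Encode. It only inspects the first two octets and falls back to "UTF-16BE" when no BOM is found, following the fallback that RFC 2781 describes.

```perl
use strict;
use warnings;
use Encode qw/decode/;

# Hypothetical helper (not provided by Encode): decode UTF-16 octets,
# treating input without a BOM as big-endian.
sub decode_utf16_lenient {
    my ($octets) = @_;

    # BOM present (FE FF or FF FE): let Encode detect the byte order
    # and strip the BOM as usual.
    if ($octets =~ /\A(?:\xFE\xFF|\xFF\xFE)/) {
        return decode("UTF-16", $octets);
    }

    # No BOM: assume big-endian instead of dying.
    return decode("UTF-16BE", $octets);
}

# "Hi" encoded three ways: no BOM, big-endian BOM, little-endian BOM.
print decode_utf16_lenient("\x00H\x00i"), "\n";
print decode_utf16_lenient("\xFE\xFF\x00H\x00i"), "\n";
print decode_utf16_lenient("\xFF\xFEH\x00i\x00"), "\n";
```

This leaves the BOM handling to Encode and only changes what happens in the no-BOM case, so it should behave the same as a plain decode("UTF-16", ...) once the library itself falls back to big-endian.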