Lars Schneider Tue, 06 Mar 2018 14:39:35 -0800

> On 06 Mar 2018, at 21:50, Junio C Hamano <[email protected]> wrote:
> 
> [email protected] writes:
> 
>> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
>> +{
>> +    return (
>> +       !strcmp(enc, "UTF-16") &&
>> +       !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
>> +         has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
>> +    ) || (
>> +       !strcmp(enc, "UTF-32") &&
>> +       !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
>> +         has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
>> +    );
>> +}
> 
> These strcmp() calls seem inconsistent with the principle embodied
> by utf8.c::fallback_encoding(), i.e. "be lenient to what we accept",
> and make the interface uneven. I am wondering if we also want to
> complain when the user gave us "utf16" and there is no byte order
> mark in the contents, for example?


Well, if I use stricmp() then I don't need to call xstrdup_toupper()
and clean up after it, as discussed with Eric [1]. Is there a
case-insensitive starts_with() function?

[1] https://public-inbox.org/git/CAPig+cQE0pKs-AMvh4GndyCXBMnx=70jppdm6k4jjte-74f...@mail.gmail.com/
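
To make the question concrete, a case-insensitive prefix check could be
sketched roughly as below. The name istarts_with_sketch() is hypothetical
and chosen only for illustration; it is not taken from the patch, and it
only handles ASCII case folding.

```c
#include <ctype.h>

/*
 * Hypothetical helper, for illustration only: return 1 if `str`
 * begins with `prefix`, ignoring ASCII case, and 0 otherwise.
 */
static int istarts_with_sketch(const char *str, const char *prefix)
{
	for (; *prefix; str++, prefix++) {
		/*
		 * If `str` ends first, its NUL byte cannot match the
		 * non-NUL prefix byte, so the loop stops safely here.
		 */
		if (tolower((unsigned char)*str) !=
		    tolower((unsigned char)*prefix))
			return 0;
	}
	return 1;
}
```

With such a helper the encoding name would not need to be upper-cased
first, e.g. istarts_with_sketch(enc, "UTF-16") would also match "utf-16le".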


>  Also "UTF16" or other spelling
> the platform may support but this code fails to recognise will go
> unchecked.

That is true. However, I would assume all iconv implementations use the
same encoding names for UTF encodings, no? That means UTF16 would never be
valid. Would you agree?
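
If the check were to be made lenient anyway, one possible shape is
sketched below. This is only an illustration under assumptions: the
helper is_utf_width() and the "_lenient" function name are hypothetical,
the body of has_bom_prefix() is my guess at what the quoted patch relies
on, and whether a given iconv accepts a spelling such as "utf16" is not
something this code knows about.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Byte order marks as defined by Unicode. */
static const char utf16_be_bom[] = {'\xFE', '\xFF'};
static const char utf16_le_bom[] = {'\xFF', '\xFE'};
static const char utf32_be_bom[] = {'\0', '\0', '\xFE', '\xFF'};
static const char utf32_le_bom[] = {'\xFF', '\xFE', '\0', '\0'};

/* Assumed implementation: does `data` start with the given BOM? */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return data && bom && len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Hypothetical helper: match "UTF-<width>" or "UTF<width>" in any
 * ASCII case, e.g. "utf16", "Utf-16". Names with a byte-order suffix
 * such as "UTF-16LE" do not match, because they need no BOM.
 */
static int is_utf_width(const char *enc, const char *width)
{
	if (tolower((unsigned char)enc[0]) != 'u' ||
	    tolower((unsigned char)enc[1]) != 't' ||
	    tolower((unsigned char)enc[2]) != 'f')
		return 0;
	enc += 3;
	if (*enc == '-')
		enc++;
	return !strcmp(enc, width);
}

/* Lenient variant of the check from the quoted patch (sketch only). */
static int is_missing_required_utf_bom_lenient(const char *enc,
					       const char *data, size_t len)
{
	return (
	   is_utf_width(enc, "16") &&
	   !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
	     has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
	) || (
	   is_utf_width(enc, "32") &&
	   !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
	     has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
	);
}
```

The only difference from the strcmp() version is the name comparison;
the BOM tests themselves are unchanged.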

- Lars

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM