Lars Schneider <[email protected]> writes:
>> On 05 Mar 2018, at 22:50, Junio C Hamano <[email protected]> wrote:
>>
>> [email protected] writes:
>>
>>> +static int validate_encoding(const char *path, const char *enc,
>>> + const char *data, size_t len, int die_on_error)
>>> +{
>>> + if (!memcmp("UTF-", enc, 4)) {
>>
>> Does the caller already know that enc is sufficiently long that
>> using memcmp is safe?
>
> No :-(
>
> Would you be willing to squash that in?
>
> if (strlen(enc) > 4 && !memcmp("UTF-", enc, 4)) {
>
> I deliberately used "> 4" as plain "UTF-" is not even valid.

I'd rather not. The code does not have to look at the 6th and later
bytes of enc[] even if it wanted to reject "UTF-" followed by nothing,
but the use of strlen() forces it to look at everything.

Stepping back, shouldn't

	if (starts_with(enc, "UTF-"))

be sufficient? If you really care about the case where "UTF-" alone
comes here, you could write

	if (starts_with(enc, "UTF-") && enc[4])

but I do not think "&& enc[4]" is even needed. The functions called
from this block would not consider "UTF-" alone as something valid
anyway, so with that "&& enc[4]" we would be piling on more code only
for an invalid, rare case.
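
A minimal sketch of why the prefix test is safe where the bare memcmp() was not. It assumes git's starts_with() compares byte by byte and stops at the first mismatch. prefix_match(), is_utf_encoding() and is_utf_encoding_strict() are hypothetical names used only for this illustration, not functions in git.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for git's starts_with(): walk the prefix and
 * the string together and stop at the first mismatch. A string shorter
 * than the prefix fails when its NUL is compared against a prefix byte,
 * so nothing past the end of `str` is ever read.
 */
static int prefix_match(const char *str, const char *prefix)
{
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;	/* consumed the whole prefix */
		if (*str != *prefix)
			return 0;	/* mismatch, or str ended early */
	}
}

/* The check suggested above: prefix only. */
static int is_utf_encoding(const char *enc)
{
	return prefix_match(enc, "UTF-");
}

/*
 * The optional stricter variant. Reading enc[4] is safe here because a
 * successful prefix match proves enc[0..3] exist, so enc[4] is at worst
 * the terminating NUL.
 */
static int is_utf_encoding_strict(const char *enc)
{
	return prefix_match(enc, "UTF-") && enc[4];
}

/* Usage examples, written as assertions. */
static void utf_examples(void)
{
	assert(is_utf_encoding("UTF-8"));
	assert(is_utf_encoding("UTF-16LE"));
	assert(!is_utf_encoding("UTF"));	/* shorter than prefix: no over-read */
	assert(!is_utf_encoding(""));
	assert(!is_utf_encoding("SHIFT-JIS"));
	assert(is_utf_encoding("UTF-"));	/* passes the prefix test alone */
	assert(!is_utf_encoding_strict("UTF-"));
}
```

Unlike strlen(enc) > 4, neither variant scans the whole string: the prefix test touches at most the first five bytes of enc[].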