Re: [PATCH v9 6/8] convert: check for detectable errors in UTF encodings

Junio C Hamano Mon, 05 Mar 2018 17:24:07 -0800

Lars Schneider <[email protected]> writes:

>> On 05 Mar 2018, at 22:50, Junio C Hamano <[email protected]> wrote:
>> 
>> [email protected] writes:
>> 
>>> +static int validate_encoding(const char *path, const char *enc,
>>> +                 const char *data, size_t len, int die_on_error)
>>> +{
>>> +   if (!memcmp("UTF-", enc, 4)) {
>> 
>> Does the caller already know that enc is sufficiently long that
>> using memcmp is safe?
>
> No :-(
>
> Would you be willing to squash that in?
>
>     if (strlen(enc) > 4 && !memcmp("UTF-", enc, 4)) {
>
> I deliberately used "> 4" as plain "UTF-" is not even valid.


I'd rather not.  The code does not have to even look at 6th and
later bytes in the enc[] even if it wanted to reject "UTF-" followed
by nothing, but use of strlen() forces it to look at everything.

Stepping back, shouldn't

        if (starts_with(enc, "UTF-"))

be sufficient?  If you really care about the case where "UTF-" alone
comes here, you could write

        if (starts_with(enc, "UTF-") && enc[4])

but I do not think "&& enc[4]" is even needed.  The functions called
from this block would not consider "UTF-" alone as something valid
anyway, so with that "&& enc[4]" we would be piling more code only
for an invalid/rare case.
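
To illustrate why a prefix check is safe where a bare memcmp() of 4
bytes is not, here is a minimal sketch.  The names sketch_starts_with()
and is_utf_name() are hypothetical and the body is not copied from
git's own starts_with(); it only assumes the same calling convention
(string first, prefix second).  The loop stops at the first mismatch,
so it never reads past the terminating NUL of a short enc, and it
never looks beyond the length of the prefix.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a starts_with()-style helper.
 * Compares byte by byte and returns at the first mismatch.  If str
 * is shorter than prefix, its NUL mismatches a prefix byte and we
 * return 0 without reading any further.
 */
int sketch_starts_with(const char *str, const char *prefix)
{
	for (; *prefix; str++, prefix++)
		if (*str != *prefix)
			return 0;
	return 1;
}

/*
 * Hypothetical wrapper showing the optional "&& enc[4]" variant:
 * accepts "UTF-" followed by at least one more byte.  Reading
 * enc[4] is safe here because the prefix check has already proven
 * that enc[0..3] are non-NUL.
 */
int is_utf_name(const char *enc)
{
	return sketch_starts_with(enc, "UTF-") && enc[4] != '\0';
}
```

With this sketch, "UT" and "" are rejected without touching memory
past their NUL, "UTF-" alone passes the plain prefix check but fails
is_utf_name(), and "UTF-16" passes both.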
