Re: [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints

Junio C Hamano Mon, 08 Jul 2013 12:36:43 -0700

Peter Krefting <pe...@softwolves.pp.se> writes:

> brian m. carlson:
>
>> +            /* U+FFFE and U+FFFF are guaranteed non-characters. */
>> +            if ((codepoint & 0x1ffffe) == 0xfffe)
>> +                    return bad_offset;
>
> I missed this the first time around: All Unicode characters whose
> lower 16-bits are FFFE or FFFF are non-characters, so you can re-write
> that to:
>
>   /* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
>   if ((codepoint & 0xfffe) == 0xfffe)
>           return bad_offset;
>
> Also, the range U+FDD0--U+FDEF are also non-characters, if you wish to
> be really pedantic.


Yeah, while we are at it, doing this may not hurt.  I think Brian's
two patches are in fairly good shape otherwise, so perhaps you can
do this as a follow-up patch on top of the tip of the topic,
e82bd6cc (commit: reject overlong UTF-8 sequences, 2013-07-04)?
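
For illustration only, here is a minimal sketch of how the two checks
discussed above could be combined. It is not taken from either patch:
the helper name is_noncharacter() is hypothetical, and the sketch
assumes the codepoint has already been decoded into an integer. The
narrower mask in the original hunk, 0x1ffffe, matches only U+FFFE and
U+FFFF, while 0xfffe ignores the plane bits and so covers the last two
codepoints of every plane.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the posted patches: returns 1 if
 * codepoint is a Unicode non-character, 0 otherwise.
 */
static int is_noncharacter(uint32_t codepoint)
{
	/* U+xxFFFE and U+xxFFFF: the last two codepoints of each plane. */
	if ((codepoint & 0xfffe) == 0xfffe)
		return 1;
	/* The contiguous range U+FDD0..U+FDEF. */
	if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
		return 1;
	return 0;
}
```

A caller in the validation loop would presumably return bad_offset
when this helper returns 1, as the quoted hunk does.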

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
