Hi DoGeon,

Thanks for picking this up, and nice first patch.  I reviewed v1 (both
patches) and the change is correct.  Below are one independent
confirmation, a sourcing note, and two optional test additions.

> the accept/reject outcome for any input is unchanged;

I checked this exhaustively rather than by spot check.  I scanned the
full two-byte space against PostgreSQL's own uhc_to_utf8.map (17,237
mapped sequences): the tightened accept set (lead 0x81-0xFE; trail
0x41-0x5A, 0x61-0x7A, 0x81-0xFE) contains every mapped sequence, so
zero real mappings fall in the newly-rejected ranges.  Nothing that
decodes today stops decoding; only the eight structurally invalid
pairs move to the correct error.
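
For anyone who wants to repeat the check, here is a minimal sketch of
that kind of scan.  It assumes the mapped two-byte codes have already
been extracted from the map file into a list of integers (that
extraction step is not shown), and the function names are illustrative
only, not anything from the patch:

```python
def is_valid_uhc_pair(lead: int, trail: int) -> bool:
    """Structural check using the lead/trail ranges quoted above."""
    if not 0x81 <= lead <= 0xFE:
        return False
    return (0x41 <= trail <= 0x5A
            or 0x61 <= trail <= 0x7A
            or 0x81 <= trail <= 0xFE)


def newly_rejected(mapped_codes):
    """Return the mapped two-byte codes that the tightened rule rejects.

    mapped_codes is a hypothetical iterable of integers such as 0xFEA1,
    one per mapped sequence; an empty result means no mapping is lost.
    """
    return [code for code in mapped_codes
            if not is_valid_uhc_pair(code >> 8, code & 0xFF)]


# Size of the structural accept set over the full two-byte space:
# 126 lead bytes times 178 trail bytes.
accepted = sum(is_valid_uhc_pair(lead, trail)
               for lead in range(256)
               for trail in range(256))
print(accepted)
```

Feeding the extracted codes to newly_rejected() and getting an empty
list back is the property claimed above.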

> - Microsoft CP949 (Windows-949) specifies the two-byte form as
>   lead 0x81-0xFE, trail 0x41-0x5A | 0x61-0x7A | 0x81-0xFE.

Right -- and even the WHATWG Encoding Standard's euc-kr (= CP949) decoder
takes a wider trail, the contiguous range 0x41-0xFE.  Side by side:

    Rule                Lead        Trail
    ------------------  ---------   -------------------------------
    Old verifier        0x80-0xFF   any byte but 0x00
    WHATWG (CP949)      0x81-0xFE   0x41-0xFE
    CP949 / this patch  0x81-0xFE   0x41-0x5A, 0x61-0x7A, 0x81-0xFE

Your rule matches the actual CP949 assignment and is even tighter than
the WHATWG structural envelope, rejecting the gaps at verify time.

Two optional test cases would close the last coverage gaps in uhc.sql
(neither blocks commit):

    -- accept: upper lead boundary 0xFE.  Today 0xFE appears only as a
    -- trail byte, so the `c1 > 0xfe` bound is never exercised.
    SELECT encode(convert_to(convert_from('\xfea1', 'UHC'), 'UTF8'), 'hex');
    -- -> ee819e

    -- reject: trail 0x00, the sole trail the old verifier also rejected.
    SELECT convert_from('\x8100', 'UHC');
    -- -> ERROR:  invalid byte sequence for encoding "UHC": 0x81 0x00

Both pass with the patch applied.  With those folded in, this looks ready
to me.

Thanks again,
Henson
