Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

Kyotaro HORIGUCHI Thu, 20 Oct 2011 18:38:23 -0700

Hello,

> > Robert Haas <robertmh...@gmail.com> writes:
> >> - Why does the second byte need special handling for 0xED and 0xF4?
> >
> > http://www.faqs.org/rfcs/rfc3629.html
> >
> > See section 4 in particular.  The underlying requirement is to disallow
> > multiple representations of the same Unicode code point.


 The special handling skips the UTF-8 byte sequences corresponding
to the regions U+D800 - U+DFFF (first byte 0xED with a second byte
of 0xA0 - 0xBF) and U+110000 - U+13FFFF (first byte 0xF4 with a
second byte of 0x90 - 0xBF) in UCS-4. The former is reserved for
use with the UTF-16 encoding form as surrogate pairs and does not
directly represent characters, as described in section 3 of
RFC 3629. The latter is the region that lies outside the UTF-8
range by the definition given in the same section.

former> The definition of UTF-8 prohibits encoding character
former> numbers between U+D800 and U+DFFF, which are reserved for
former> use with the UTF-16 encoding form (as surrogate pairs)
former> and do not directly represent characters.

latter> In UTF-8, characters from the U+0000..U+10FFFF range (the
latter> UTF-16 accessible range) are encoded using sequences of 1
latter> to 4 octets.
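
To make the two restrictions concrete, here is a minimal sketch in C.
It is not the PostgreSQL code: utf8_second_byte_range() and the two
raw decoders are hypothetical helpers written only for illustration,
and the ranges are taken from the syntax table in section 4 of
RFC 3629.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): given the first byte of a
 * multibyte UTF-8 sequence, report the range allowed for the second byte
 * according to the syntax in RFC 3629 section 4.  Returns false if
 * "first" cannot start a multibyte sequence.
 */
static bool
utf8_second_byte_range(unsigned char first,
                       unsigned char *lo, unsigned char *hi)
{
    if (first < 0xC2 || first > 0xF4)
        return false;   /* ASCII, continuation byte, or invalid lead byte */

    /* Ordinary continuation byte range */
    *lo = 0x80;
    *hi = 0xBF;

    switch (first)
    {
        case 0xE0: *lo = 0xA0; break;   /* excludes overlong 3-byte forms */
        case 0xED: *hi = 0x9F; break;   /* excludes U+D800 - U+DFFF */
        case 0xF0: *lo = 0x90; break;   /* excludes overlong 4-byte forms */
        case 0xF4: *hi = 0x8F; break;   /* excludes values above U+10FFFF */
        default:   break;
    }
    return true;
}

/* Extract the payload bits of a 3-byte sequence, with no validation. */
static uint32_t
utf8_decode3_raw(const unsigned char *s)
{
    return ((uint32_t) (s[0] & 0x0F) << 12) |
           ((uint32_t) (s[1] & 0x3F) << 6) |
            (uint32_t) (s[2] & 0x3F);
}

/* Extract the payload bits of a 4-byte sequence, with no validation. */
static uint32_t
utf8_decode4_raw(const unsigned char *s)
{
    return ((uint32_t) (s[0] & 0x07) << 18) |
           ((uint32_t) (s[1] & 0x3F) << 12) |
           ((uint32_t) (s[2] & 0x3F) << 6) |
            (uint32_t) (s[3] & 0x3F);
}
```

With these helpers, the bytes ED A0 80 carry the payload U+D800, the
first surrogate, and F4 90 80 80 carries U+110000, the first value
past the UTF-8 range. So when the first byte is 0xED or 0xF4, the
second byte has a narrower upper bound than the usual 0xBF.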

# However, at first I wrote this exception simply by mimicking
# the behavior of pg_utf8_verifier().


These RFC requirements must be the basis of the behavior of
pg_utf8_verifier(), and pg_utf8_increment() has inherited it. It
may be good to describe the origin of this special handling in the
comments of these functions to avoid this sort of confusion.


> I'm still confused.  The input string is already known to be valid
> UTF-8, so the second byte (if there is one) must be between 0x80 and
> 0xBF.  Therefore it will be neither 0xED nor 0xF4.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

