Re: regex match and special characters

Oleksii Kliukin Sat, 18 Aug 2018 07:26:13 -0700

> On 16. Aug 2018, at 16:57, Tom Lane <[email protected]> wrote:
> 
> Alex Kliukin <[email protected]> writes:
>> Here is a simple SQL statement that gives different results on PostgreSQL 
>> 9.6 and PostgreSQL 10+. The space character at the end of the string is 
>> actually U+2006 SIX-PER-EM SPACE 
>> (http://www.fileformat.info/info/unicode/char/2006/index.htm)
> 
> I think the reason for the discrepancy is that in v10 we fixed the regex
> locale support so that it could properly classify code points above U+7FF,
> cf
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c54159d44ceaba26ceda9fea1804f0de122a8f30


This nails down the cause, thanks a lot for the link! Apparently I missed it 
in the PostgreSQL 10 release notes, where it is listed in the "Queries" section. 
I think it also deserved an entry under "Migration to Version 10", as it can 
make a dump/restore from a previous version to version 10 error out if there 
are table constraints that use regex character classes on Unicode text fields 
containing code points above U+7FF.
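
One rough way to look for such values before an upgrade is to scan exported 
text for whitespace characters above U+7FF. The Python sketch below is only an 
illustration: the helper name high_whitespace is hypothetical, and it assumes 
that Python's Unicode-aware \s is a reasonable stand-in for what PostgreSQL 10 
reports under a UTF-8 database with a non-C locale, which may not hold for 
every locale.

```python
import re

# Threshold named in the thread: before v10 the regex engine did not
# classify code points above U+7FF for locale-dependent classes.
LIMIT = 0x7FF


def high_whitespace(text):
    """Return (index, 'U+XXXX') for each whitespace character above U+7FF.

    Assumption: Python's \\s on str patterns approximates the
    classification PostgreSQL 10+ would apply; this is not guaranteed.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > LIMIT and re.fullmatch(r"\s", ch)
    ]


# The string from the report ends in U+2006 SIX-PER-EM SPACE.
sample = "abc\u2006"
print(high_whitespace(sample))  # [(3, 'U+2006')]
```

Any value flagged this way is one where a constraint written with \s (or a 
similar class) could evaluate differently on 9.6 and on 10.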

> 
> So 10 is giving the right answer (i.e. that \s matches U+2006).
> 9.x is not

Agreed.

Cheers,
Alex
