On Mon, May 29, 2017 at 10:47 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
>> [Quoting Michael]
>>> Actually, with the recent work on unicode_norm_table.h, which
>>> transposes UnicodeData.txt into user-friendly tables, shouldn't the
>>> python script in unaccent/ be replaced by something that works on
>>> this table? That code does a canonical decomposition and keeps only
>>> the first character of the result, the one whose combining class is
>>> 0. So we have basic APIs able to look at UnicodeData.txt and let the
>>> caller do the decision-making with the result returned.
>>
>> Thanks, I will learn about it.
>
> It seems like that could be useful for runtime use (I'm sure there is
> a whole world of Unicode support we could add), but here we're only
> trying to generate a mapping file to add to the source tree, so I'm
> not sure how it's relevant.

Yes, that's what I am getting at, but that would be really
dictionary-specific, and it would roughly amount to providing a
fast-path equivalent of the tsearch_readline* routines that work on
files. The addition of new infrastructure may not be worth the
performance gains. For this fix there is definitely no need to do
anything more complicated than tweaking the script to generate the
rules.
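
For illustration only, here is a minimal sketch of that kind of rule
generation, using Python's unicodedata module instead of parsing
UnicodeData.txt directly; the function names and the code point range
are placeholders, not the actual contrib/unaccent generator:

# Minimal sketch: build unaccent-style rules by taking the canonical
# decomposition (NFD) of each character and keeping only the pieces
# whose canonical combining class is 0, i.e. dropping combining marks.
import unicodedata

def base_chars(ch):
    """Canonically decompose ch, keep only combining-class-0 characters."""
    decomposed = unicodedata.normalize('NFD', ch)
    return ''.join(c for c in decomposed if unicodedata.combining(c) == 0)

def print_rules(start=0x00C0, stop=0x024F):
    """Print "source<TAB>replacement" pairs, roughly the unaccent.rules
    format, for an example range (Latin-1 Supplement to Latin Extended-B)."""
    for cp in range(start, stop + 1):
        ch = chr(cp)
        base = base_chars(ch)
        if base and base != ch:
            print(ch + "\t" + base)

if __name__ == '__main__':
    print_rules()

Run under Python 3 this prints, for example, a line mapping U+00C0 (À)
to A; characters whose decomposition yields nothing different are
simply skipped.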
-- 
Michael

