Re: Unicode normalization SQL functions

Peter Eisentraut Thu, 02 Apr 2020 00:51:39 -0700

On 2020-03-26 18:41, John Naylor wrote:

> We don't have a trie implementation in Postgres, but we do have a
> perfect hash implementation. Doing that would bring the tables back to
> 64 bits per entry, but would likely be noticeably faster than binary
> search. Since v4 has left out the biggest tables entirely, I think
> this might be worth a look for the smaller tables remaining.
>
> In the attached v5, when building the hash tables, we sort the code
> points by NO/MAYBE, and store the index of the beginning of the NO
> block:

This is a valuable idea, but I fear it's a bit late now in this cycle.
I have questions about some details. For example, you mention that you
had to fiddle with the hash seed. How does that affect other users of
PerfectHash? What happens when we update Unicode data and the hash
doesn't work anymore? These discussions might derail this patch at this
hour, so I have committed the previous patch. We can consider your
patch as a follow-up patch, either now or in the future.
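
For readers following along, the NO/MAYBE layout quoted above could look
roughly like the sketch below. It is not taken from the v5 patch: the
names (QuickCheck, qc_codepoints, qc_no_start, find_index, quick_check)
and the sample code points are placeholders for illustration, and a
plain linear scan stands in for the generated perfect hash function.
The point is only that, once the entries are sorted with the MAYBE block
first, the class of a code point follows from its index, so no per-entry
flag has to be stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quick-check result type, for illustration only. */
typedef enum { QC_YES, QC_MAYBE, QC_NO } QuickCheck;

/*
 * Placeholder table: entries are sorted so that all MAYBE code points
 * come first and all NO code points follow. The values are samples,
 * not generated from the Unicode data files.
 */
static const uint32_t qc_codepoints[] = {
    0x0300, 0x0301, 0x0302,     /* MAYBE block: indexes 0..2 */
    0x0340, 0x0341, 0x0343      /* NO block: indexes 3..5 */
};
static const size_t qc_count =
    sizeof(qc_codepoints) / sizeof(qc_codepoints[0]);

/* Index of the first entry of the NO block. */
static const size_t qc_no_start = 3;

/*
 * Stand-in for the perfect hash lookup: returns the table index of cp,
 * or -1 if cp is not in the table. A real implementation would compute
 * the index with the generated hash function and then verify the entry.
 */
static long
find_index(uint32_t cp)
{
    for (size_t i = 0; i < qc_count; i++)
    {
        if (qc_codepoints[i] == cp)
            return (long) i;
    }
    return -1;
}

/* Derive the quick-check class from the position in the table. */
static QuickCheck
quick_check(uint32_t cp)
{
    long idx = find_index(cp);

    assert(qc_no_start <= qc_count);

    if (idx < 0)
        return QC_YES;          /* not listed, so neither NO nor MAYBE */
    return ((size_t) idx >= qc_no_start) ? QC_NO : QC_MAYBE;
}
```

With this layout, any change to the table contents only requires
regenerating the array and the single qc_no_start value.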


> Also, if we go with v4, I noticed the following test is present twice:
>
> +SELECT "normalize"('abc', 'def');  -- run-time error

I think this is correct.  The other test is for "is_normalized".

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
