Re: Unicode normalization SQL functions

Andreas Karlsson Wed, 12 Feb 2020 16:24:12 -0800

On 1/28/20 9:21 PM, Peter Eisentraut wrote:

You're right, this didn't make any sense. Here is a new patch set with that fixed.

Thanks for this patch. This is a feature which has been on my personal todo list for a while and something which I have wished to have a couple of times.


I took a quick look at the patch and here is some feedback:

A possible concern is increased binary size from the new tables for the quickcheck, but personally I think they are worth it.

A potential optimization would be to merge utf8_to_unicode() and pg_utf_mblen() into one function in unicode_normalize_func(), since utf8_to_unicode() already knows the length of the character. Probably not worth it though.

It feels a bit wasteful to measure output_size in unicode_is_normalized(), since unicode_normalize() already knows the length of the buffer; it just does not return it.

A potential optimization for the normalized case would be to abort the quick check on the first maybe and normalize from that point on only. If I can find the time I might try this out and benchmark it.
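Roughly what I have in mind, as an untested sketch: find how many leading code points pass the quick check with "yes", so that only the remainder needs the full normalization. Everything here is hypothetical: toy_nfc_quickcheck() is a two-entry stand-in for the generated table, and quickcheck_yes_prefix() is not a function in the patch. A real version would probably also have to back up to the preceding starter and check canonical combining class order, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    TOY_QC_YES,
    TOY_QC_NO,
    TOY_QC_MAYBE
} toy_qc;

/*
 * Toy stand-in for the generated quick-check table (NFC only).
 * U+0340 has NFC_Quick_Check=No, U+0301 has NFC_Quick_Check=Maybe;
 * every other code point is treated as Yes for this illustration.
 */
toy_qc
toy_nfc_quickcheck(uint32_t cp)
{
    if (cp == 0x0340)
        return TOY_QC_NO;
    if (cp == 0x0301)
        return TOY_QC_MAYBE;
    return TOY_QC_YES;
}

/*
 * Hypothetical helper: return the number of leading code points whose
 * quick-check result is "yes". A return value of n means the whole input
 * passed; anything smaller marks where the slow path would have to start.
 */
size_t
quickcheck_yes_prefix(const uint32_t *cps, size_t n)
{
    size_t i;

    assert(cps != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        if (toy_nfc_quickcheck(cps[i]) != TOY_QC_YES)
            break;
    }
    return i;
}
```

Whether this wins anything in practice depends on how often a "maybe" appears early in the string, which is why it would need benchmarking.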

Nitpick: "split/\s*;\s*/, $line" in generate-unicode_normprops_table.pl should be "split /\s*;\s*/, $line".


What about using else if in the code below for clarity?

+               if (check == UNICODE_NORM_QC_NO)
+                       return UNICODE_NORM_QC_NO;
+               if (check == UNICODE_NORM_QC_MAYBE)
+                       result = UNICODE_NORM_QC_MAYBE;
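
For illustration, the else-if form in a self-contained function. The norm_qc enum and combine_quickcheck() are local stand-ins that I made up for this example; only the UNICODE_NORM_QC_* names come from the patch, and the enum values here are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the patch's quick-check result type; values assumed. */
typedef enum
{
    UNICODE_NORM_QC_NO,
    UNICODE_NORM_QC_YES,
    UNICODE_NORM_QC_MAYBE
} norm_qc;

/*
 * Hypothetical helper: fold per-character quick-check results into one
 * overall result. "No" wins immediately; "maybe" is remembered.
 */
norm_qc
combine_quickcheck(const norm_qc *checks, size_t n)
{
    norm_qc result = UNICODE_NORM_QC_YES;
    size_t  i;

    assert(checks != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        norm_qc check = checks[i];

        if (check == UNICODE_NORM_QC_NO)
            return UNICODE_NORM_QC_NO;
        else if (check == UNICODE_NORM_QC_MAYBE)
            result = UNICODE_NORM_QC_MAYBE;
    }
    return result;
}
```

The behavior is the same as with two separate ifs, since the first branch returns; the else only makes it obvious that the two conditions are mutually exclusive.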

Remove extra space in the line below.

+       else if (quickcheck == UNICODE_NORM_QC_NO )

Andreas
