On 1/28/20 9:21 PM, Peter Eisentraut wrote:
You're right, this didn't make any sense. Here is a new patch set with
that fixed.
Thanks for this patch. This is a feature which has been on my personal
todo list for a while and something which I have wished to have a couple
of times.
I took a quick look at the patch and here is some feedback:
A possible concern is increased binary size from the new tables for the
quickcheck but personally I think they are worth it.
A potential optimization would be to merge utf8_to_unicode() and
pg_utf_mblen() into one function in unicode_normalize_func() since
utf8_to_unicode() already knows length of the character. Probably not
worth it though.
It feels a bit wasteful to measure output_size in
unicode_is_normalized() since unicode_normalize() actually already knows
the length of the buffer, it just does not return it.
A potential optimization for the normalized case would be to abort the
quick check on the first maybe and normalize from that point on only. If
I can find the time I might try this out and benchmark it.
Nitpick: "split/\s*;\s*/, $line" in generate-unicode_normprops_table.pl
should be "split /\s*;\s*/, $line".
What about using else if in the code below for clarity?
+ if (check == UNICODE_NORM_QC_NO)
+ return UNICODE_NORM_QC_NO;
+ if (check == UNICODE_NORM_QC_MAYBE)
+ result = UNICODE_NORM_QC_MAYBE;
Remove extra space in the line below.
+ else if (quickcheck == UNICODE_NORM_QC_NO )
Andreas