On Fri, Apr 7, 2017 at 2:47 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote: > On 04/06/2017 08:42 PM, Heikki Linnakangas wrote: >>> There is for example this portion in the new tables: >>> +static const Codepoint prohibited_output_chars[] = >>> +{ >>> + 0xD800, 0xF8FF, /* C.3, C.5 */ >>> >>> ----- Start Table C.5 ----- >>> D800-DFFF; [SURROGATE CODES] >>> ----- End Table C.5 ----- >>> This indicates a range of values. Wouldn't it be better to split this >>> table in two, one for the range of codepoints and another one with the >>> single entries? >> >> I considered that, but there are relatively few singular codepoints in >> the tables, so it wouldn't save much space. In this patch, singular >> codepoints are represented by a range like "0x3000, 0x3000".
I am really wondering if this should not reflect the real range reported by the RFC. I understand that you have grouped things to save a couple of bytes, but that would protect from any updates of the codepoints within those ranges (unlikely to happen I agree). >>> + 0x1D173, 0x1D17A, /* C.2.2 */ >>> This is for musical symbols. It seems to me that checking for a range >>> is what is intended. >> >> Can you elaborate? > > Oh, I think I understand the confusion now. All the arrays represent > codepoint ranges, not singular codepoints. I renamed them to "*_ranges", to > make that more clear. Thanks! Yes I got confused by the name. +/* Is the given Unicode codepoint in the given table? */ +#define IS_CODE_IN_TABLE(code, map) is_code_in_table(code, map, lengthof(map)) [...] +static bool +is_code_in_table(pg_wchar code, const pg_wchar *map, int mapsize) +{ + Assert(mapsize % 2 == 0); + + if (code < map[0] || code > map[mapsize - 1]) + return false; + + if (bsearch(&code, map, mapsize / 2, sizeof(pg_wchar) * 2, + codepoint_range_cmp)) + return true; + else + return false; +} Those could be renamed to range_table as well to avoid more confusion. > The SASLprep implementation depends on the UTF-8 functions from > src/backend/utils/mb/wchar.c. So to use it, you must also compile and link > that. That doesn't change anything for the current users of these > functions, the backend and libpq, as they both already link with wchar.o. > It would be good to move those functions into a separate file in > src/commmon, but I'll leave that for another day. Fine for me. You may want to add a .gitignore in src/common/unicode for norm_test and norm_test_table.h. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers