On 2017/07/05 15:28, Michael Paquier wrote:
I have finally been able to look at this patch.
Thanks for reviewing and the new version of the patch.
(Surprised to see that generate_unaccent_rules.py is inconsistent on MacOS, runs fine on Linux). def get_plain_letter(codepoint, table): """Return the base codepoint without marks.""" if is_letter_with_marks(codepoint, table): - return table[codepoint.combining_ids[0]] + if len(table[codepoint.combining_ids[0]].combining_ids) > 1: + # Recursive to find the plain letter + return get_plain_letter(table[codepoint.combining_ids[0]],table) + elif is_plain_letter(table[codepoint.combining_ids[0]]): + return table[codepoint.combining_ids[0]] + else: + return None elif is_plain_letter(codepoint): return codepoint else: - raise "mu" + return None The code paths returning None should not be reached, so I would suggest adding an assertion instead. Callers of get_plain_letter would blow up on None, still that would make future debugging harder. def is_letter_with_marks(codepoint, table): - """Returns true for plain letters combined with one or more marks.""" + """Returns true for letters combined with one or more marks.""" # See http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values return len(codepoint.combining_ids) > 1 and \ - is_plain_letter(table[codepoint.combining_ids[0]]) and \ + (is_plain_letter(table[codepoint.combining_ids[0]]) or\ + is_letter_with_marks(table[codepoint.combining_ids[0]],table)) and \ all(is_mark(table[i]) for i in codepoint.combining_ids[1:] This was already hard to follow, and this patch makes its harder. I think that the thing should be refactored with multiple conditions. if is_letter_with_marks(codepoint, table): - charactersSet.add((codepoint.id, + if get_plain_letter(codepoint, table) <> None: + charactersSet.add((codepoint.id, This change is not necessary as a letter with marks is not a plain character anyway. Testing with characters having two accents, the results are produced as wanted. I am attaching an updated patch with all those simplifications. Thoughts?
Thanks, so pretty. The patch is fine to me. --- Thanks and best regards, Dang Minh Huong --- このEメールはアバスト アンチウイルスによりウイルススキャンされています。 https://www.avast.com/antivirus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers