Juan José Santamaría Flecha <juanjo.santama...@gmail.com> writes:
> On Thu, Jan 23, 2020 at 11:00 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> * It's not exactly apparent to me why this code should be concerned
>> about non-normalized characters when noplace else in the backend is.
> There is an open patch that will make the normalization functionality user
> visible [1]. So, if a user can call to_date(normalize('01 ŞUB 2010'), 'DD
> TMMON YYYY') I would vote to drop the normalization logic inside this patch
> altogether.

Works for me.

>> * I have no faith in this calculation that decides how long the match
>> length was:
>> *len = element_len + name_len - norm_len;

> The proper logic would come from do_to_timestamp() receiving a normalized
> "date_txt" input, so we do not operate with unnormalize and normalize
> strings at the same time.

No, that only solves half the problem, because the downcasing
transformation can change the string length too.  Two easy examples:

* In German, I believe "ß" downcases to "ss".  In Latin-1 encoding that's
a length change, though I think it might accidentally not be in UTF8.

* The Turks distinguish dotted and dotless "i", so that "İ" downcases to
"i", and conversely "I" downcases to "ı".  Those are length changes in
UTF8, though not in whichever Latin-N encoding works for Turkish.

Even if these cases happen not to apply to any month or day name of
the relevant language, we still have a problem, arising from the fact
that we're downcasing the whole remaining string --- so length changes
after the match could occur anyway.

> I would like to rise a couple of questions myself:

> * When compiled with DEBUG_TO_FROM_CHAR, there is a warning "‘dump_node’
> defined but not used". Should we drop this function or uncomment its usage?

Maybe, but I don't think it belongs in this patch.

> * Would it be worth moving str_tolower(localized_name)
> from seq_search_localized() into cache_locale_time()?

I think it'd complicate tracking when that cache has to be invalidated
(i.e. it now depends on more than just LC_TIME).  On the whole I wouldn't
bother unless someone does the measurements to show there'd be a useful
speedup.

                        regards, tom lane
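
For anyone who wants to check the byte-length claims above, here is a minimal sketch in Python (not backend code). The case pairs are hard-coded exactly as given in the message, because Python's str.lower() is locale-independent and does not apply the Turkish mapping. The names PAIRS, byte_len and length_deltas are made up for this illustration, and ISO-8859-9 is assumed to be the "Latin-N" encoding meant for Turkish.

```python
# Case pairs as stated in the message, each with an assumed single-byte
# encoding to compare against UTF-8. These are illustrative inputs, not
# the output of any locale-aware downcasing routine.
PAIRS = [
    ("ß", "ss", "latin-1"),    # German example, taken as written
    ("İ", "i", "iso8859-9"),   # Turkish dotted capital I
    ("I", "ı", "iso8859-9"),   # Turkish dotless small i
]


def byte_len(text, encoding):
    """Length of text in bytes once encoded with the given encoding."""
    return len(text.encode(encoding))


def length_deltas(pairs=PAIRS):
    """Return (source, result, encoding, byte-length change) tuples."""
    rows = []
    for src, dst, single_byte in pairs:
        for enc in ("utf-8", single_byte):
            delta = byte_len(dst, enc) - byte_len(src, enc)
            rows.append((src, dst, enc, delta))
    return rows


for src, dst, enc, delta in length_deltas():
    print(f"{src!r} -> {dst!r} in {enc}: {delta:+d} byte(s)")
```

A nonzero change in any row means a byte offset measured in the downcased string cannot be reused directly as an offset into the original input, which is the concern with the *len calculation.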