Re: Inexplicable duplicate rows with unique constraint

Susan Hurst Thu, 16 Jan 2020 10:12:46 -0800

That's why I created a virtual_string function to squeeze out everythingbut alpha characters and numbers 0-9 from any varchar or text columnsthat I want to use as business key columns. For example, if I have acolumn named job_name, I will have a companion column named v_job_name.The v_ column is to replicate Oracle's virtual column, since postgresdoesn't have it. You don't put any values in the v_ column directly. Isimply have a trigger on insert or update to put the value in thev_job_name column using the virtual_string(new.job_name) function. It'sthe v_job_name column that use in my unique constraint so that I avoidany unexpected sorting. Meanwhile, my job_name column is still humanreadable with whatever characters I want to see, including diacritics.


Here is my function, if you want to try it out:


create or replace function store.virtual_string(string_in text)
returns text as
$body$
        declare
                l_return text;
        begin

l_return := regexp_replace(lower(unaccent(string_in)),'[^0-9a-z]','','g');

                return l_return;
        end;
$body$
  language plpgsql volatile security definer
;

Sue

---
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Susan E Hurst
Principal Consultant
Brookhurst Data LLC
Email: [email protected]
Mobile: 314-486-3261

On 2020-01-16 11:48, Tom Lane wrote:

Richard van der Hoff <[email protected]> writes:

On 16/01/2020 17:12, Magnus Hagander wrote:

See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
which linux distros updated when.

It seems like a plausible explanation but it's worth noting that allthe
indexed data here is (despite being in text columns), plain ascii. I'm
surprised that a change in collation rules would change the sorting of
such strings, and hence that it could lead to this problem. Am Inaive?


Unfortunately, strings containing punctuation do sort differently
after these changes, even with all-ASCII data.  The example given
on that wiki page demonstrates this.

RHEL6 (old glibc):

$ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
11
1-1

Fedora 30 (new glibc):

$ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
1-1
11

I concur with Daniel's suggestion that maybe "C" locale is
the thing to use for this data.

                        regards, tom lane

Re: Inexplicable duplicate rows with unique constraint

Reply via email to