Re: Counting the number of repeated phrases in a column
you may be interested > https://dba.stackexchange.com/q/166762/238839 On Wed, Jan 26, 2022 at 3:03 AM Ivan Panchenko wrote: > > On 26.01.2022 00:21, benj@laposte.net wrote: > > Le 25/01/2022 à 18:10, Shaozhong SHI a écrit : > >> There is a short of a function in the standard Postgres to do the > >> following: > >> > >> It is easy to count the number of occurrence of words, but it is > >> rather difficult to count the number of occurrence of phrases. > >> > >> For instance: > >> > >> A cell of value: 'Hello World' means 1 occurrence a phrase. > >> > >> A cell of value: 'Hello World World Hello' means no occurrence of any > >> repeated phrase. > >> > >> But, A cell of value: 'Hello World World Hello Hello World' means 2 > >> occurrences of 'Hello World'. > >> > >> 'The City of London, London' also has no occurrences of any repeated > >> phrase. > >> > >> Anyone has got such a function to check out the number of occurrence > >> of any repeated phrases? > >> > >> Regards, > >> > >> David > > > > Don't know if it's exactly what you want, but you can replace all > > occurence of the phrase in the text by empty string and compute the > > diff between the initial and the result and next divide by the length > > of your phrase. > > > > Example : > > WITH x AS (SELECT 'toto like tata and toto like titi and toto like > > tutu' , 'toto like' phrase) > > SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) > > / char_length(phrase) AS nb_occurence > > FROM x > > > This works if the user knows the phrase. As far as I understood, the > phrase is not known, and user wants to count number of repeats of any > phrases. > Of course this can be done with recursive CTE. Split into words, > generate all phrases (AFAIK requires recursion), then group and count. > > But probably in PL/Perl this could be done more effectively. > > > > >
Re: Undetected Deadlock
On 2022-Jan-25, Michael Harris wrote: > We've recently updated our application to PG 14.1, and in the test instance we > have started to see some alarming undetected deadlocks. This is indeed suspicious / worrisome / curious. What version were you using previously? I reformatted the result sets: > An example of what we have seen is: > > locktype | database | relation | page | tuple | virtualxid | transactionid > | classid | objid | objsubid | virtualtransaction | pid | mode > | granted | fastpath | waitstart |relation > --+--++--+---++---+-+---+--++-+-+-+--+---+ > relation | 529986 | 1842228045 | | || > | | | | 165/1941408| 2130531 | AccessShareLock > | f | f| 2022-01-19 00:32:32.626152+01 | st.ctr_table_efr_oa > (1 row) > > locktype | database | relation | page | tuple | virtualxid | transactionid > | classid | objid | objsubid | virtualtransaction | pid |mode > | granted | fastpath | waitstart | relation > --+--++--+---++---+-+---+--++-+-+-+--+---+ > relation | 529986 | 1842228045 | | || > | | | | 75/2193719 | 2128603 | > AccessExclusiveLock | t | f| | st.ctr_table_efr_oa > (1 row) > > locktype | database | relation | page | tuple | virtualxid | transactionid > | classid | objid | objsubid | virtualtransaction | pid |mode > | granted | fastpath | waitstart | relation > --+--++--+---++---+-+---+--++-+-+-+--+---+--- > relation | 529986 | 1842231489 | | || > | | | | 75/2193719 | 2128603 | > AccessExclusiveLock | f | f| 2022-01-19 00:32:32.924694+01 | > st.tpd_oa > (1 row) > >locktype| database | relation | page | tuple | virtualxid | > transactionid | classid | objid | objsubid | virtualtransaction | pid > | mode | granted | fastpath | waitstart > | relation > ---+--++--+---+--+---+-+---+--++-+---+-+--+---+--- > relation | 529986 | 1842231489 | | | | >| | | | 165/1941408| 2130531 | > AccessShareLock | t | f| | > st.tpd_oa > > So: > pid 2130531 waits for an AccessShareLock on relation 1842228045, blocked by > pid 2128603 which holds an AccessExclusiveLock > pid 2128603 waits for an AccessExclusiveLock on relation 1842231489, > blocked by pid 2130531 which holds an AccessShareLock > > The queries being executed by these backends are: > >pid | query_start | state_change | > wait_event_type | wait_event | state | query > -+---+---+-+++- > 2128603 | 2022-01-19 00:32:32.924413+01 | 2022-01-19 00:32:32.924413+01 | > Lock| relation | active | DROP TABLE st.tpd_oa_18929 > 2130531 | 2022-01-19 00:32:32.625706+01 | 2022-01-19 00:32:32.625708+01 | > Lock| relation | active | DELETE FROM st.ctr_table_efr_oa_19010 > WHERE ropid = 44788868 > (2 rows) I know of no cases in which we fail to detect a deadlock. Perhaps you have indeed hit a bug. > Note that there were a lot of other processes also waiting on relation > 1842231489 - could that be confusing the deadlock detection routine? It shouldn't. > I am also confused about the locks which are being taken out by the > DELETE query. Maybe the lock is already taken before the DELETE is run; do you have any triggers, rules, constraints, or anything? If you have seen this several times already, maybe a way to investigate deeper is an exhaustive log capture of everything that these transactions do, from the point they begin until they become blocked (log_statement=all). Perhaps you need to involve other concurrent transactions in order to cause the problem. -- Álvaro Herrera
Re: Counting the number of repeated phrases in a column
On 26.01.2022 00:21, benj@laposte.net wrote: Le 25/01/2022 à 18:10, Shaozhong SHI a écrit : There is a short of a function in the standard Postgres to do the following: It is easy to count the number of occurrence of words, but it is rather difficult to count the number of occurrence of phrases. For instance: A cell of value: 'Hello World' means 1 occurrence a phrase. A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase. But, A cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'. 'The City of London, London' also has no occurrences of any repeated phrase. Anyone has got such a function to check out the number of occurrence of any repeated phrases? Regards, David Don't know if it's exactly what you want, but you can replace all occurence of the phrase in the text by empty string and compute the diff between the initial and the result and next divide by the length of your phrase. Example : WITH x AS (SELECT 'toto like tata and toto like titi and toto like tutu' , 'toto like' phrase) SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) / char_length(phrase) AS nb_occurence FROM x This works if the user knows the phrase. As far as I understood, the phrase is not known, and user wants to count number of repeats of any phrases. Of course this can be done with recursive CTE. Split into words, generate all phrases (AFAIK requires recursion), then group and count. But probably in PL/Perl this could be done more effectively.
Re: Counting the number of repeated phrases in a column
Le 25/01/2022 à 18:10, Shaozhong SHI a écrit : There is a short of a function in the standard Postgres to do the following: It is easy to count the number of occurrence of words, but it is rather difficult to count the number of occurrence of phrases. For instance: A cell of value: 'Hello World' means 1 occurrence a phrase. A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase. But, A cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'. 'The City of London, London' also has no occurrences of any repeated phrase. Anyone has got such a function to check out the number of occurrence of any repeated phrases? Regards, David Don't know if it's exactly what you want, but you can replace all occurence of the phrase in the text by empty string and compute the diff between the initial and the result and next divide by the length of your phrase. Example : WITH x AS (SELECT 'toto like tata and toto like titi and toto like tutu' , 'toto like' phrase) SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) / char_length(phrase) AS nb_occurence FROM x OpenPGP_signature Description: OpenPGP digital signature
NIST 800-53v4 scanning?
Hello pgsql-general, I've been tasked with scanning our Ubuntu-hosted databases for NIST 800-53v4 compliance. I'm finding a paucity of tools out there that will do this. I found a few that might work, when pointed at Postgres 9 on RHEL, but not much else. Is this a problem anybody else has tackled? If so, how?
Re: Counting the number of repeated phrases in a column
‐‐‐ Original Message ‐‐‐ On Tuesday, January 25th, 2022 at 17:10, Shaozhong SHI wrote: > 'The City of London, London' also has no occurrences of any repeated phrase. Not sure the City would be particularly happy with that attribution. ;-) Its it sits on its own. Its own local authority, its own county. It is an enclave enclosed by Greater London. A bit like the Vatican really. Except the City isn't its own country - much to the chagrin of some, no doubt !
Re: Counting the number of repeated phrases in a column
On Tue, Jan 25, 2022 at 10:10 AM Shaozhong SHI wrote: > Anyone has got such a function to check out the number of occurrence of > any repeated phrases? > Not I. But I wouldn't be surprised that such an algorithm exists and that it has been implemented - in a language other than SQL or pl/pgsql. David J.
Aw: Re: Counting the number of repeated phrases in a column
> How about split up the value into individual words and keep their orders? > add words up to form individual phrase and ensure that each phrase only > consists unique/distinct words > count repeated phrases afterward > > How about this? Sure, if that serves your purpose ? So far, we (I?) can't tell because you have yet to (computably) define "phrase". Which may or may not solve the previous dilemma. (Top-posting is not liked on this list, to my knowledge.) Best, Karsten
Re: Counting the number of repeated phrases in a column
How about split up the value into individual words and keep their orders? add words up to form individual phrase and ensure that each phrase only consists unique/distinct words count repeated phrases afterward How about this? Regards, David On Tue, 25 Jan 2022 at 17:22, Karsten Hilbert wrote: > > There is a short of a function in the standard Postgres to do the > following: > > > > it is easy to count the number of occurrence of words, but it is rather > difficult to count the number of occurrence of phrases. > > > > For instance: > > > > A cell of value: 'Hello World' means 1 occurrence a phrase. > > > > A cell of value: 'Hello World World Hello' means no occurrence of any > repeated phrase. > > > > But, A cell of value: 'Hello World World Hello Hello World' means 2 > occurrences of 'Hello World'. > > > > 'The City of London, London' also has no occurrences of any repeated > phrase. > > > > Anyone has got such a function to check out the number of occurrence of > any repeated phrases? > > For that to become answerable you may want to define what to > do when facing ambiguity. > > Best, > Karsten > > >
Aw: Counting the number of repeated phrases in a column
> There is a short of a function in the standard Postgres to do the following: > > it is easy to count the number of occurrence of words, but it is rather > difficult to count the number of occurrence of phrases. > > For instance: > > A cell of value: 'Hello World' means 1 occurrence a phrase. > > A cell of value: 'Hello World World Hello' means no occurrence of any > repeated phrase. > > But, A cell of value: 'Hello World World Hello Hello World' means 2 > occurrences of 'Hello World'. > > 'The City of London, London' also has no occurrences of any repeated phrase. > > Anyone has got such a function to check out the number of occurrence of any > repeated phrases? For that to become answerable you may want to define what to do when facing ambiguity. Best, Karsten
Counting the number of repeated phrases in a column
There is a short of a function in the standard Postgres to do the following: It is easy to count the number of occurrence of words, but it is rather difficult to count the number of occurrence of phrases. For instance: A cell of value: 'Hello World' means 1 occurrence a phrase. A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase. But, A cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'. 'The City of London, London' also has no occurrences of any repeated phrase. Anyone has got such a function to check out the number of occurrence of any repeated phrases? Regards, David
Re: Robust ways for checking allowed values in a column
On 1/25/22 09:35, Shaozhong SHI wrote: How about adding null as an alteration. Would this be robust? Regards, David On Tue, 25 Jan 2022 at 14:25, David G. Johnston wrote: On Tue, Jan 25, 2022 at 6:56 AM Shaozhong SHI wrote: select form from mytable where form ~ '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$' You do not need to repeat the boundary metacharacters on each branch. You can assert their presence just once and then use parentheses to group the alternations. form ~ '^(?Canal|Drain|etc...)$' David J. You would need to add form ~ 'expression' or form is null And a body of held water is a reservoir (no 'a')
Re: Robust ways for checking allowed values in a column
How about adding null as an alteration. Would this be robust? Regards, David On Tue, 25 Jan 2022 at 14:25, David G. Johnston wrote: > On Tue, Jan 25, 2022 at 6:56 AM Shaozhong SHI > wrote: > >> select form from mytable where form ~ >> '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$' >> > > You do not need to repeat the boundary metacharacters on each branch. You > can assert their presence just once and then use parentheses to group the > alternations. > > form ~ '^(?Canal|Drain|etc...)$' > > David J. > >
Re: tstzrange on large table gives poor estimate of expected rows
Thanks for your help. It is true we could get rid of it but we still want to use the functional index on the date range as we understand it is supposed to be a better look up - we also have other date range look ups on tables that seem to be degrading. I have found a solution to the problem. The postgres default_statistics_target is 100 and when we upped it to 1 the estimate was good. We could not have set the default to 1 on production but there appeared to be no way to change the value for the function index as statistics is set per column. However, in a post answered by Tom Lane in 2012 he gives a way to set the value for the statistics target on the functional index (https://www.postgresql.org/message-id/6668.1351105908%40sss.pgh.pa.us) Thanks. On Mon, 24 Jan 2022 at 17:43, Michael Lewis wrote: > > If interval_end_date is always 1 day ahead, why store it at all? > > Dependencies on a custom stats object wouldn't do anything I don't think > because they are offset. They are 100% correlated, but not in a way that any > of the existing stat types capture as far as I can figure.
Re: GIN index
Hi, On Tue, Jan 25, 2022 at 02:42:14AM +, huangning...@yahoo.com wrote: > Hi:I created a new variable-length data type, and now I want to create a GIN > index for it. According to the rules of GIN index, I created three functions: > extractValue, extractQuery, and compare. I made sure that the return value of > the first two functions is the address of the array, but when using the index > query, the GIN tuple data obtained by calling PG_GETARG_DATUM in the compare > function is incorrect, and it is misplaced! In memory the size of the data > header becomes something else, and the position of the first byte is not the > header, it becomes the fourth byte. So there is a high probability that the > function called is wrong or my return value is wrong when creating the index > or the error is somewhere else? It's hard to have an opinion without seeing any code extract. Have you checked contrib module for examples of other GIN opclasses, like pg_trgrm, or hstore for varlena datatype with GIN support? Note also that pgsql-hackers is probably a better mailing list for this kind of questions.
Re: Robust ways for checking allowed values in a column
On 25/01/2022 13:55, Shaozhong SHI wrote: I tried the following: select form from mytable where form ~ '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$' I used ^ and $ to ensure checking of allowed values. However, 'Backyard' was selected. Why is that? Sounds like a candidate for a foreign key relationship. Ray. -- Raymond O'Donnell // Galway // Ireland r...@rodonnell.ie
Re: Robust ways for checking allowed values in a column
On Tue, Jan 25, 2022 at 6:56 AM Shaozhong SHI wrote: > select form from mytable where form ~ > '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$' > You do not need to repeat the boundary metacharacters on each branch. You can assert their presence just once and then use parentheses to group the alternations. form ~ '^(?Canal|Drain|etc...)$' David J.
Re: Robust ways for checking allowed values in a column
Le mar. 25 janv. 2022 à 14:56, Shaozhong SHI a écrit : > I tried the following: > > select form from mytable where form ~ > '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$' > > I used ^ and $ to ensure checking of allowed values. > > However, 'Backyard' was selected. > > Why is that? > > It works for me: # select 'Backyard' ~ '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$'; ┌──┐ │ ?column? │ ├──┤ │ f│ └──┘ (1 row) So you will probably need a complete and reproducible example so that we could test it. -- Guillaume.
Robust ways for checking allowed values in a column
I tried the following: select form from mytable where form ~ '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$' I used ^ and $ to ensure checking of allowed values. However, 'Backyard' was selected. Why is that? Regards, David
GIN index
Hi:I created a new variable-length data type, and now I want to create a GIN index for it. According to the rules of GIN index, I created three functions: extractValue, extractQuery, and compare. I made sure that the return value of the first two functions is the address of the array, but when using the index query, the GIN tuple data obtained by calling PG_GETARG_DATUM in the compare function is incorrect, and it is misplaced! In memory the size of the data header becomes something else, and the position of the first byte is not the header, it becomes the fourth byte. So there is a high probability that the function called is wrong or my return value is wrong when creating the index or the error is somewhere else? regards!