Re: Counting the number of repeated phrases in a column

2022-01-25 Thread Jian He
you may be interested > https://dba.stackexchange.com/q/166762/238839

On Wed, Jan 26, 2022 at 3:03 AM Ivan Panchenko 
wrote:

>
> On 26.01.2022 00:21, benj@laposte.net wrote:
> > Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
> >> There is a short of a function in the standard Postgres to do the
> >> following:
> >>
> >> It is easy to count the number of occurrence of words, but it is
> >> rather difficult to count the number of occurrence of phrases.
> >>
> >> For instance:
> >>
> >> A cell of value:  'Hello World' means 1 occurrence a phrase.
> >>
> >> A cell of value: 'Hello World World Hello' means no occurrence of any
> >> repeated phrase.
> >>
> >> But, A cell of value: 'Hello World World Hello Hello World' means 2
> >> occurrences of 'Hello World'.
> >>
> >> 'The City of London, London' also has no occurrences of any repeated
> >> phrase.
> >>
> >> Anyone has got such a function to check out the number of occurrence
> >> of any repeated phrases?
> >>
> >> Regards,
> >>
> >> David
> >
> > Don't know if it's exactly what you want, but you can replace all
> > occurence of the phrase in the text by empty string and compute the
> > diff between the initial and the result and next divide by the length
> > of your phrase.
> >
> > Example :
> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like
> > tutu' , 'toto like' phrase)
> > SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
> > / char_length(phrase) AS nb_occurence
> > FROM x
> >
> This works if the user knows the phrase. As far as I understood, the
> phrase is not known, and user wants to count number of repeats of any
> phrases.
> Of course this can be done with recursive CTE. Split into words,
> generate all phrases (AFAIK requires recursion), then group and count.
>
> But probably in PL/Perl this could be done  more effectively.
>
>
>
>
>


Re: Undetected Deadlock

2022-01-25 Thread Alvaro Herrera
On 2022-Jan-25, Michael Harris wrote:

> We've recently updated our application to PG 14.1, and in the test instance we
> have started to see some alarming undetected deadlocks.

This is indeed suspicious / worrisome / curious.

What version were you using previously?

I reformatted the result sets:

> An example of what we have seen is:
> 
>  locktype | database |  relation  | page | tuple | virtualxid | transactionid 
> | classid | objid | objsubid | virtualtransaction |   pid   |  mode   
> | granted | fastpath |   waitstart   |relation
> --+--++--+---++---+-+---+--++-+-+-+--+---+
>  relation |   529986 | 1842228045 |  |   ||   
> | |   |  | 165/1941408| 2130531 | AccessShareLock 
> | f   | f| 2022-01-19 00:32:32.626152+01 | st.ctr_table_efr_oa
> (1 row)
> 
>  locktype | database |  relation  | page | tuple | virtualxid | transactionid 
> | classid | objid | objsubid | virtualtransaction |   pid   |mode 
> | granted | fastpath | waitstart | relation
> --+--++--+---++---+-+---+--++-+-+-+--+---+
>  relation |   529986 | 1842228045 |  |   ||   
> | |   |  | 75/2193719 | 2128603 | 
> AccessExclusiveLock | t   | f|   | st.ctr_table_efr_oa
> (1 row)
> 
>  locktype | database |  relation  | page | tuple | virtualxid | transactionid 
> | classid | objid | objsubid | virtualtransaction |   pid   |mode 
> | granted | fastpath |   waitstart   | relation
> --+--++--+---++---+-+---+--++-+-+-+--+---+---
>  relation |   529986 | 1842231489 |  |   ||   
> | |   |  | 75/2193719 | 2128603 | 
> AccessExclusiveLock | f   | f| 2022-01-19 00:32:32.924694+01 | 
> st.tpd_oa
> (1 row)
> 
>locktype| database |  relation  | page | tuple |  virtualxid  | 
> transactionid | classid |   objid   | objsubid | virtualtransaction |   pid   
> | mode  | granted | fastpath |   waitstart   
> | relation
> ---+--++--+---+--+---+-+---+--++-+---+-+--+---+---
>  relation  |   529986 | 1842231489 |  |   |  |
>| |   |  | 165/1941408| 2130531 | 
> AccessShareLock   | t   | f|   | 
> st.tpd_oa
> 
> So:
>   pid 2130531 waits for an AccessShareLock on relation 1842228045, blocked by 
> pid 2128603 which holds an AccessExclusiveLock
>   pid 2128603 waits for an AccessExclusiveLock on relation 1842231489, 
> blocked by pid 2130531 which holds an AccessShareLock
> 
> The queries being executed by these backends are:
> 
>pid   |  query_start  | state_change  | 
> wait_event_type | wait_event | state  | query
> -+---+---+-+++-
>  2128603 | 2022-01-19 00:32:32.924413+01 | 2022-01-19 00:32:32.924413+01 | 
> Lock| relation   | active | DROP TABLE st.tpd_oa_18929
>  2130531 | 2022-01-19 00:32:32.625706+01 | 2022-01-19 00:32:32.625708+01 | 
> Lock| relation   | active | DELETE FROM st.ctr_table_efr_oa_19010 
> WHERE ropid = 44788868
> (2 rows)

I know of no cases in which we fail to detect a deadlock.  Perhaps you
have indeed hit a bug.

> Note that there were a lot of other processes also waiting on relation
> 1842231489 - could that be confusing the deadlock detection routine?

It shouldn't.

> I am also confused about the locks which are being taken out by the
> DELETE query.

Maybe the lock is already taken before the DELETE is run; do you have
any triggers, rules, constraints, or anything?  If you have seen this
several times already, maybe a way to investigate deeper is an
exhaustive log capture of everything that these transactions do, from
the point they begin until they become blocked (log_statement=all).

Perhaps you need to involve other concurrent transactions in order to
cause the problem.

-- 
Álvaro Herrera   

Re: Counting the number of repeated phrases in a column

2022-01-25 Thread Ivan Panchenko



On 26.01.2022 00:21, benj@laposte.net wrote:

Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
There is a short of a function in the standard Postgres to do the 
following:


It is easy to count the number of occurrence of words, but it is 
rather difficult to count the number of occurrence of phrases.


For instance:

A cell of value:  'Hello World' means 1 occurrence a phrase.

A cell of value: 'Hello World World Hello' means no occurrence of any 
repeated phrase.


But, A cell of value: 'Hello World World Hello Hello World' means 2 
occurrences of 'Hello World'.


'The City of London, London' also has no occurrences of any repeated 
phrase.


Anyone has got such a function to check out the number of occurrence 
of any repeated phrases?


Regards,

David


Don't know if it's exactly what you want, but you can replace all 
occurence of the phrase in the text by empty string and compute the 
diff between the initial and the result and next divide by the length 
of your phrase.


Example :
WITH x AS (SELECT 'toto like tata and toto like titi and toto like 
tutu' , 'toto like' phrase)
SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) 
/ char_length(phrase) AS nb_occurence

FROM x

This works if the user knows the phrase. As far as I understood, the 
phrase is not known, and user wants to count number of repeats of any 
phrases.
Of course this can be done with recursive CTE. Split into words, 
generate all phrases (AFAIK requires recursion), then group and count.


But probably in PL/Perl this could be done  more effectively.






Re: Counting the number of repeated phrases in a column

2022-01-25 Thread benj . dev

Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :

There is a short of a function in the standard Postgres to do the following:

It is easy to count the number of occurrence of words, but it is rather 
difficult to count the number of occurrence of phrases.


For instance:

A cell of value:  'Hello World' means 1 occurrence a phrase.

A cell of value: 'Hello World World Hello' means no occurrence of any 
repeated phrase.


But, A cell of value: 'Hello World World Hello Hello World' means 2 
occurrences of 'Hello World'.


'The City of London, London' also has no occurrences of any repeated phrase.

Anyone has got such a function to check out the number of occurrence of 
any repeated phrases?


Regards,

David


Don't know if it's exactly what you want, but you can replace all 
occurence of the phrase in the text by empty string and compute the diff 
between the initial and the result and next divide by the length of your 
phrase.


Example :
WITH x AS (SELECT 'toto like tata and toto like titi and toto like tutu' 
, 'toto like' phrase)
SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) / 
char_length(phrase) AS nb_occurence

FROM x



OpenPGP_signature
Description: OpenPGP digital signature


NIST 800-53v4 scanning?

2022-01-25 Thread Ben Chobot

Hello pgsql-general,

I've been tasked with scanning our Ubuntu-hosted databases for NIST 
800-53v4 compliance. I'm finding a paucity of tools out there that will 
do this. I found a few that might work, when pointed at Postgres 9 on 
RHEL, but not much else.


Is this a problem anybody else has tackled? If so, how?




Re: Counting the number of repeated phrases in a column

2022-01-25 Thread Laura Smith
‐‐‐ Original Message ‐‐‐

On Tuesday, January 25th, 2022 at 17:10, Shaozhong SHI  
wrote:

> 'The City of London, London' also has no occurrences of any repeated phrase.


Not sure the City would be particularly happy with that attribution. ;-)

Its it sits on its own. Its own local authority, its own county. It is an 
enclave enclosed by Greater London.

A bit like the Vatican really. Except the City isn't its own country - much to 
the chagrin of some, no doubt !




Re: Counting the number of repeated phrases in a column

2022-01-25 Thread David G. Johnston
On Tue, Jan 25, 2022 at 10:10 AM Shaozhong SHI 
wrote:

> Anyone has got such a function to check out the number of occurrence of
> any repeated phrases?
>

Not I.  But I wouldn't be surprised that such an algorithm exists and that
it has been implemented - in a language other than SQL or pl/pgsql.

David J.


Aw: Re: Counting the number of repeated phrases in a column

2022-01-25 Thread Karsten Hilbert
> How about split up the value into individual words and keep their orders?
> add words up to form individual phrase and ensure that each phrase only 
> consists unique/distinct words
> count repeated phrases afterward
>  
> How about this?

Sure, if that serves your purpose ?

So far, we (I?) can't tell because you have yet to (computably) define "phrase".

Which may or may not solve the previous dilemma.

(Top-posting is not liked on this list, to my knowledge.)

Best,
Karsten





Re: Counting the number of repeated phrases in a column

2022-01-25 Thread Shaozhong SHI
How about split up the value into individual words and keep their orders?
add words up to form individual phrase and ensure that each phrase only
consists unique/distinct words
count repeated phrases afterward

How about this?

Regards,

David

On Tue, 25 Jan 2022 at 17:22, Karsten Hilbert 
wrote:

> > There is a short of a function in the standard Postgres to do the
> following:
> >
> > it is easy to count the number of occurrence of words, but it is rather
> difficult to count the number of occurrence of phrases.
> >
> > For instance:
> >
> > A cell of value:  'Hello World' means 1 occurrence a phrase.
> >
> > A cell of value: 'Hello World World Hello' means no occurrence of any
> repeated phrase.
> >
> > But, A cell of value: 'Hello World World Hello Hello World' means 2
> occurrences of 'Hello World'.
> >
> > 'The City of London, London' also has no occurrences of any repeated
> phrase.
> >
> > Anyone has got such a function to check out the number of occurrence of
> any repeated phrases?
>
> For that to become answerable you may want to define what to
> do when facing ambiguity.
>
> Best,
> Karsten
>
>
>


Aw: Counting the number of repeated phrases in a column

2022-01-25 Thread Karsten Hilbert
> There is a short of a function in the standard Postgres to do the following:
>  
> it is easy to count the number of occurrence of words, but it is rather 
> difficult to count the number of occurrence of phrases.
>  
> For instance:
>  
> A cell of value:  'Hello World' means 1 occurrence a phrase.
>  
> A cell of value: 'Hello World World Hello' means no occurrence of any 
> repeated phrase.
>  
> But, A cell of value: 'Hello World World Hello Hello World' means 2 
> occurrences of 'Hello World'.
>  
> 'The City of London, London' also has no occurrences of any repeated phrase.
>  
> Anyone has got such a function to check out the number of occurrence of any 
> repeated phrases?

For that to become answerable you may want to define what to
do when facing ambiguity.

Best,
Karsten




Counting the number of repeated phrases in a column

2022-01-25 Thread Shaozhong SHI
There is a short of a function in the standard Postgres to do the following:

It is easy to count the number of occurrence of words, but it is rather
difficult to count the number of occurrence of phrases.

For instance:

A cell of value:  'Hello World' means 1 occurrence a phrase.

A cell of value: 'Hello World World Hello' means no occurrence of any
repeated phrase.

But, A cell of value: 'Hello World World Hello Hello World' means 2
occurrences of 'Hello World'.

'The City of London, London' also has no occurrences of any repeated phrase.

Anyone has got such a function to check out the number of occurrence of any
repeated phrases?

Regards,

David


Re: Robust ways for checking allowed values in a column

2022-01-25 Thread Rob Sargent

On 1/25/22 09:35, Shaozhong SHI wrote:

How about adding null as an alteration.

Would this be robust?

Regards,

David

On Tue, 25 Jan 2022 at 14:25, David G. Johnston 
 wrote:


On Tue, Jan 25, 2022 at 6:56 AM Shaozhong SHI
 wrote:

 select form from mytable  where form ~

'^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$'


You do not need to repeat the boundary metacharacters on each
branch.  You can assert their presence just once and then use
parentheses to group the alternations.

form ~ '^(?Canal|Drain|etc...)$'

David J.



You would need to add form ~ 'expression' or form is null

And a body of held water is a reservoir (no 'a')

Re: Robust ways for checking allowed values in a column

2022-01-25 Thread Shaozhong SHI
How about adding null as an alteration.

Would this be robust?

Regards,

David

On Tue, 25 Jan 2022 at 14:25, David G. Johnston 
wrote:

> On Tue, Jan 25, 2022 at 6:56 AM Shaozhong SHI 
> wrote:
>
>>  select form from mytable  where form ~
>> '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$'
>>
>
> You do not need to repeat the boundary metacharacters on each branch.  You
> can assert their presence just once and then use parentheses to group the
> alternations.
>
> form ~ '^(?Canal|Drain|etc...)$'
>
> David J.
>
>


Re: tstzrange on large table gives poor estimate of expected rows

2022-01-25 Thread Tom Dearman
Thanks for your help.  It is true we could get rid of it but we still
want to use the functional index on the date range as we understand it
is supposed to be a better look up - we also have other date range
look ups on tables that seem to be degrading.  I have found a solution
to the problem.  The postgres default_statistics_target is 100 and
when we upped it to 1 the estimate was good.  We could not have
set the default to 1 on production but there appeared to be no way
to change the value for the function index as statistics is set per
column.  However, in a post answered by Tom Lane in 2012 he gives a
way to set the value for the statistics target on the functional index
(https://www.postgresql.org/message-id/6668.1351105908%40sss.pgh.pa.us)

Thanks.

On Mon, 24 Jan 2022 at 17:43, Michael Lewis  wrote:
>
> If interval_end_date is always 1 day ahead, why store it at all?
>
> Dependencies on a custom stats object wouldn't do anything I don't think 
> because they are offset. They are 100% correlated, but not in a way that any 
> of the existing stat types capture as far as I can figure.




Re: GIN index

2022-01-25 Thread Julien Rouhaud
Hi,

On Tue, Jan 25, 2022 at 02:42:14AM +, huangning...@yahoo.com wrote:
> Hi:I created a new variable-length data type, and now I want to create a GIN
> index for it. According to the rules of GIN index, I created three functions:
> extractValue, extractQuery, and compare. I made sure that the return value of
> the first two functions is the address of the array, but when using the index
> query, the GIN tuple data obtained by calling PG_GETARG_DATUM in the compare
> function is incorrect, and it is misplaced! In memory the size of the data
> header becomes something else, and the position of the first byte is not the
> header, it becomes the fourth byte. So there is a high probability that the
> function called is wrong or my return value is wrong when creating the index
> or the error is somewhere else?

It's hard to have an opinion without seeing any code extract.  Have you checked
contrib module for examples of other GIN opclasses, like pg_trgrm, or hstore
for varlena datatype with GIN support?

Note also that pgsql-hackers is probably a better mailing list for this kind of
questions.




Re: Robust ways for checking allowed values in a column

2022-01-25 Thread Ray O'Donnell

On 25/01/2022 13:55, Shaozhong SHI wrote:

I tried the following:

select form from mytable  where form ~ 
'^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$'


 I used ^ and $ to ensure checking of allowed values.

However, 'Backyard' was selected.

Why is that?


Sounds like a candidate for a foreign key relationship.

Ray.


--
Raymond O'Donnell // Galway // Ireland
r...@rodonnell.ie




Re: Robust ways for checking allowed values in a column

2022-01-25 Thread David G. Johnston
On Tue, Jan 25, 2022 at 6:56 AM Shaozhong SHI 
wrote:

>  select form from mytable  where form ~
> '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$'
>

You do not need to repeat the boundary metacharacters on each branch.  You
can assert their presence just once and then use parentheses to group the
alternations.

form ~ '^(?Canal|Drain|etc...)$'

David J.


Re: Robust ways for checking allowed values in a column

2022-01-25 Thread Guillaume Lelarge
Le mar. 25 janv. 2022 à 14:56, Shaozhong SHI  a
écrit :

> I tried the following:
>
>  select form from mytable  where form ~
> '^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$'
>
> I used ^ and $ to ensure checking of allowed values.
>
> However, 'Backyard' was selected.
>
> Why is that?
>
>
It works for me:

# select 'Backyard' ~
'^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$';
┌──┐
│ ?column? │
├──┤
│ f│
└──┘
(1 row)

So you will probably need a complete and reproducible example so that we
could test it.


-- 
Guillaume.


Robust ways for checking allowed values in a column

2022-01-25 Thread Shaozhong SHI
I tried the following:

 select form from mytable  where form ~
'^Canal$|^Drain$|^Foreshore$|^inlandRiver$|^Lake$|^lockOrFlightOfLocks$|^Marsh$|^Researvoir$|^Sea$|^tidalRiver$|^Transfer$'

I used ^ and $ to ensure checking of allowed values.

However, 'Backyard' was selected.

Why is that?

Regards,

David


GIN index

2022-01-25 Thread huangning...@yahoo.com
Hi:I created a new variable-length data type, and now I want to create a GIN 
index for it. According to the rules of GIN index, I created three functions: 
extractValue, extractQuery, and compare. I made sure that the return value of 
the first two functions is the address of the array, but when using the index 
query, the GIN tuple data obtained by calling PG_GETARG_DATUM in the compare 
function is incorrect, and it is misplaced! In memory the size of the data 
header becomes something else, and the position of the first byte is not the 
header, it becomes the fourth byte. So there is a high probability that the 
function called is wrong or my return value is wrong when creating the index or 
the error is somewhere else? 

regards!