Re: Broken logical replication

2022-01-26 Thread Игорь Выскорко

Hi experts!
I didn't notice the HTML tags in my previous message. Sorry for that. Hope that
was the only reason for the lack of answers :)


Original message was:
Given 2 Postgres servers:
1. Master -  PostgreSQL 10.16
2. Slave - PostgreSQL 13.5
Logical replication was configured and worked fine between them.
At some point the software using the master DB (Russian accounting
software: 1C) decided to do some DB-related tasks. I know only about full
reindexing (all indexes were rebuilt); probably there was more DDL...

OK, when it finished I found that replication was broken:
1. The publication object exists, but all publication tables were dropped. I
mean not the tables themselves, but:
select * from pg_publication_tables gave 0 rows. OK, I added them
again; that didn't help.

2. In the log I found a repeated bunch of records:
Jan 20 23:33:29 testpg postgres[6291]: [9-1] 2022-01-20 16:33:29.087 UTC
[6291] ERROR:  replication slot "subscr" is active for PID 3984
Jan 20 23:33:34 testpg postgres[6295]: [7-1] 2022-01-20 16:33:34.101 UTC
[6295] LOG:  connection received: host=192.168.7.225 port=43428
Jan 20 23:33:34 testpg postgres[6295]: [8-1] 2022-01-20 16:33:34.102 UTC
[6295] LOG:  replication connection authorized: user=rep_user
Jan 20 23:33:34 testpg postgres[6295]: [9-1] 2022-01-20 16:33:34.104 UTC
[6295] ERROR:  replication slot "subscr" is active for PID 3984
Jan 20 23:33:39 testpg postgres[6298]: [7-1] 2022-01-20 16:33:39.117 UTC
[6298] LOG:  connection received: host=192.168.7.225 port=43470
Jan 20 23:33:39 testpg postgres[6298]: [8-1] 2022-01-20 16:33:39.118 UTC
[6298] LOG:  replication connection authorized: user=rep_user
Jan 20 23:33:39 testpg postgres[6298]: [9-1] 2022-01-20 16:33:39.537 UTC
[6298] LOG:  starting logical decoding for slot "subscr"
Jan 20 23:33:39 testpg postgres[6298]: [9-2] 2022-01-20 16:33:39.537 UTC
[6298] DETAIL:  streaming transactions committing after 1E0/3449C020,
reading WAL from 1D9/EEAD19E8
Jan 20 23:33:39 testpg postgres[6298]: [10-1] 2022-01-20 16:33:39.538
UTC [6298] LOG:  logical decoding found consistent point at 1D9/EEAD19E8
Jan 20 23:33:39 testpg postgres[6298]: [10-2] 2022-01-20 16:33:39.538
UTC [6298] DETAIL:  There are no running transactions.

3. Replication status:
db=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 6298
usesysid         | 16384
usename          | rep_user
application_name | subscr
client_addr      | 192.168.7.225
client_hostname  |
client_port      | 43470
backend_start    | 2022-01-20 16:33:39.117019+00
backend_xmin     |
state            | catchup
sent_lsn         | 1DE/E849BFD8
write_lsn        | 1E0/3449C020
flush_lsn        | 1E0/3449C020
replay_lsn       | 1E0/3449C020
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
sent_lsn is lower than write_lsn; is that even legal? :)
4. The pg_wal directory was bloated; that's OK, I suppose, because of the
open replication slot.
So, my question here: is it possible to fix replication without a full
restart (truncating tables on the slave and copying all data over again)?
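
A minimal sketch of what could be checked on the master, with the slot
name and PID taken from the log above (terminating the stuck walsender is
only a guess at a remedy, not a confirmed fix):

-- inspect the slot and which backend currently holds it
SELECT slot_name, active, active_pid, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'subscr';

-- if the old walsender (PID 3984 in the log) is wedged, terminating it
-- lets the subscriber's next connection attempt claim the slot
SELECT pg_terminate_backend(3984);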





Re: could not open relation with OID

2022-01-26 Thread Michael Paquier
On Wed, Jan 26, 2022 at 05:30:01PM -0800, Ben Chobot wrote:
> We do a lot of queries per day, over a lot of hosts, all of which are on
> 12.9. We've recently started doing a better job at analyzing our db logs and
> have found that, a few times a day, every day, we see some of our queries
> fail with errors like:
> 
> could not open relation with OID 201940279
> 
> In the cases we've examined so far, the failed query succeeds just fine when
> we run it manually. The failed query also had run on an async streaming
> replica, and the primary has completed at least one autovacuum since the
> failure. I don't know if either of those two facts are relevant, but I'm not
> sure what else to blame. The internet seems to want to blame issues like
> this on temp tables, which makes sense, but in our case, most of the queries
> that are failing this way are simple PK scans, which then fall back to the
> table to pull all the columns. The tables themselves are small in row count
> - although some values are likely TOASTed - so I would be surprised if
> anything is spilling to disk for sorting, which might have counted as a temp
> table enough to give such an error.

Do those OIDs point to some specific relations?  It should be easy
enough to guess which pg_class entry they point to, especially if
you have a persistent schema, and whether these are indeed temporary
entries or not, depending on their pg_class.relnamespace.
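
A minimal sketch of that lookup, with the OID from the error message
plugged in:

SELECT c.oid, n.nspname, c.relname, c.relpersistence
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.oid = 201940279;
-- relpersistence = 't' (or an nspname like pg_temp_NN) marks a temporary
-- relation; no row at all means the relation no longer exists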

> This is a minuscule failure percentage, so replicating it is going to be
> hard, but it is still breaking for reasons I don't understand, and so I'd
> like to fix it. Has anybody else seen this, or have any ideas of what to look
> at?

I don't recall seeing such reports recently.

> Other things we've considered:
>     - we run pg_repack, which certainly seems like it could make an error
> like this, but we see this error in places and times that pg_repack isn't
> currently running

It could also take time for the issue to show up, depending on the
state of the relcache.
--
Michael




Re: PostgreSQL Management and monitoring tool

2022-01-26 Thread Mladen Gogala

On 1/26/22 22:24, Lucas wrote:


Thanks… But I was just hoping that someone would share their solution
in a more in-depth way…


For example: Having Zabbix to monitor the database is nice, but it
does not allow you to manage replication, backups and more from the
web, like awide.io does. Maybe there isn't even an open-source
solution like that....


You can try Oracle Enterprise Manager. The basic installation is free
of charge, and so is the Postgres plugin. Of course, you'll need a fairly
large box housing Oracle 19c EE and a WebLogic application server, both
parts of OEM. Also, you don't know what an adventure is until you
try upgrading OEM. It's things like that that make life interesting. As
for backups, my advice is to purchase Commvault; it has a nice GUI,
can do storage snapshot backups, and has great support.


Disclosure:

I used to work for Commvault Systems until May 2019, for 7 years.

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: PostgreSQL Management and monitoring tool

2022-01-26 Thread Lucas

> On 27/01/2022, at 5:01 PM, Rob Sargent  wrote:
> 
> On 1/26/22 20:24, Lucas wrote:
>> 
>>> On 27/01/2022, at 3:00 PM, Bruce Momjian wrote:
>>> 
>>> On Thu, Jan 27, 2022 at 02:28:17PM +1300, Lucas wrote:
 Hi guys.
 
 I’m migrating a few databases to PG 14 and was thinking it would be very
 nice to have a tool to help me monitor and manage my databases. I found
 Awide, and it looks really great, but it would cost me around $800 monthly.
 
 Do you guys know by any chance a similar solution cheaper or even open 
 source?
>>> 
>>> Uh, this should help:
>>> 
>>> https://wiki.postgresql.org/wiki/Monitoring
>> 
>> Thanks… But I was just hoping that someone would share their solution in a 
>> more in-depth way…
>> 
>> For example: Having Zabbix to monitor the database is nice, but it does not 
>> allow you to manage replication, backups and more from the web, like 
>> awide.io does. Maybe there isn't even an open-source solution like that.... 
>> 
> I wonder what $800/month gets you from, say, PostgreSQL Experts et al.
> Perhaps at some point you're left with the pieces you need and no more 
> $10K/year.

I’m not looking for a 3rd-party consultancy company to monitor my database or 
to do any DBA work… I’m simply asking if anybody knows of an open-source 
PostgreSQL monitoring tool similar to awide.io, one that you have used in the 
past and could share your thoughts on, pros and cons.

I know https://wiki.postgresql.org/wiki/Monitoring provides a list of 
monitoring tools, but I want to hear from you guys: what you use, why, and 
whether you would do it differently if you had the chance.

Re: PostgreSQL Management and monitoring tool

2022-01-26 Thread Rob Sargent

On 1/26/22 20:24, Lucas wrote:



On 27/01/2022, at 3:00 PM, Bruce Momjian  wrote:

On Thu, Jan 27, 2022 at 02:28:17PM +1300, Lucas wrote:

Hi guys.

I’m migrating a few databases to PG 14 and was thinking it would be very
nice to have a tool to help me monitor and manage my databases. I
found Awide, and it looks really great, but it would cost me around $800
monthly.

Do you guys know by any chance a similar solution, cheaper or even
open source?


Uh, this should help:

https://wiki.postgresql.org/wiki/Monitoring


Thanks… But I was just hoping that someone would share their solution
in a more in-depth way…


For example: Having Zabbix to monitor the database is nice, but it
does not allow you to manage replication, backups and more from the
web, like awide.io does. Maybe there isn't even an open-source
solution like that....


I wonder what $800/month gets you from, say, PostgreSQL Experts et al.
Perhaps at some point you're left with the pieces you need and no more
$10K/year.

Re: PostgreSQL Management and monitoring tool

2022-01-26 Thread Lucas

> On 27/01/2022, at 3:00 PM, Bruce Momjian  wrote:
> 
> On Thu, Jan 27, 2022 at 02:28:17PM +1300, Lucas wrote:
>> Hi guys.
>> 
>> I’m migrating a few databases to PG 14 and was thinking it would be very
>> nice to have a tool to help me monitor and manage my databases. I found Awide
>> and it looks really great, but it would cost me around $800 monthly.
>> 
>> Do you guys know by any chance a similar solution cheaper or even open 
>> source?
> 
> Uh, this should help:
> 
>   https://wiki.postgresql.org/wiki/Monitoring

Thanks… But I was just hoping that someone would share their solution in a more 
in-depth way…

For example: Having Zabbix to monitor the database is nice, but it does not 
allow you to manage replication, backups and more from the web, like 
awide.io does. Maybe there isn't even an open-source solution like that.... 

> 
> -- 
>  Bruce Momjian  https://momjian.us
>  EDB  https://enterprisedb.com
> 
>  If only the physical world exists, free will is an illusion.



Re: Undetected Deadlock

2022-01-26 Thread Michael Lewis
There may be a bug, so perhaps still pursue reproducing the issue, but I
would expect detach concurrently to resolve your issue with the two processes
conflicting. Also, perhaps trying "insert ... on conflict do update" could be
more efficient than the copy, but obviously there are too many unknowns and
variables for either to be a definite winner.
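
A minimal sketch of the upsert idea, with hypothetical table and column
names standing in for the real schema:

-- assumes a unique constraint or primary key on id
INSERT INTO measurements (id, ts, value)
VALUES (42, now(), 1.5)
ON CONFLICT (id) DO UPDATE
    SET ts = EXCLUDED.ts,
        value = EXCLUDED.value;
-- one statement instead of the DELETE + COPY pair for rows that may
-- already exist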


Re: Undetected Deadlock

2022-01-26 Thread Michael Harris
> I must be missing something. You mentioned dropping a partition, so is there 
> an actual need for the delete? Could you detach concurrently and then drop 
> the table or delete rows if needed?

The DELETE is part of a transaction performing data loading. Our
application allows data to be overwritten. Before using a COPY to
actually store the new data, we do a DELETE to make sure there are no
existing records. We tested a number of sequences and found that the
most efficient is to perform a DELETE first, even though in most cases
there is nothing to delete, and then do the COPY (rather than, for
example, checking whether the data exists first and only then deleting it).

The DROP is part of a separate transaction that is doing the data
expiry. Note that it is not dropping the same partition; old
partitions get dropped, new data is loaded into newer partitions.

Cheers
Mike




Re: PostgreSQL Management and monitoring tool

2022-01-26 Thread Bruce Momjian
On Thu, Jan 27, 2022 at 02:28:17PM +1300, Lucas wrote:
> Hi guys.
> 
> I’m migrating a few databases to PG 14 and was thinking it would be very
> nice to have a tool to help me monitor and manage my databases. I found Awide,
> and it looks really great, but it would cost me around $800 monthly.
> 
> Do you guys know by any chance a similar solution cheaper or even open source?

Uh, this should help:

https://wiki.postgresql.org/wiki/Monitoring

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





could not open relation with OID

2022-01-26 Thread Ben Chobot
We do a lot of queries per day, over a lot of hosts, all of which are on 
12.9. We've recently started doing a better job at analyzing our db logs 
and have found that, a few times a day, every day, we see some of our 
queries fail with errors like:


could not open relation with OID 201940279

In the cases we've examined so far, the failed query succeeds just fine 
when we run it manually. The failed query also had run on an async 
streaming replica, and the primary has completed at least one autovacuum 
since the failure. I don't know if either of those two facts are 
relevant, but I'm not sure what else to blame. The internet seems to 
want to blame issues like this on temp tables, which makes sense, but in 
our case, most of the queries that are failing this way are simple PK 
scans, which then fall back to the table to pull all the columns. The 
tables themselves are small in row count - although some values are 
likely TOASTed - so I would be surprised if anything is spilling to disk 
for sorting, which might have counted as a temp table enough to give 
such an error.


This is a minuscule failure percentage, so replicating it is going to be 
hard, but it is still breaking for reasons I don't understand, and so 
I'd like to fix it. Has anybody else seen this, or have any ideas of what 
to look at?


Other things we've considered:
    - we run pg_repack, which certainly seems like it could make an 
error like this, but we see this error in places and times that 
pg_repack isn't currently running
    - although all our servers are currently on 12.9, I don't think 
this is a new error for us. I believe we might have seen it on previous 
minor versions of 12 and probably on 9.5 as well.
    - our filesystem is xfs and seems reliable. I would expect that if 
it was a filesystem level error, it would not be so transient. We do, 
occasionally, expand our filesystems, but not at all the times we've 
seen this error.




PostgreSQL Management and monitoring tool

2022-01-26 Thread Lucas
Hi guys.

I’m migrating a few databases to PG 14 and was thinking it would be very 
nice to have a tool to help me monitor and manage my databases. I found Awide, 
and it looks really great, but it would cost me around $800 
monthly.

Do you guys know by any chance a similar solution cheaper or even open source?

Thanks!
Lucas

Re: Undetected Deadlock

2022-01-26 Thread Michael Lewis
I must be missing something. You mentioned dropping a partition, so is
there an actual need for the delete? Could you detach concurrently and then
drop the table or delete rows if needed?

https://www.postgresql.org/docs/14/sql-altertable.html#SQL-ALTERTABLE-DETACH-PARTITION
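
A minimal sketch of that sequence, with hypothetical names (DETACH ...
CONCURRENTLY is new in PG 14 and cannot run inside a transaction block):

ALTER TABLE measurements DETACH PARTITION measurements_2021w01 CONCURRENTLY;
DROP TABLE measurements_2021w01;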


Re: Undetected Deadlock

2022-01-26 Thread Michael Harris
Hi Alvaro

Thanks for the feedback!

> What version were you using previously?

We were previously on 11.4. Another difference is that we were using
inheritance-based partitioning before, whereas now we are using
declarative partitioning.

> Maybe the lock is already taken before the DELETE is run; do you have
> any triggers, rules, constraints, or anything?

There are no triggers, rules or constraints on the table involved in
the DELETE (either the partition or the table that the partition is
part of).

Even more confusingly - when I reproduce the SQL statements that
should be triggering the deadlock, it does not happen: the DELETE does
not attempt to take an AccessShareLock on the parent table, so it does
not deadlock.

Is there any state associated with a transaction or a database
connection that would alter the lock(s) that gets taken out for a
DELETE on a partition? Or can other concurrent transactions in other
processes cause more locks to be needed somehow?
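
For what it's worth, a sketch of how the locks actually held could be
compared between the reproduction and the live system (the pid below is a
placeholder; run this from a second session while the DELETE's transaction
is still open):

SELECT locktype, relation::regclass AS relation, mode, granted, fastpath
FROM pg_locks
WHERE pid = 12345;  -- hypothetical pid of the backend running the DELETE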

> If you have seen this several times already, maybe a way to investigate
> deeper is an exhaustive log capture of everything that these transactions
> do

So far it has happened at least twice. There were a couple of other
incidents that may well also have been caused by this, but not enough
data was collected at the time to be sure.

A bit more detail: the two processes that are deadlocked here are one
that is ingesting new data, while the other is removing old data by
dropping partitions. Even before we shifted to 14.1 and native
partitioning, we did get deadlocks between these two processes every
so often which we could not really prevent, so we adopted a retry
approach when it does occur. However we never had an undetected
deadlock in the database.

Since going to 14.1 & native partitioning, we are seeing a lot more
frequent deadlocks. It looks like the increase in frequency might be
related to the extra lock taken by the DELETE that I mentioned above.
However most of the time the deadlock is detected and released by
Postgres and the impact is minimal. Of course it is another story if
it is not detected!

I have enabled `log_statement=all`, but the undetected deadlock hasn't
happened again since. I can easily reproduce the deadlock itself, but
not the undetected case.

Thanks again.

Cheers
Mike

On Wed, 26 Jan 2022 at 10:11, Alvaro Herrera  wrote:
>
> On 2022-Jan-25, Michael Harris wrote:
>
> > We've recently updated our application to PG 14.1, and in the test
> > instance we have started to see some alarming undetected deadlocks.
>
> This is indeed suspicious / worrisome / curious.
>
> What version were you using previously?
>
> I reformatted the result sets:
>
> > An example of what we have seen is:
> >
> > -[ RECORD 1 ]------+------------------------------
> > locktype           | relation
> > database           | 529986
> > relation           | 1842228045
> > page               |
> > tuple              |
> > virtualxid         |
> > transactionid      |
> > classid            |
> > objid              |
> > objsubid           |
> > virtualtransaction | 165/1941408
> > pid                | 2130531
> > mode               | AccessShareLock
> > granted            | f
> > fastpath           | f
> > waitstart          | 2022-01-19 00:32:32.626152+01
> > relation           | st.ctr_table_efr_oa
> >
> > -[ RECORD 2 ]------+------------------------------
> > locktype           | relation
> > database           | 529986
> > relation           | 1842228045
> > page               |
> > tuple              |
> > virtualxid         |
> > transactionid      |
> > classid            |
> > objid              |
> > objsubid           |
> > virtualtransaction | 75/2193719
> > pid                | 2128603
> > mode               | AccessExclusiveLock
> > granted            | t
> > fastpath           | f
> > waitstart          |
> > relation           | st.ctr_table_efr_oa
> >
> > -[ RECORD 3 ]------+------------------------------
> > locktype           | relation
> > database           | 529986
> > relation           | 1842231489
> > page               |
> > tuple              |
> > virtualxid         |
> > transactionid      |
> > classid            |
> > objid              |
> > objsubid           |
> > virtualtransaction | 75/2193719
> > pid                | 2128603
> > mode               | AccessExclusiveLock
> > granted            | f
> > fastpath           | f
> > waitstart          | 2022-01-19 00:32:32.924694+01
> > relation           | st.tpd_oa
> >
> >    locktype| database |  relation  | page | tuple |  virtualxid  | 

Re: Counting the number of repeated phrases in a column

2022-01-26 Thread Merlin Moncure
On Tue, Jan 25, 2022 at 11:10 AM Shaozhong SHI  wrote:
>
> Standard Postgres is short of a function to do the following:
>
> It is easy to count the number of occurrences of words, but it is rather 
> difficult to count the number of occurrences of phrases.
>
> For instance:
>
> A cell of value 'Hello World' means 1 occurrence of a phrase.
>
> A cell of value 'Hello World World Hello' means no occurrence of any 
> repeated phrase.
>
> But a cell of value 'Hello World World Hello Hello World' means 2 
> occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any repeated phrase.
>
> Has anyone got such a function to count the number of occurrences of any 
> repeated phrases?

Let's define a phrase as a sequence of two or more words, delimited by
spaces. You could find it with something like:

with s as (select 'Hello World Hello World' as sentence)
select
  phrase,
  array_upper(string_to_array((select sentence from s), phrase), 1) -
    1 as occurrences
from
(
  select array_to_string(x, ' ') as phrase
  from
  (
    select distinct v[a:b] as x
    from regexp_split_to_array((select sentence from s), ' ') v
    cross join lateral generate_series(1, array_upper(v, 1)) a
    cross join lateral generate_series(a + 1, array_upper(v, 1)) b
  ) q
) q;

This would be slow for large sentences, obviously, and you'd probably
want to preprocess the string, stripping some characters and such.

merlin




Re: Counting the number of repeated phrases in a column

2022-01-26 Thread Karsten Hilbert
On Wed, Jan 26, 2022 at 08:35:06PM +, Shaozhong SHI wrote:

> Whatever. Can we try to build a regex for 'The City of London London
> Great London UK'?

Would you be so kind as to be more specific about that "we"?

Best,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: Counting the number of repeated phrases in a column

2022-01-26 Thread Rob Sargent

On 1/26/22 13:35, Shaozhong SHI wrote:



On Tue, 25 Jan 2022 at 17:10, Shaozhong SHI  
wrote:


Standard Postgres is short of a function to do the
following:

It is easy to count the number of occurrences of words, but it is
rather difficult to count the number of occurrences of phrases.

For instance:

A cell of value 'Hello World' means 1 occurrence of a phrase.

A cell of value 'Hello World World Hello' means no occurrence of
any repeated phrase.

But a cell of value 'Hello World World Hello Hello World' means
2 occurrences of 'Hello World'.

'The City of London, London' also has no occurrences of any
repeated phrase.

Has anyone got such a function to count the number of
occurrences of any repeated phrases?

Regards,

David


Hi, All Friends,

Whatever. Can we try to build a regex for 'The City of London
London Great London UK'?


It could be something like '[\w\s]+[\s-]+[a-z]+[\s-][\s\w]+'.
The [\s-]+[a-z]+[\s-] part caters for people who think that 'City of
London' is 'City-of-London'.


Regards,

David
Do you really want "The City of", by itself, to be one of the detected 
phrases? E.g. 'The City of London London Great London UK The City of 
Liverpool'.

Re: Counting the number of repeated phrases in a column

2022-01-26 Thread David G. Johnston
On Wednesday, January 26, 2022, Shaozhong SHI 
wrote:

>
>
> Whatever. Can we try to build a regex for 'The City of London London
> Great London UK'?
>

Not even if you paid me.  I’d probably die before I succeeded if you
tortured me.

David J.


Re: Counting the number of repeated phrases in a column

2022-01-26 Thread Shaozhong SHI
On Tue, 25 Jan 2022 at 17:10, Shaozhong SHI  wrote:

> Standard Postgres is short of a function to do the
> following:
>
> It is easy to count the number of occurrences of words, but it is rather
> difficult to count the number of occurrences of phrases.
>
> For instance:
>
> A cell of value 'Hello World' means 1 occurrence of a phrase.
>
> A cell of value 'Hello World World Hello' means no occurrence of any
> repeated phrase.
>
> But a cell of value 'Hello World World Hello Hello World' means 2
> occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any repeated
> phrase.
>
> Has anyone got such a function to count the number of occurrences of
> any repeated phrases?
>
> Regards,
>
> David
>

Hi, All Friends,

Whatever. Can we try to build a regex for 'The City of London London
Great London UK'?

It could be something like '[\w\s]+[\s-]+[a-z]+[\s-][\s\w]+'.
The [\s-]+[a-z]+[\s-] part caters for people who think that 'City of London'
is 'City-of-London'.

Regards,

David


Could not serialize access due to concurrent update

2022-01-26 Thread Mladen Gogala
In this post I am not asking a question; I am sharing an experience.
The application runs on Linux, PostgreSQL 13.5, using a WebSphere 9
application server. When using the "SKIP LOCKED" option, I suddenly started
seeing errors like "could not serialize access due to concurrent
update". After some reading, the problem was pinpointed to the
transaction isolation level:


https://pganalyze.com/docs/log-insights/app-errors/U138

OK, the app is not setting the transaction isolation level to repeatable
read, so what's going on? The documentation for WebSphere reveals the truth:


https://www.ibm.com/support/pages/transaction-isolation-levels-and-websphere-application-server

*If you do not specify the isolation level*

The product does not require you to set the isolation level on a data 
source resource reference for a non-CMP application module. If you do 
not specify isolation level on the resource reference, or if you specify 
TRANSACTION_NONE, the WebSphere Application Server run time uses a 
default isolation level for the data source. Application Server uses a 
default setting based on the JDBC driver.


*For most drivers, WebSphere Application Server uses
an isolation level default of TRANSACTION_REPEATABLE_READ.* (Bold font
is my addition)


Fortunately, the same document explains how to set the transaction
isolation level to READ COMMITTED for the WebSphere data source. No
wonder that IBM stands for "It's Better Manually". Be vewy, vewy
cawefull when using WebSphere and PostgreSQL. Here is how to deal with
the problem:



*Possible values*   *JDBC isolation level*        *DB2 isolation level*
8                   TRANSACTION_SERIALIZABLE      Repeatable Read (RR)
4 (default)         TRANSACTION_REPEATABLE_READ   Read Stability (RS)
2                   TRANSACTION_READ_COMMITTED    Cursor Stability (CS)
1                   TRANSACTION_READ_UNCOMMITTED  Uncommitted Read (UR)
0                   TRANSACTION_NONE              No Commit (NC)



*Note:* If TRANSACTION_NONE is used, the DB file does not have to be
journaled.


To define this custom property for a data source, you should do the 
following:

1.  Click *Resources* > *JDBC provider* > */JDBC_provider/*.
2.  Click *Data sources* in the Additional Properties section.
3.  Click the name of the data source.
4.  Click *Custom properties*.
5.  Create the webSphereDefaultIsolationLevel custom property
a. Click *New*.
b. Enter *webSphereDefaultIsolationLevel* for the name field.
c. Enter one of the *"possible values"* from the table above in the
value field, i.e. 0, 1, 2, 4, or 8.



The value that needs to be entered is 2.
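
A quick sanity check, runnable from any connection the data source hands
out, shows what the session actually got (plain PostgreSQL, nothing
WebSphere-specific):

SHOW transaction_isolation;
-- or, equivalently:
SELECT current_setting('transaction_isolation');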

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Matthias Apitz
On Wednesday, January 26, 2022 at 11:21:12 p.m. +0800, Julien Rouhaud
wrote:

> Hi,
> 
> On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz  wrote:
> >
> > We changed two relevant Indexes to
> >
> > CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
> > CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );
> 
> When you said changed, did you drop the previous ones?  

Yes, of course.

> As Tom
> mentioned, those indexes are specialized and are only useful for LIKE
> 'something%' queries.  It's quite likely that your existing indexes
> were useful for other queries, which may not be as fast without those
> indexes.  You can check in pg_stat_user_indexes if your indexes seem
> to be used before actually dropping them, for instance:
> https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW

Thanks, we're still investigating more cases with LIKE clauses and will
consider your hint.

matthias


-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Julien Rouhaud
Hi,

On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz  wrote:
>
> We changed two relevant Indexes to
>
> CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
> CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );

When you said changed, did you drop the previous ones?  As Tom
mentioned, those indexes are specialized and are only useful for LIKE
'something%' queries.  It's quite likely that your existing indexes
were useful for other queries, which may not be as fast without those
indexes.  You can check in pg_stat_user_indexes if your indexes seem
to be used before actually dropping them, for instance:
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW
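
A sketch of such a check against the table from this thread, using the
columns of the pg_stat_user_indexes view:

SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'd01buch'
ORDER BY idx_scan DESC;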




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Matthias Apitz


We changed two relevant Indexes to

CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );

and now the same queries are fast. We're looking through our code for
more such LIKE clauses on VARCHAR columns.

Thanks for all the hints

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Tom Lane
Julien Rouhaud  writes:
> On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote:
>> On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud  wrote:
>>> That index can't be used with a LIKE that has a trailing wildcard.

>> So what makes you say that? --DD

> This part of the documentation you mentioned:

>> However, if your database does not use the C locale you will need to create
>> the index with a special operator class to support indexing of
>> pattern-matching queries; see Section 11.10 below.

Note that declaring the index with C collation should also work,
and might be preferable to using the pattern_ops opclass.
C collation has at least some chance of being used explicitly
in queries, whereas a pattern_ops index is basically never going
to match anything but LIKE/regex searches.
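
A minimal sketch of that alternative, reusing the column from this thread
(untested here):

CREATE INDEX d01ort_c_idx ON d01buch (d01ort COLLATE "C");
-- the planner can use a "C"-collated btree for a prefix LIKE directly:
-- EXPLAIN SELECT * FROM d01buch WHERE d01ort LIKE 'Z 9610%';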

regards, tom lane




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Imre Samu
> We face in a PostgreSQL 11.4 installation on a powerful Linux host a
> ...
> Why is this (ignoring the Index) and what could be done?

IMHO: 11.4 is very old (release date: 2019-06-20) and missing a lot of
patches. The latest patch release is 11.14 (see
https://www.postgresql.org/docs/11/release.html).

You can test the PG11.14 with the PostgreSQL docker image (
https://hub.docker.com/_/postgres )
- docker pull postgres:11.14-bullseye
- import table ( d01buch )
- create indexes
- test your query

Regards,
 Imre

Matthias Apitz  wrote (at: 2022 Jan 26, Wed,
11:55):

>
> Hello,
>
> We face in a PostgreSQL 11.4 installation on a powerful Linux host a
> serious performance degradation.
>
> A SELECT with a LIKE clause on an indexed column of a table with ca.
> 5,200,000 rows ignores the index and does a full table scan:
>
> sisis=# explain (analyze, buffers) select * from d01buch where d01ort like
> 'Z 9610%' ;
> QUERY PLAN
>
> ---
> Gather (cost=1000.00..680262.71 rows=510 width=952) (actual
> time=1324.096..1349.429 rows=1 loops=1)
> Workers Planned: 4
> Workers Launched: 4
> Buffers: shared hit=102040 read=560674
> -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952)
> (actual time=1117.663..1315.062 rows=0 loops=5)
> Filter: (d01ort ~~ 'Z 9610%'::text)
> Rows Removed by Filter: 1055853
> Buffers: shared hit=102040 read=560674
> Planning Time: 2.028 ms
> Execution Time: 1349.593 ms
> (10 Zeilen)
>
> Why is this (ignoring the Index) and what could be done?
>
> Thanks
>
> matthias
>
>
> --
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/
> +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
>
>
>


Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Julien Rouhaud
Hi,

On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote:
> On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud  wrote:
> > > > > Why is this (ignoring the Index) and what could be done?
> > > [...]
> > > create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT*/
> >
> > That index can't be used with a LIKE that has a trailing wildcard.
> 
> Really? That seems to contradict the doc, i.e. default index type is B-Tree,
> which definitely supports trailing wildcard LIKE-predicates, as
> explicitly stated in said doc:
> https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE
> 
> So what makes you say that? --DD

This part of the documentation you mentioned:

> However, if your database does not use the C locale you will need to create
> the index with a special operator class to support indexing of
> pattern-matching queries; see Section 11.10 below.
> 
> PS: That also contradicts the small demo I made earlier up-thread:

relname's datatype is name, which has a default C collation, so you are in the
only case that natively works for btree indexes:

# select unnest(indcollation)::regcollation from pg_index where indexrelid =
'pg_class_relname_nsp_index'::regclass;
 unnest
--------
 "C"
 -
(2 rows)

I highly doubt that the OP's tables are also using C collation; almost no one
does that.




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Dominique Devienne
On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud  wrote:
> > > > Why is this (ignoring the Index) and what could be done?
> > [...]
> > create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT*/
>
> That index can't be used with a LIKE that has a trailing wildcard.

Really? That seems to contradict the doc, i.e. default index type is B-Tree,
which definitely supports trailing wildcard LIKE-predicates, as
explicitly stated in said doc:
https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE

So what makes you say that? --DD

PS: That also contradicts the small demo I made earlier up-thread:
```
ddevienne=> \d+ pg_class
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace) <
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
```
Whether the index is UNIQUE or not does not matter in this case.




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Julien Rouhaud
Hi,

On Wed, Jan 26, 2022 at 12:39:25PM +0100, Matthias Apitz wrote:
> 
> > > sisis=# explain (analyze, buffers) select * from d01buch where d01ort 
> > > like 'Z 9610%' ;
> > > QUERY PLAN
> > > ---
> > > Gather (cost=1000.00..680262.71 rows=510 width=952) (actual 
> > > time=1324.096..1349.429 rows=1 loops=1)
> > > Workers Planned: 4
> > > Workers Launched: 4
> > > Buffers: shared hit=102040 read=560674
> > > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) 
> > > (actual time=1117.663..1315.062 rows=0 loops=5)
> > > Filter: (d01ort ~~ 'Z 9610%'::text)
> > > Rows Removed by Filter: 1055853
> > > Buffers: shared hit=102040 read=560674
> > > Planning Time: 2.028 ms
> > > Execution Time: 1349.593 ms
> > > (10 Zeilen)
> > >
> > > Why is this (ignoring the Index) and what could be done?
> [...]
> create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT*/

That index can't be used with a LIKE that has a trailing wildcard.  You need to
either create an index with text_pattern_ops operator class (see
https://www.postgresql.org/docs/current/indexes-opclass.html), or a GIN index
using pg_trgm (which will also work with non-trailing wildcards), see
https://www.postgresql.org/docs/current/pgtrgm.html.
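
A sketch of both options for the column in question (index names are made
up; note that later in this thread bpchar_pattern_ops was used instead,
because the columns are bpchar rather than text):

-- btree with a pattern operator class (prefix patterns only):
CREATE INDEX d01ort_pattern_idx ON d01buch (d01ort text_pattern_ops);

-- or a trigram GIN index, which also handles '%infix%' patterns:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX d01ort_trgm_idx ON d01buch USING gin (d01ort gin_trgm_ops);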




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Matthias Apitz
On Wednesday, January 26, 2022 at 12:20:08 +0100, Josef Šimánek wrote:

> On Wed, 26 Jan 2022 at 11:55, Matthias Apitz wrote:
> >
> >
> > Hello,
> >
> > We face in a PostgreSQL 11.4 installation on a powerful Linux host a
> > serious performance degradation.
> >
> > A SELECT with a LIKE clause on an indexed column of a table with ca.
> > 5,200,000 rows ignores the index and does a full table scan:
> 
> Which index is ignored? Can you share the CREATE INDEX command as well?


/* #   $Revision: 1.1.2.21 $ */
create UNIQUE INDEX d01sig on d01buch(d01gsi,d01ex)  ;
/* alter table d01buch add constraint d01sig unique (d01gsi,d01ex) 
deferrable initially deferred;  *//*  D01SIG   */
create INDEX d01mcopyno on d01buch(d01mcopyno)  ;/*  D01MCOPYNO  */
create INDEX d01bnr on d01buch(d01bnr)  ;/*  D01BNR   */
create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT*/
create INDEX d01mcopynozweig on d01buch(d01mcopyno,d01zweig)  ;/*  D01KATZWEIG  
 */
create INDEX d01ort2 on d01buch(d01ort2)  ;/*  D02ORT2  */
create INDEX d01aufnahme on d01buch(d01aufnahme)  ;/*  D01aufnahme  */
create INDEX d01titlecatkey on d01buch(d01titlecatkey)  ;/*  D01TITLECATKEY  */
create INDEX d01invkrnr on d01buch(d01invkreis,d01invnr)  ;/*  D01invkrnr  */

matthias


> > sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 
> > 'Z 9610%' ;
> > QUERY PLAN
> > ---
> > Gather (cost=1000.00..680262.71 rows=510 width=952) (actual 
> > time=1324.096..1349.429 rows=1 loops=1)
> > Workers Planned: 4
> > Workers Launched: 4
> > Buffers: shared hit=102040 read=560674
> > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) 
> > (actual time=1117.663..1315.062 rows=0 loops=5)
> > Filter: (d01ort ~~ 'Z 9610%'::text)
> > Rows Removed by Filter: 1055853
> > Buffers: shared hit=102040 read=560674
> > Planning Time: 2.028 ms
> > Execution Time: 1349.593 ms
> > (10 Zeilen)
> >
> > Why is this (ignoring the Index) and what could be done?
> >
> > Thanks
> >
> > matthias
> >
> >
> > --
> > Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> > Public GnuPG key: http://www.unixarea.de/key.pub
> >
> >

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Dominique Devienne
On Wed, Jan 26, 2022 at 11:55 AM Matthias Apitz  wrote:
> A SELECT with a LIKE clause on an indexed column of a table with ca.
> 5,200,000 rows ignores the index and does a full table scan:
>
> sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 
> 9610%' ;
> -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952)...
> Filter: (d01ort ~~ 'Z 9610%'::text)

Could it be a collation issue? The collations typically need to match
between the query and the index.

See below how an index scan turns into a sequential scan just from using
ILIKE instead of LIKE, to illustrate how collations affect plans for
prefix LIKE queries. Please show the relevant DDL for the `d01ort` column
and its indexes; that should help diagnose this.

Perhaps it could be related to statistics too? --DD

```
ddevienne=> explain select count(*) from pg_class where relname like 'PNS%';
   ->  Index Only Scan using pg_class_relname_nsp_index on pg_class
(cost=0.41..8.44 rows=5 width=0)
 Index Cond: ((relname >= 'PNS'::text) AND (relname < 'PNT'::text))
 Filter: (relname ~~ 'PNS%'::text)
Time: 1.647 ms

ddevienne=> explain select count(*) from pg_class where relname ilike 'PNS%';
 Aggregate  (cost=2682.35..2682.36 rows=1 width=8)
   ->  Seq Scan on pg_class  (cost=0.00..2682.34 rows=5 width=0)
 Filter: (relname ~~* 'PNS%'::text)
Time: 1.262 ms
```




Re: SELECT with LIKE clause makes full table scan

2022-01-26 Thread Josef Šimánek
On Wed, 26 Jan 2022 at 11:55, Matthias Apitz wrote:
>
>
> Hello,
>
> We face in a PostgreSQL 11.4 installation on a powerful Linux host a
> serious performance degradation.
>
> A SELECT with a LIKE clause on an indexed column of a table with ca.
> 5,200,000 rows ignores the index and does a full table scan:

Which index is ignored? Can you share the CREATE INDEX command as well?

> sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 
> 9610%' ;
> QUERY PLAN
> ---
> Gather (cost=1000.00..680262.71 rows=510 width=952) (actual 
> time=1324.096..1349.429 rows=1 loops=1)
> Workers Planned: 4
> Workers Launched: 4
> Buffers: shared hit=102040 read=560674
> -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) 
> (actual time=1117.663..1315.062 rows=0 loops=5)
> Filter: (d01ort ~~ 'Z 9610%'::text)
> Rows Removed by Filter: 1055853
> Buffers: shared hit=102040 read=560674
> Planning Time: 2.028 ms
> Execution Time: 1349.593 ms
> (10 Zeilen)
>
> Why is this (ignoring the Index) and what could be done?
>
> Thanks
>
> matthias
>
>
> --
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
>
>




SELECT with LIKE clause makes full table scan

2022-01-26 Thread Matthias Apitz


Hello,

We face in a PostgreSQL 11.4 installation on a powerful Linux host a
serious performance degradation.

A SELECT with a LIKE clause on an indexed column of a table with ca.
5,200,000 rows ignores the index and does a full table scan:

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 
9610%' ;
QUERY PLAN
---
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual 
time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) 
(actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

Thanks

matthias


-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: Counting the number of repeated phrases in a column

2022-01-26 Thread Ivan E. Panchenko


On 26.01.2022 11:11, Shaozhong SHI wrote:



On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko wrote:



On 26.01.2022 00:21, benj@laposte.net wrote:
> On 25/01/2022 at 18:10, Shaozhong SHI wrote:
>> Standard Postgres is short of a function to do the
>> following:
>>
>> It is easy to count the number of occurrences of words, but it is
>> rather difficult to count the number of occurrences of phrases.
>>
>> For instance:
>>
>> A cell of value 'Hello World' means 1 occurrence of a phrase.
>>
>> A cell of value 'Hello World World Hello' means no occurrence of any
>> repeated phrase.
>>
>> But a cell of value 'Hello World World Hello Hello World' means 2
>> occurrences of 'Hello World'.
>>
>> 'The City of London, London' also has no occurrences of any repeated
>> phrase.
>>
>> Has anyone got such a function to count the number of occurrences
>> of any repeated phrases?
>>
>> Regards,
>>
>> David
>
> Don't know if it's exactly what you want, but you can replace all
> occurrences of the phrase in the text by the empty string, compute the
> diff between the initial and the result, and then divide by the length
> of your phrase.
>
> Example :
> WITH x AS (SELECT 'toto like tata and toto like titi and toto like
> tutu' AS texte, 'toto like' AS phrase)
> SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
> / char_length(phrase) AS nb_occurence
> FROM x
>
This works if the user knows the phrase. As far as I understood, the
phrase is not known, and the user wants to count the number of repeats
of any phrase.
Of course this can be done with a recursive CTE: split into words,
generate all phrases (AFAIK this requires recursion), then group and count.

But this could probably be done more efficiently in PL/Perl.


Is there an example of using a recursive CTE to split a text string into
words?


Recursion is not needed for splitting into words. This can be done with
the regexp_split_to_table function.

But generating all possible phrases from the given list of words
probably requires recursion. On the first step the list of words becomes
a list of single-worded phrases. On each iteration you then add the
next word to each existing phrase, where possible (i.e. until the
last word is reached).
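
For what it's worth, a minimal sketch of that recursion, using the 'Hello
World World Hello Hello World' example from upthread (overlapping repeats
are counted; an illustration, not a tuned solution):

WITH RECURSIVE words AS (
    SELECT w.word, w.pos::int AS pos
    FROM regexp_split_to_table('Hello World World Hello Hello World', ' ')
         WITH ORDINALITY AS w(word, pos)
), phrases AS (
    -- start from single words ...
    SELECT word AS phrase, pos AS first_pos, pos AS last_pos
    FROM words
    UNION ALL
    -- ... and extend each phrase by the word that follows it
    SELECT p.phrase || ' ' || w.word, p.first_pos, w.pos
    FROM phrases p
    JOIN words w ON w.pos = p.last_pos + 1
)
SELECT phrase, count(*) AS occurrences
FROM phrases
WHERE last_pos > first_pos        -- phrases of two or more words
GROUP BY phrase
HAVING count(*) > 1;              -- only repeated ones

For that example string this returns 'Hello World' with 2 occurrences,
matching the expectation stated upthread.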




Regards,

David


Regards,
Ivan



Re: Counting the number of repeated phrases in a column

2022-01-26 Thread benj . dev
>On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko wrote:
>
>
>>
>> On 26.01.2022 00:21, benj(dot)dev(at)laposte(dot)net wrote:
>> > On 25/01/2022 at 18:10, Shaozhong SHI wrote:
>> >> Standard Postgres is short of a function to do the
>> >> following:
>> >>
>> >> It is easy to count the number of occurrences of words, but it is
>> >> rather difficult to count the number of occurrences of phrases.
>> >>
>> >> For instance:
>> >>
>> >> A cell of value 'Hello World' means 1 occurrence of a phrase.
>> >>
>> >> A cell of value 'Hello World World Hello' means no occurrence of any
>> >> repeated phrase.
>> >>
>> >> But a cell of value 'Hello World World Hello Hello World' means 2
>> >> occurrences of 'Hello World'.
>> >>
>> >> 'The City of London, London' also has no occurrences of any repeated
>> >> phrase.
>> >>
>> >> Has anyone got such a function to count the number of occurrences
>> >> of any repeated phrases?
>> >>
>> >> Regards,
>> >>
>> >> David
>> >
>> > Don't know if it's exactly what you want, but you can replace all
>> > occurrences of the phrase in the text by the empty string, compute the
>> > diff between the initial and the result, and then divide by the length
>> > of your phrase.
>> >
>> > Example :
>> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like
>> > tutu' AS texte, 'toto like' AS phrase)
>> > SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
>> > / char_length(phrase) AS nb_occurence
>> > FROM x
>> >
>> This works if the user knows the phrase. As far as I understood, the
>> phrase is not known, and the user wants to count the number of repeats
>> of any phrase.
>> Of course this can be done with a recursive CTE: split into words,
>> generate all phrases (AFAIK this requires recursion), then group and count.
>>
>> But this could probably be done more efficiently in PL/Perl.
>>
>
>
>Is there an example of using a recursive CTE to split a text string into
>words?
>
>
>Regards,
>
>
>David

Without recursion, a "brutal" solution may be something like:

WITH original_text AS (
  SELECT 'Hello World World Hello my friend Hello World' AS sentence
)
, range_to_search AS (
  -- 1 is the minimal group of words searched, 5 the maximal
  SELECT *, generate_series(1, 5) AS gs FROM original_text
)
, x AS (
  SELECT r.sentence, gs
  -- 4 FOLLOWING so that groups of up to 5 words can be built
  , array_to_string((array_agg(rstt.word) OVER (PARTITION BY gs ORDER BY rstt.pos
      ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING))[1:gs], ' ') AS search_words
  FROM range_to_search r
  LEFT JOIN LATERAL regexp_split_to_table(r.sentence, ' ')
       WITH ORDINALITY AS rstt(word, pos) ON true
)
SELECT DISTINCT search_words, (char_length(sentence) -
  char_length(replace(sentence, search_words, '')))
  / NULLIF(char_length(search_words), 0) AS nb_occurence
FROM x

It's also possible to define a minimal number of words accepted.

Re: Counting the number of repeated phrases in a column

2022-01-26 Thread Shaozhong SHI
On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko 
wrote:

>
> On 26.01.2022 00:21, benj@laposte.net wrote:
> > On 25/01/2022 at 18:10, Shaozhong SHI wrote:
> >> Standard Postgres is short of a function to do the
> >> following:
> >>
> >> It is easy to count the number of occurrences of words, but it is
> >> rather difficult to count the number of occurrences of phrases.
> >>
> >> For instance:
> >>
> >> A cell of value 'Hello World' means 1 occurrence of a phrase.
> >>
> >> A cell of value 'Hello World World Hello' means no occurrence of any
> >> repeated phrase.
> >>
> >> But a cell of value 'Hello World World Hello Hello World' means 2
> >> occurrences of 'Hello World'.
> >>
> >> 'The City of London, London' also has no occurrences of any repeated
> >> phrase.
> >>
> >> Has anyone got such a function to count the number of occurrences
> >> of any repeated phrases?
> >>
> >> Regards,
> >>
> >> David
> >
> > Don't know if it's exactly what you want, but you can replace all
> > occurrences of the phrase in the text by the empty string, compute the
> > diff between the initial and the result, and then divide by the length
> > of your phrase.
> >
> > Example :
> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like
> > tutu' AS texte, 'toto like' AS phrase)
> > SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
> > / char_length(phrase) AS nb_occurence
> > FROM x
> >
> This works if the user knows the phrase. As far as I understood, the
> phrase is not known, and the user wants to count the number of repeats
> of any phrase.
> Of course this can be done with a recursive CTE: split into words,
> generate all phrases (AFAIK this requires recursion), then group and count.
>
> But this could probably be done more efficiently in PL/Perl.
>

Is there an example of using a recursive CTE to split a text string into
words?

Regards,

David