Re: Counting the number of repeated phrases in a column

Shaozhong SHI Wed, 26 Jan 2022 00:12:25 -0800

On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko <[email protected]>
wrote:


>
> On 26.01.2022 00:21, [email protected] wrote:
> > Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
> >> There is a short of a function in the standard Postgres to do the
> >> following:
> >>
> >> It is easy to count the number of occurrence of words, but it is
> >> rather difficult to count the number of occurrence of phrases.
> >>
> >> For instance:
> >>
> >> A cell of value:  'Hello World' means 1 occurrence a phrase.
> >>
> >> A cell of value: 'Hello World World Hello' means no occurrence of any
> >> repeated phrase.
> >>
> >> But, A cell of value: 'Hello World World Hello Hello World' means 2
> >> occurrences of 'Hello World'.
> >>
> >> 'The City of London, London' also has no occurrences of any repeated
> >> phrase.
> >>
> >> Anyone has got such a function to check out the number of occurrence
> >> of any repeated phrases?
> >>
> >> Regards,
> >>
> >> David
> >
> > Don't know if it's exactly what you want, but you can replace all
> > occurence of the phrase in the text by empty string and compute the
> > diff between the initial and the result and next divide by the length
> > of your phrase.
> >
> > Example :
> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like
> > tutu' , 'toto like' phrase)
> > SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
> > / char_length(phrase) AS nb_occurence
> > FROM x
> >
> This works if the user knows the phrase. As far as I understood, the
> phrase is not known, and user wants to count number of repeats of any
> phrases.
> Of course this can be done with recursive CTE. Split into words,
> generate all phrases (AFAIK requires recursion), then group and count.
>
> But probably in PL/Perl this could be done  more effectively.
>

Is there an example of using recursive CTE to split a text string into
words?

Regards,

David

Re: Counting the number of repeated phrases in a column

Reply via email to