Re: Detecting repeated phrase in a string

2021-12-09 Thread Peter J. Holzer
On 2021-12-09 16:11:31 +0100, Andreas Joseph Krogh wrote:
> For repeated words (including Unicode characters) you can do:
>  
> (\b\p{L}+\b)(?:\s+\1)+
>  
> I'm not quite sure how to translate this to PG, but in JAVA it works.

See 
https://www.postgresql.org/docs/11/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Detecting repeated phrase in a string

2021-12-09 Thread Andreas Joseph Krogh

On Thursday 9 December 2021 at 15:46:05, Shaozhong SHI <shishaozh...@gmail.com> wrote:

> Hi, Peter,
>
> How do I define a word boundary as either ^, a space, or $,
> so that the following can be done:
>
> fox fox is a repeat
>
> foxfox is not a repeat but just one word.

Do you want repeated phrases (lists of words) or repeated words? 
For repeated words (including Unicode characters) you can do: 

(\b\p{L}+\b)(?:\s+\1)+ 

I'm not quite sure how to translate this to PG, but in JAVA it works. 



-- 
Andreas Joseph Krogh 
CTO / Partner - Visena AS 
Mobile: +47 909 56 963 
andr...@visena.com  
www.visena.com  
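A minimal sketch of Andreas's pattern ported to Python's `re` module, for illustration only. Python is a stand-in here, and the helper name `repeated_words` is hypothetical. Python's `re` has no `\p{L}`, so the sketch assumes `[^\W\d_]` (word characters minus digits and underscore) as an approximation of "letter". It also adds a `\b` after the backreference. As written, nothing in `(\b\p{L}+\b)(?:\s+\1)+` requires the repeated copy to end at a word boundary, so "fox foxy" would count as a repeat. In PostgreSQL itself, `\b` means backspace. The word-boundary constraint escape there is `\y`, with `\m` and `\M` for the start and end of a word, which is what the table Peter links to lists.

```python
import re

# Assumption: [^\W\d_] approximates \p{L}, which Python's re does not support.
# The \b after \1 stops "fox foxy" from being reported as a repeat.
REPEATED_WORD = re.compile(r"(\b[^\W\d_]+\b)(?:\s+\1\b)+")


def repeated_words(text: str) -> list[str]:
    """Return each word that is immediately repeated (hypothetical helper)."""
    return [m.group(1) for m in REPEATED_WORD.finditer(text)]


print(repeated_words("the fox fox is a repeat"))  # ['fox']
print(repeated_words("foxfox is just one word"))  # []
print(repeated_words("the fox foxy"))             # []
print(repeated_words("blåbær blåbær og Øl Øl"))   # ['blåbær', 'Øl']
```

Matching is case-sensitive, so "Øl øl" would not be reported.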
  


Re: Detecting repeated phrase in a string

2021-12-09 Thread Shaozhong SHI
Hi, Peter,

How do I define a word boundary as either ^, a space, or $,
so that the following can be done:

fox fox is a repeat

foxfox is not a repeat but just one word.

Regards,

David
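The question's definition of a word boundary (^, whitespace, or $) differs slightly from `\b`. Under it, punctuation stays part of the word, so "fox fox." is not a repeat. A minimal Python sketch of exactly that definition uses lookarounds: `(?<!\S)` means "at the start or after whitespace", and `(?!\S)` means "at the end or before whitespace". Python and the helper name `has_repeat` are assumptions for illustration, not something from the thread.

```python
import re

# Boundary = start of string, whitespace, or end of string (as in the question).
# (?<!\S): not preceded by a non-space char; (?!\S): not followed by one.
REPEAT = re.compile(r"(?<!\S)(\S+)(?:\s+\1)+(?!\S)")


def has_repeat(text: str) -> bool:
    """True if some whitespace-delimited token is immediately repeated."""
    return REPEAT.search(text) is not None


print(has_repeat("fox fox is a repeat"))                        # True
print(has_repeat("foxfox is not a repeat but just one word."))  # False
print(has_repeat("fox fox."))                                   # False
```

Because `\s+` must follow the captured group, the group always covers a whole token, and the final `(?!\S)` does the same for the last copy.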

On Thu, 9 Dec 2021 at 13:35, Peter J. Holzer  wrote:

> On 2021-12-09 12:38:15 +, Shaozhong SHI wrote:
> > Does anyone know how to detect a repeated phrase in a string?
>
> Use regular expressions with backreferences:
>


Re: Detecting repeated phrase in a string

2021-12-09 Thread Peter J. Holzer
On 2021-12-09 12:38:15 +, Shaozhong SHI wrote:
> Does anyone know how to detect a repeated phrase in a string?

Use regular expressions with backreferences:

bayes=> select regexp_match('foo wikiwiki bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ {o}          ║
╚══════════════╝
(1 row)

"o" is repeated in "foo".

bayes=> select regexp_match('fo wikiwiki bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ {wiki}       ║
╚══════════════╝
(1 row)

"wiki" is repeated in "wikiwiki".

bayes=> select regexp_match('fo wikiwi bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ (∅)          ║
╚══════════════╝
(1 row)

Nothing is repeated.

Adjust the expression within the parentheses if you want to match something
more specific than any sequence of one or more characters.

hp

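Peter's `(.+)\1` examples can be reproduced with Python's `re` as a quick sanity check. This is an illustration outside the thread: Python's backtracking engine happens to return the same matches for these three inputs, though it need not agree with PostgreSQL's regex engine in every case. The last call applies the "adjust the expression within the parentheses" advice. `(\w{2,})\1` only counts doubled runs of at least two word characters, so the "oo" in "foo" is skipped. The helper `first_doubled` is hypothetical.

```python
import re


def first_doubled(text, pattern=r"(.+)\1"):
    """Return group 1 of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


# Peter's pattern: any run of characters immediately followed by itself.
for s in ["foo wikiwiki bar", "fo wikiwiki bar", "fo wikiwi bar"]:
    print(repr(s), "->", first_doubled(s))
# 'foo wikiwiki bar' -> o
# 'fo wikiwiki bar' -> wiki
# 'fo wikiwi bar' -> None

# A narrower group: at least two word characters, so "oo" no longer counts.
print(first_doubled("foo wikiwiki bar", r"(\w{2,})\1"))  # wiki
```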




Detecting repeated phrase in a string

2021-12-09 Thread Shaozhong SHI
Does anyone know how to detect a repeated phrase in a string?

Is there any such function?

Regards,

David
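To detect repeated phrases rather than single words (the distinction Andreas asks about), one option is to let the captured group span several whitespace-separated words. This is a minimal Python sketch with two assumptions: a phrase is one or more `\w+` words separated by whitespace, and the repeat must follow immediately. The helper name `find_repeated_phrase` is hypothetical. Because the backreference must match the captured text exactly, the spacing inside the repeated copy has to match too.

```python
import re

# Leading \b: the phrase starts at a word start (so "box ox" is not a hit).
# Group: one or more words separated by whitespace.
# Trailing \b: stops "fox" from matching the start of "foxes".
REPEATED_PHRASE = re.compile(r"\b(\w+(?:\s+\w+)*)\s+\1\b")


def find_repeated_phrase(text: str):
    """Return the first immediately repeated phrase, or None."""
    m = REPEATED_PHRASE.search(text)
    return m.group(1) if m else None


print(find_repeated_phrase("we saw the quick fox the quick fox today"))  # the quick fox
print(find_repeated_phrase("the quick fox the quick foxes"))             # None
print(find_repeated_phrase("foxfox"))                                    # None
```

The greedy group returns the longest repeated phrase at the leftmost position where one starts. The nested quantifiers cause backtracking, so this may be slow on long texts.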