Regular Expression For Duplicate Words

2022-02-02 Thread Shaozhong SHI
This link is interesting.

regex - Regular Expression For Duplicate Words - Stack Overflow
<https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words>

Is there any example in Postgres?

Regards,

David


Re: Regular Expression For Duplicate Words

2022-02-02 Thread David G. Johnston
On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI  wrote:

> This link is interesting.
>
> regex - Regular Expression For Duplicate Words - Stack Overflow
> <https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words>
>
> Is there any example in Postgres?
>
>
Not that I'm immediately aware of, and I'm not going to search the internet
for you.

The regex capabilities in PostgreSQL are pretty full-featured, so a solution
should be possible.  You should try translating the concepts from the SO post
into PostgreSQL yourself and ask specific questions if you get stuck.

David J.


Re: Regular Expression For Duplicate Words

2022-02-02 Thread Jian He
It's an interesting question, but I also don't know how to do it in
PostgreSQL. I did find alternative solutions with other tools, though:

GNU grep: grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'
ripgrep:  rg '(hello)[[:blank:]]+\1' --pcre2 <<<'one hello hello world'
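Both commands above hard-code the word "hello". A minimal sketch of generalizing them to any repeated word, assuming GNU grep (where \1 in -E mode and the \< / \> word anchors are GNU extensions, not POSIX ERE) and bash for the <<< here-strings. The function names loose and strict are made up for illustration. It also shows why word boundaries matter: without them, the pattern finds "is is" inside "this is".

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep and bash; loose/strict are hypothetical names.

# (hello) replaced by ([[:alpha:]]+): any run of letters, captured as group 1,
# followed by blanks and the same letters again. No word boundaries, so it can
# match the tail of one word plus the next word.
loose() { grep -oE '([[:alpha:]]+)[[:blank:]]+\1'; }

# Same pattern anchored with \< (start of word) and \> (end of word),
# the GNU grep counterparts of PostgreSQL's [[:<:]] and [[:>:]].
strict() { grep -oE '\<([[:alpha:]]+)[[:blank:]]+\1\>'; }

loose  <<<'this is a test'                    # prints: is is   (false positive)
strict <<<'this is a test' || echo 'no match' # grep exits 1, so: no match
strict <<<'one hello hello world'             # prints: hello hello
```

The -o flag prints only the matched "word word" pair rather than the whole line, which makes the false positive from the unanchored version easy to see.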



Re: Regular Expression For Duplicate Words

2022-02-03 Thread Peter J. Holzer
On 2022-02-02 08:00:00 +, Shaozhong SHI wrote:
> regex - Regular Expression For Duplicate Words - Stack Overflow
> 
> Is there any example in Postgres?

It's pretty much the same as with other regexp dialects: Use word
boundaries and a word character class to match any word, and then use a
backreference to match a duplicate word. All the building blocks are
described at
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
and, except for [[:<:]] and [[:>:]] for the word boundaries, they are
pretty standard.

So

[[:<:]]           start of word
([[:alpha:]]+)    one or more alphabetic characters in a capturing group
[[:>:]]           end of word
\W+               one or more non-word characters
[[:<:]]           start of word
\1                the content of the first (and only) capturing group
[[:>:]]           end of word

All together:

select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';
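To try the same pattern without a database, here is a rough stand-in that treats each input line as a row of the table t. It is a sketch under assumptions: GNU grep, where \< and \> play the role of PostgreSQL's [[:<:]] and [[:>:]], and \W and \1 are GNU extensions to POSIX ERE. The function name dup_rows and the sample rows are hypothetical. Unlike the [[:blank:]]+ separator in the grep commands earlier in the thread, \W+ also allows punctuation between the two words.

```shell
#!/usr/bin/env bash
# Sketch only: a GNU grep analogue of
#   select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]'
# where each input line stands in for one row. dup_rows is a hypothetical name.
#
#   PostgreSQL ARE      GNU grep -E
#   [[:<:]]             \<
#   [[:>:]]             \>
#   \W+, \1             \W+, \1  (same spelling; GNU extensions in grep)
dup_rows() {
  grep -E '\<([[:alpha:]]+)\>\W+\<\1\>'
}

# Hypothetical sample rows standing in for the contents of t.
printf '%s\n' \
  'one hello hello world' \
  'the, the comma case' \
  'this is a test' \
  'no repeats here' | dup_rows
```

With GNU grep this prints the first two sample lines. "this is a test" is not matched, because \< will not start a word in the middle of "this".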

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Regular Expression For Duplicate Words

2022-02-03 Thread Shaozhong SHI
Hi Peter, interesting.

Give a good example if you can.

Regards,

David