Regular Expression For Duplicate Words
This link is interesting.

regex - Regular Expression For Duplicate Words - Stack Overflow
<https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words>

Is there any example in Postgres?

Regards,
David
Re: Regular Expression For Duplicate Words
On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI wrote:

> This link is interesting.
>
> regex - Regular Expression For Duplicate Words - Stack Overflow
> <https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words>
>
> Is there any example in Postgres?

Not that I'm immediately aware of, and I'm not going to search the internet for you.

The regex capabilities in PostgreSQL are pretty full-featured, so a solution should be possible. You should try translating the SO post concepts into PostgreSQL yourself and ask specific questions if you get stuck.

David J.
Re: Regular Expression For Duplicate Words
It's an interesting question, but I also don't know how to do it in PostgreSQL. I did figure out alternative solutions outside the database, shown here for the fixed word "hello".

GNU Grep:

    grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'

ripgrep:

    rg '(hello)[[:blank:]]+\1' --pcre2 <<<'one hello hello world'

On Wed, Feb 2, 2022 at 8:53 PM David G. Johnston wrote:

> On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI wrote:
>
>> This link is interesting.
>>
>> regex - Regular Expression For Duplicate Words - Stack Overflow
>> <https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words>
>>
>> Is there any example in Postgres?
>
> Not that I'm immediately aware of, and I'm not going to search the
> internet for you.
>
> The regex capabilities in PostgreSQL are pretty full-featured so a
> solution should be possible. You should try translating the SO post
> concepts into PostgreSQL yourself and ask specific questions if you get
> stuck.
>
> David J.
Re: Regular Expression For Duplicate Words
On 2022-02-02 08:00:00 +, Shaozhong SHI wrote:

> regex - Regular Expression For Duplicate Words - Stack Overflow
>
> Is there any example in Postgres?

It's pretty much the same as with other regexp dialects: Use word boundaries and a word character class to match any word, and then use a backreference to match a duplicate word. All the building blocks are described on

https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP

and except for [[:<:]] and [[:>:]] for the word boundaries, they are also pretty standard.

So

    [[:<:]]          start of word
    ([[:alpha:]]+)   one or more alphabetic characters in a capturing group
    [[:>:]]          end of word
    \W+              one or more non-word characters
    [[:<:]]          start of word
    \1               the content of the first (and only) capturing group
    [[:>:]]          end of word

All together:

    select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';

hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality." (Charles Stross, "Creative writing challenge!")
Re: Regular Expression For Duplicate Words
Hi, Peter,

Interesting.

On Thu, 3 Feb 2022 at 19:48, Peter J. Holzer wrote:

> On 2022-02-02 08:00:00 +, Shaozhong SHI wrote:
> > regex - Regular Expression For Duplicate Words - Stack Overflow
> >
> > Is there any example in Postgres?
>
> It's pretty much the same as with other regexp dialects: Use word
> boundaries and a word character class to match any word and then use a
> backreference to match a duplicate word. All the building blocks are
> described on
>
> https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
> and except for [[:<:]] and [[:>:]] for the word boundaries, they are
> also pretty standard.
>
> So
>
> [[:<:]]          start of word
> ([[:alpha:]]+)   one or more alphabetic characters in a capturing group
> [[:>:]]          end of word
> \W+              one or more non-word characters
> [[:<:]]          start of word
> \1               the content of the first (and only) capturing group
> [[:>:]]          end of word
>
> All together:
>
> select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';

Give a good example if you can.

Regards,
David
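
As a rough illustration of the pattern discussed in this thread, the query below runs it against a few inline sample strings instead of a real table. The sample rows, the CTE name "samples", and the column names "line" and "duplicated_word" are made up for this sketch; only the regular expression comes from the thread. It assumes the default standard_conforming_strings = on, so the backslashes in the literal reach the regex engine unchanged.

```sql
-- Hypothetical sample data; replace the CTE with your own table and column.
WITH samples(line) AS (
    VALUES
        ('one hello hello world'),   -- adjacent duplicate word
        ('this is is a test'),       -- adjacent duplicate word
        ('this island is fine'),     -- "is" inside "island" should not count
        ('no repeats here')          -- no duplicate at all
)
SELECT line,
       -- When the pattern contains parentheses, substring() returns the text
       -- matched by the first group, here the repeated word, or NULL when
       -- the pattern does not match.
       substring(line from '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]')
           AS duplicated_word
FROM samples;
```

The first two rows should report "hello" and "is", and the last two should come back NULL, because the word-boundary constraints stop "is" from matching inside "island". To filter rows rather than extract the word, the same pattern can be used with the ~ operator as in the query quoted above. The PostgreSQL documentation also describes \m and \M as constraint escapes for the start and end of a word, which could stand in for [[:<:]] and [[:>:]].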