Re: [HACKERS] regexp_match() returning text
Emre Hasegeliwrites: >> I did *not* push the hunk in citext.sgml, since that was alleging support >> that doesn't actually exist in this patch. To make this work for citext, >> we need to add wrapper functions similar to citext's wrappers for >> regexp_matches. And that in turn means a citext extension version bump, >> which makes it more notationally tedious than I felt like dealing with >> today. I'm bouncing that requirement back to you to produce a separate >> patch for. > It is attached. You missed updating citext_1.out, but otherwise looks good. Pushed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regexp_match() returning text
> I did *not* push the hunk in citext.sgml, since that was alleging support > that doesn't actually exist in this patch. To make this work for citext, > we need to add wrapper functions similar to citext's wrappers for > regexp_matches. And that in turn means a citext extension version bump, > which makes it more notationally tedious than I felt like dealing with > today. I'm bouncing that requirement back to you to produce a separate > patch for. It is attached. citext-regexp-match-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regexp_match() returning text
"David G. Johnston"writes: > I didn't compile either patch but given the scope and complexity I'd say it > is ready for committer without that confirmed. Tom usually touches the > regexp code and I'm pretty sure he'll look at this with an eye no one else > has. Though I wouldn't expect anything until work on 10 begins in earnest. Pushed with some cosmetic adjustments to the code and rather more extensive work on the documentation. I did *not* push the hunk in citext.sgml, since that was alleging support that doesn't actually exist in this patch. To make this work for citext, we need to add wrapper functions similar to citext's wrappers for regexp_matches. And that in turn means a citext extension version bump, which makes it more notationally tedious than I felt like dealing with today. I'm bouncing that requirement back to you to produce a separate patch for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regexp_match() returning text
On Saturday, June 4, 2016, Emre Hasegeliwrote: > > The main problem being solved is the use of a SETOF result. I'm > inclined to > > prefer that the final, single, result is still an array. > > I have changed it like that. New patch attached. Good > > > I've got a style issue with the information_schema - I like to call it > > useless-use-of-E'' - but that was there long before this patch... > > We better not touch it. Agreed > > > /* user mustn't specify 'g' for regexp_split */ - do we add "or > > regexp_match" or just removed the extraneous detail? > > I don't think it would be a nice error message. Just meant the comment. I think they look good now though the part about force global in the split area just says what the code does and not why. Namely that when splitting we find all matches so the input is completely split. We disallow a user specification of global for some arbitrary reason; though I don't have any reason to change that and the present behavior doesn't cause people to complain. > > > There seems to be scope creep regarding "regexp_split_to_table" that I'm > > surprised to find. Related to that is the unexpected removal of the > > "force_glob" parameter to setup_regexp_matches. You took what was a > single > > block of code and now duplicated it without any explanation in the commit > > message (a code comment wouldn't work for this kind of change). The > change > > to flags from passing a pointer to text to passing in a pointer to a > > previously derived pg_re_flags makes more sense on its face, and it is > > apparently a non-public API, but again constitutes a refactoring that at > > least would ideally be a separate commit from the one the introduces the > new > > behavior. > > That check doesn't belong to setup_regexp_matches() in the first place. > The arguments of the function are organised to be caller agnostic, > and then it gives an error on behalf of regexp_split(). The check > fits better to the regexp_split() functions even with duplication. > > I can split it to another patch, but I think these kind of changes most > often go together. After reading this again I understand better what is going on and agree with the changes as written. > > Would you mind adding yourself to the reviewers on the Commitfest app? > I think you have already read though it. > > Done. I didn't compile either patch but given the scope and complexity I'd say it is ready for committer without that confirmed. Tom usually touches the regexp code and I'm pretty sure he'll look at this with an eye no one else has. Though I wouldn't expect anything until work on 10 begins in earnest. David J.
Re: [HACKERS] regexp_match() returning text
> The main problem being solved is the use of a SETOF result. I'm inclined to > prefer that the final, single, result is still an array. I have changed it like that. New patch attached. > I've got a style issue with the information_schema - I like to call it > useless-use-of-E'' - but that was there long before this patch... We better not touch it. > /* user mustn't specify 'g' for regexp_split */ - do we add "or > regexp_match" or just removed the extraneous detail? I don't think it would be a nice error message. > There seems to be scope creep regarding "regexp_split_to_table" that I'm > surprised to find. Related to that is the unexpected removal of the > "force_glob" parameter to setup_regexp_matches. You took what was a single > block of code and now duplicated it without any explanation in the commit > message (a code comment wouldn't work for this kind of change). The change > to flags from passing a pointer to text to passing in a pointer to a > previously derived pg_re_flags makes more sense on its face, and it is > apparently a non-public API, but again constitutes a refactoring that at > least would ideally be a separate commit from the one the introduces the new > behavior. That check doesn't belong to setup_regexp_matches() in the first place. The arguments of the function are organised to be caller agnostic, and then it gives an error on behalf of regexp_split(). The check fits better to the regexp_split() functions even with duplication. I can split it to another patch, but I think these kind of changes most often go together. Would you mind adding yourself to the reviewers on the Commitfest app? I think you have already read though it. Thanks for all the feedback. From a6f24b34fdb39eaaca1d3819f7f528b1689725a4 Mon Sep 17 00:00:00 2001 From: Emre HasegeliDate: Sun, 29 May 2016 18:53:37 +0200 Subject: [PATCH] Add regexp_match() --- doc/src/sgml/citext.sgml | 5 ++ doc/src/sgml/func.sgml| 62 +- src/backend/utils/adt/regexp.c| 119 +- src/include/catalog/pg_proc.h | 4 ++ src/include/utils/builtins.h | 2 + src/test/regress/expected/regex.out | 28 src/test/regress/expected/strings.out | 4 +- src/test/regress/sql/regex.sql| 7 ++ 8 files changed, 183 insertions(+), 48 deletions(-) diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml index 7fdf302..9b4c68f 100644 --- a/doc/src/sgml/citext.sgml +++ b/doc/src/sgml/citext.sgml @@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry'; Similarly, all of the following functions perform matching case-insensitively if their arguments are citext: + regexp_match() + + + + regexp_matches() regexp_replace() diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index ff7545d..64e975e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1935,32 +1935,49 @@ or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. quote_nullable(42.5) '42.5' + regexp_match + +regexp_match(string text, pattern text [, flags text]) + + text[] + +Return a single captured substring resulting from matching a POSIX +regular expression against the string. See + for more information. + + regexp_match('foobarbequebaz', '(bar)(beque)') + {bar,beque} + + + + + regexp_matches regexp_matches(string text, pattern text [, flags text]) setof text[] Return all captured substrings resulting from matching a POSIX regular expression against the string. See for more information. - regexp_matches('foobarbequebaz', '(bar)(beque)') - {bar,beque} + regexp_matches('foobarbequebaz', 'ba.', 'g') + {bar}{baz} (2 rows) regexp_replace regexp_replace(string text, pattern text, replacement text [, flags text]) text @@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression pattern matching substring regexp_replace +regexp_match + + regexp_matches regexp_split_to_table regexp_split_to_array @@ -4168,66 +4188,78 @@ substring('foobar' from 'o(.)b') o regexp_replace('foobarbaz', 'b..', 'X') fooXbaz regexp_replace('foobarbaz', 'b..', 'X', 'g') fooXX regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g') fooXarYXazY
Re: [HACKERS] regexp_match() returning text
On 5/30/16 1:01 PM, Andrew Gierth wrote: Returning an array containing the values of all capture groups might be more useful (substring returns the value of the first capture group if any, otherwise the matched string). +1. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regexp_match() returning text
> "Emre" == Emre Hasegeliwrites: Emre> Attached patch adds regexp_match() function which is a simple Emre> variant of regexp_matches() that doesn't return a set. We already have a function that takes a string and a regexp and returns a single text result: substring(). Regexp flags other than 'g' can also be embedded in the regexp: postgres=# select substring('foo bar' from '(?i)BA+'); substring --- ba Returning an array containing the values of all capture groups might be more useful (substring returns the value of the first capture group if any, otherwise the matched string). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regexp_match() returning text
On Sun, May 29, 2016 at 1:28 PM, Emre Hasegeliwrote: > Attached patch adds regexp_match() function which is a simple variant of > regexp_matches() that doesn't return a set. It is based on Tom Lane's > comment to bug #10889 [1]. > > [1] https://www.postgresql.org/message-id/23769.1404747...@sss.pgh.pa.us Not sure if we need two functions or what but I dislike that: "It ignores the parenthesized subexpressions in the pattern." The main problem being solved is the use of a SETOF result. I'm inclined to prefer that the final, single, result is still an array. Even if we return a single text value instead of an array it would be nice to return the first (only...) parenthesized subexpression so that the input pattern isn't constrained. I've got a style issue with the information_schema - I like to call it useless-use-of-E'' - but that was there long before this patch... /* user mustn't specify 'g' for regexp_split */ - do we add "or regexp_match" or just removed the extraneous detail? There seems to be scope creep regarding "regexp_split_to_table" that I'm surprised to find. Related to that is the unexpected removal of the "force_glob" parameter to setup_regexp_matches. You took what was a single block of code and now duplicated it without any explanation in the commit message (a code comment wouldn't work for this kind of change). The change to flags from passing a pointer to text to passing in a pointer to a previously derived pg_re_flags makes more sense on its face, and it is apparently a non-public API, but again constitutes a refactoring that at least would ideally be a separate commit from the one the introduces the new behavior. Also, I see an assert for the "no subexpressions" policy but I have seemed to have overlooked where the non-assert-enabled user is informed of the prohibition. Tom's opinion on all this will be needed - I am not a C code writer nor do I follow the patch writing process that closely generally, but I have a keen interest in this topic - but there seems to be a bit more here than simply having a function identical to the existing regexp_matches function that prohibits the global flag and only ever returns a single array per the existing API. I'm good with the name, regexp_match - even with the behavior being changed to returning an array instead of text. If there was any active plans to redo the API so that we'd return a composite type instead of an array I'd be more inclined to reserve "match" for that change and make this one a bit more verbose. David J.
[HACKERS] regexp_match() returning text
Attached patch adds regexp_match() function which is a simple variant of regexp_matches() that doesn't return a set. It is based on Tom Lane's comment to bug #10889 [1]. [1] https://www.postgresql.org/message-id/23769.1404747...@sss.pgh.pa.us From f8c113c77864ef1ca6386195aea02b2090ff17b6 Mon Sep 17 00:00:00 2001 From: Emre HasegeliDate: Sun, 29 May 2016 18:53:37 +0200 Subject: [PATCH] Add regexp_match() --- doc/src/sgml/citext.sgml | 5 ++ doc/src/sgml/func.sgml | 59 +++--- src/backend/catalog/information_schema.sql | 2 +- src/backend/utils/adt/regexp.c | 125 + src/include/catalog/pg_proc.h | 4 + src/include/utils/builtins.h | 2 + src/test/regress/expected/regex.out| 28 +++ src/test/regress/expected/strings.out | 4 +- src/test/regress/sql/regex.sql | 7 ++ 9 files changed, 188 insertions(+), 48 deletions(-) diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml index 7fdf302..9b4c68f 100644 --- a/doc/src/sgml/citext.sgml +++ b/doc/src/sgml/citext.sgml @@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry'; Similarly, all of the following functions perform matching case-insensitively if their arguments are citext: + regexp_match() + + + + regexp_matches() regexp_replace() diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index ff7545d..503dfe5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1935,20 +1935,37 @@ or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. quote_nullable(42.5) '42.5' + regexp_match + +regexp_match(string text, pattern text [, flags text]) + + text + +Return a single captured substring resulting from matching a POSIX regular +expression against the string. See + for more information. + + regexp_match('foobarbequebaz', 'barbeque') + barbeque + + + + + regexp_matches regexp_matches(string text, pattern text [, flags text]) setof text[] Return all captured substrings resulting from matching a POSIX regular expression against the string. See for more information. @@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression pattern matching substring regexp_replace +regexp_match + + regexp_matches regexp_split_to_table regexp_split_to_array @@ -4168,66 +4188,81 @@ substring('foobar' from 'o(.)b') o regexp_replace('foobarbaz', 'b..', 'X') fooXbaz regexp_replace('foobarbaz', 'b..', 'X', 'g') fooXX regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g') fooXarYXazY + The regexp_match function returns the first captured + substring resulting from matching a POSIX regular expression + pattern. It has the syntax + regexp_match(string, pattern + , flags ). + If the pattern does not match, the function returns + NULL. It ignores the parenthesized subexpressions in + the pattern. regexp_matches can be used in + this case (see below for details). + The flags parameter is an optional text + string containing zero or more single-letter flags that change the + function's behavior. Supported flags + are described in . + + + The regexp_matches function returns a text array of all of the captured substrings resulting from matching a POSIX - regular expression pattern. It has the syntax - regexp_matches(string, pattern - , flags ). + regular expression pattern. It has the same syntax as + regexp_match. The function can return no rows, one row, or multiple rows (see the g flag below). If the pattern does not match, the function returns no rows. If the pattern contains no parenthesized subexpressions, then each row returned is a single-element text array containing the substring matching the whole pattern. If the pattern contains parenthesized subexpressions, the function returns a text array whose n'th element is the substring matching the n'th parenthesized subexpression of the pattern (not counting non-capturing parentheses; see below for details). - The flags parameter is an optional text - string containing zero or more single-letter flags that change the - function's behavior. Flag g causes the