Re: [HACKERS] regexp_match() returning text

2016-08-18 Thread Tom Lane
Emre Hasegeli  writes:
>> I did *not* push the hunk in citext.sgml, since that was alleging support
>> that doesn't actually exist in this patch.  To make this work for citext,
>> we need to add wrapper functions similar to citext's wrappers for
>> regexp_matches.  And that in turn means a citext extension version bump,
>> which makes it more notationally tedious than I felt like dealing with
>> today.  I'm bouncing that requirement back to you to produce a separate
>> patch for.

> It is attached.

You missed updating citext_1.out, but otherwise looks good.  Pushed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-08-18 Thread Emre Hasegeli
> I did *not* push the hunk in citext.sgml, since that was alleging support
> that doesn't actually exist in this patch.  To make this work for citext,
> we need to add wrapper functions similar to citext's wrappers for
> regexp_matches.  And that in turn means a citext extension version bump,
> which makes it more notationally tedious than I felt like dealing with
> today.  I'm bouncing that requirement back to you to produce a separate
> patch for.

It is attached.


citext-regexp-match-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-08-17 Thread Tom Lane
"David G. Johnston"  writes:
> I didn't compile either patch but given the scope and complexity I'd say it
> is ready for committer without that confirmed.  Tom usually touches the
> regexp code and I'm pretty sure he'll look at this with an eye no one else
> has.  Though I wouldn't expect anything until work on 10 begins in earnest.

Pushed with some cosmetic adjustments to the code and rather more
extensive work on the documentation.

I did *not* push the hunk in citext.sgml, since that was alleging support
that doesn't actually exist in this patch.  To make this work for citext,
we need to add wrapper functions similar to citext's wrappers for
regexp_matches.  And that in turn means a citext extension version bump,
which makes it more notationally tedious than I felt like dealing with
today.  I'm bouncing that requirement back to you to produce a separate
patch for.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-06-04 Thread David G. Johnston
On Saturday, June 4, 2016, Emre Hasegeli  wrote:

> > The main problem being solved is the use of a SETOF result.  I'm
> inclined to
> > prefer that the final, single, result is still an array.
>
> I have changed it like that.  New patch attached.


Good


>
> > I've got a style issue with the information_schema - I like to call it
> > useless-use-of-E'' -  but that was there long before this patch...
>
> We better not touch it.


Agreed


>
> > /* user mustn't specify 'g' for regexp_split */ - do we add "or
> > regexp_match" or just removed the extraneous detail?
>
> I don't think it would be a nice error message.


Just meant the comment.  I think they look good now though the part about
force global in the split area just says what the code does and not why.
Namely that when splitting we find all matches so the input is completely
split.  We disallow a user specification of global for some arbitrary
reason; though I don't have any reason to change that and the present
behavior doesn't cause people to complain.


>
> > There seems to be scope creep regarding "regexp_split_to_table" that I'm
> > surprised to find.  Related to that is the unexpected removal of the
> > "force_glob" parameter to setup_regexp_matches.  You took what was a
> single
> > block of code and now duplicated it without any explanation in the commit
> > message (a code comment wouldn't work for this kind of change).  The
> change
> > to flags from passing a pointer to text to passing in a pointer to a
> > previously derived pg_re_flags makes more sense on its face, and it is
> > apparently a non-public API, but again constitutes a refactoring that at
> > least would ideally be a separate commit from the one the introduces the
> new
> > behavior.
>
> That check doesn't belong to setup_regexp_matches() in the first place.
> The arguments of the function are organised to be caller agnostic,
> and then it gives an error on behalf of regexp_split().  The check
> fits better to the regexp_split() functions even with duplication.
>
> I can split it to another patch, but I think these kind of changes most
> often go together.


After reading this again I understand better what is going on and agree
with the changes as written.


>
> Would you mind adding yourself to the reviewers on the Commitfest app?
> I think you have already read though it.
>
>
Done.

I didn't compile either patch but given the scope and complexity I'd say it
is ready for committer without that confirmed.  Tom usually touches the
regexp code and I'm pretty sure he'll look at this with an eye no one else
has.  Though I wouldn't expect anything until work on 10 begins in earnest.

David J.


Re: [HACKERS] regexp_match() returning text

2016-06-04 Thread Emre Hasegeli
> The main problem being solved is the use of a SETOF result.  I'm inclined to
> prefer that the final, single, result is still an array.

I have changed it like that.  New patch attached.

> I've got a style issue with the information_schema - I like to call it
> useless-use-of-E'' -  but that was there long before this patch...

We better not touch it.

> /* user mustn't specify 'g' for regexp_split */ - do we add "or
> regexp_match" or just removed the extraneous detail?

I don't think it would be a nice error message.

> There seems to be scope creep regarding "regexp_split_to_table" that I'm
> surprised to find.  Related to that is the unexpected removal of the
> "force_glob" parameter to setup_regexp_matches.  You took what was a single
> block of code and now duplicated it without any explanation in the commit
> message (a code comment wouldn't work for this kind of change).  The change
> to flags from passing a pointer to text to passing in a pointer to a
> previously derived pg_re_flags makes more sense on its face, and it is
> apparently a non-public API, but again constitutes a refactoring that at
> least would ideally be a separate commit from the one the introduces the new
> behavior.

That check doesn't belong to setup_regexp_matches() in the first place.
The arguments of the function are organised to be caller agnostic,
and then it gives an error on behalf of regexp_split().  The check
fits better to the regexp_split() functions even with duplication.

I can split it to another patch, but I think these kind of changes most
often go together.

Would you mind adding yourself to the reviewers on the Commitfest app?
I think you have already read though it.

Thanks for all the feedback.
From a6f24b34fdb39eaaca1d3819f7f528b1689725a4 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli 
Date: Sun, 29 May 2016 18:53:37 +0200
Subject: [PATCH] Add regexp_match()

---
 doc/src/sgml/citext.sgml  |   5 ++
 doc/src/sgml/func.sgml|  62 +-
 src/backend/utils/adt/regexp.c| 119 +-
 src/include/catalog/pg_proc.h |   4 ++
 src/include/utils/builtins.h  |   2 +
 src/test/regress/expected/regex.out   |  28 
 src/test/regress/expected/strings.out |   4 +-
 src/test/regress/sql/regex.sql|   7 ++
 8 files changed, 183 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 7fdf302..9b4c68f 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry';
   
 
   
Similarly, all of the following functions perform matching
case-insensitively if their arguments are citext:
   
 
   

 
+  regexp_match()
+
+   
+   
+
   regexp_matches()
 


 
   regexp_replace()
 


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff7545d..64e975e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1935,32 +1935,49 @@
 or, if the argument is null, return NULL.
 Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)
'42.5'
   
 
   

 
+ regexp_match
+
+regexp_match(string text, pattern text [, flags text])
+   
+   text[]
+   
+Return a single captured substring resulting from matching a POSIX
+regular expression against the string. See
+ for more information.
+   
+   regexp_match('foobarbequebaz', '(bar)(beque)')
+   {bar,beque}
+  
+
+  
+   
+
  regexp_matches
 
 regexp_matches(string text, pattern text [, flags text])

setof text[]

 Return all captured substrings resulting from matching a POSIX regular
 expression against the string. See
  for more information.

-   regexp_matches('foobarbequebaz', '(bar)(beque)')
-   {bar,beque}
+   regexp_matches('foobarbequebaz', 'ba.', 'g')
+   {bar}{baz} (2 rows)
   
 
   

 
  regexp_replace
 
 regexp_replace(string text, pattern text, replacement text [, flags text])

text
@@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression
 pattern matching


 substring


 regexp_replace


+regexp_match
+   
+   
 regexp_matches


 regexp_split_to_table


 regexp_split_to_array

 

@@ -4168,66 +4188,78 @@ substring('foobar' from 'o(.)b')   o
 regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
 regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
fooXarYXazY
 

Re: [HACKERS] regexp_match() returning text

2016-06-03 Thread Jim Nasby

On 5/30/16 1:01 PM, Andrew Gierth wrote:

Returning an array containing the values of all capture groups might be
more useful (substring returns the value of the first capture group if
any, otherwise the matched string).


+1.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-05-30 Thread Andrew Gierth
> "Emre" == Emre Hasegeli  writes:

 Emre> Attached patch adds regexp_match() function which is a simple
 Emre> variant of regexp_matches() that doesn't return a set.

We already have a function that takes a string and a regexp and returns
a single text result: substring().

Regexp flags other than 'g' can also be embedded in the regexp:

postgres=# select substring('foo bar' from '(?i)BA+');
 substring 
---
 ba

Returning an array containing the values of all capture groups might be
more useful (substring returns the value of the first capture group if
any, otherwise the matched string).

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-05-30 Thread David G. Johnston
On Sun, May 29, 2016 at 1:28 PM, Emre Hasegeli  wrote:

> Attached patch adds regexp_match() function which is a simple variant of
> regexp_matches() that doesn't return a set.  It is based on Tom Lane's
> comment to bug #10889 [1].
>
> [1] https://www.postgresql.org/message-id/23769.1404747...@sss.pgh.pa.us


Not sure if we need two functions or what but I dislike that:

"It ignores the parenthesized subexpressions in the pattern."

​The main problem being solved is the use of a SETOF result.  I'm inclined
to prefer that the final, single, result is still an array.

Even if we return a single text value instead of an array it would be nice
to return the first (only...) parenthesized subexpression so that the input
pattern isn't constrained.

​I've got a style issue with the information_schema - I like to call it
useless-use-of-E'' -  but that was there long before this patch...

/* user mustn't specify 'g' for regexp_split */ - do we add "or
regexp_match" or just removed the extraneous detail?

There seems to be scope creep regarding "regexp_split_to_table" that I'm
surprised to find.  Related to that is the unexpected removal of the
"force_glob" parameter to setup_regexp_matches.  You took what was a single
block of code and now duplicated it without any explanation in the commit
message (a code comment wouldn't work for this kind of change).  The change
to flags from passing a pointer to text to passing in a pointer to a
previously derived pg_re_flags makes more sense on its face, and it is
apparently a non-public API, but again constitutes a refactoring that at
least would ideally be a separate commit from the one the introduces the
new behavior.

Also, I see an assert for the "no subexpressions" policy but I have seemed
to have overlooked where the non-assert-enabled user is informed of the
prohibition.

Tom's opinion on all this will be needed - I am not a C code writer nor do
I follow the patch writing process that closely generally, but I have a
keen interest in this topic - but there seems to be a bit more here than
simply having a function identical to the existing regexp_matches function
that prohibits the global flag and only ever returns a single array per the
existing API.

I'm good with the name, regexp_match - even with the behavior being changed
to returning an array instead of text.  If there was any active plans to
redo the API so that we'd return a composite type instead of an array I'd
be more inclined to reserve "match" for that change and make this one a bit
more verbose.

David J.


[HACKERS] regexp_match() returning text

2016-05-29 Thread Emre Hasegeli
Attached patch adds regexp_match() function which is a simple variant of
regexp_matches() that doesn't return a set.  It is based on Tom Lane's
comment to bug #10889 [1].

[1] https://www.postgresql.org/message-id/23769.1404747...@sss.pgh.pa.us
From f8c113c77864ef1ca6386195aea02b2090ff17b6 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli 
Date: Sun, 29 May 2016 18:53:37 +0200
Subject: [PATCH] Add regexp_match()

---
 doc/src/sgml/citext.sgml   |   5 ++
 doc/src/sgml/func.sgml |  59 +++---
 src/backend/catalog/information_schema.sql |   2 +-
 src/backend/utils/adt/regexp.c | 125 +
 src/include/catalog/pg_proc.h  |   4 +
 src/include/utils/builtins.h   |   2 +
 src/test/regress/expected/regex.out|  28 +++
 src/test/regress/expected/strings.out  |   4 +-
 src/test/regress/sql/regex.sql |   7 ++
 9 files changed, 188 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 7fdf302..9b4c68f 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry';
   
 
   
Similarly, all of the following functions perform matching
case-insensitively if their arguments are citext:
   
 
   

 
+  regexp_match()
+
+   
+   
+
   regexp_matches()
 


 
   regexp_replace()
 


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff7545d..503dfe5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1935,20 +1935,37 @@
 or, if the argument is null, return NULL.
 Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)
'42.5'
   
 
   

 
+ regexp_match
+
+regexp_match(string text, pattern text [, flags text])
+   
+   text
+   
+Return a single captured substring resulting from matching a POSIX regular
+expression against the string. See
+ for more information.
+   
+   regexp_match('foobarbequebaz', 'barbeque')
+   barbeque
+  
+
+  
+   
+
  regexp_matches
 
 regexp_matches(string text, pattern text [, flags text])

setof text[]

 Return all captured substrings resulting from matching a POSIX regular
 expression against the string. See
  for more information.

@@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression
 pattern matching


 substring


 regexp_replace


+regexp_match
+   
+   
 regexp_matches


 regexp_split_to_table


 regexp_split_to_array

 

@@ -4168,66 +4188,81 @@ substring('foobar' from 'o(.)b')   o
 regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
 regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
fooXarYXazY
 

 
 
+ The regexp_match function returns the first captured
+ substring resulting from matching a POSIX regular expression
+ pattern.  It has the syntax
+ regexp_match(string, pattern
+ , flags ).
+ If the pattern does not match, the function returns
+ NULL.  It ignores the parenthesized subexpressions in
+ the pattern.  regexp_matches can be used in
+ this case (see below for details).
+ The flags parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior.  Supported flags
+ are described in .
+
+
+
  The regexp_matches function returns a text array of
  all of the captured substrings resulting from matching a POSIX
- regular expression pattern.  It has the syntax
- regexp_matches(string, pattern
- , flags ).
+ regular expression pattern.  It has the same syntax as
+ regexp_match.
  The function can return no rows, one row, or multiple rows (see
  the g flag below).  If the pattern
  does not match, the function returns no rows.  If the pattern
  contains no parenthesized subexpressions, then each row
  returned is a single-element text array containing the substring
  matching the whole pattern.  If the pattern contains parenthesized
  subexpressions, the function returns a text array whose
  n'th element is the substring matching the
  n'th parenthesized subexpression of the pattern
  (not counting non-capturing parentheses; see below for
  details).
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
- function's behavior.  Flag g causes the