Re: Can Postgres beat Oracle for regexp_count?
On Wed, Feb 2, 2022 at 4:26 PM Tom Lane wrote: > > "David G. Johnston" writes: > > Given we don't have a regexp_count function this isn't surprising... > > FYI, it's there in HEAD. > > In the meantime, you could possibly do something like > > =# select count(*) from regexp_matches('My High Street', '([A-Z][a-z]+[\s])', > 'g'); > count > --- > 2 > (1 row) alternate version: postgres=# select array_upper(regexp_split_to_array('My High Street My High Street', 'My High Street'), 1) - 1; ?column? ── 2 can help to slide this into complex queries a little bit easier by avoiding the aggregation :-). merlin
Re: Can Postgres beat Oracle for regexp_count?
Hi, David, Many thanks. I am investigating into transformation of data quality validation through automation with application of Postgres/PostGIS. Regards, David On Thu, 3 Feb 2022 at 13:00, David G. Johnston wrote: > > > On Thursday, February 3, 2022, Shaozhong SHI > wrote: > >> >> Is it correct to say that this ?: construction of a regex can be applied >> for checking whether cell values meet specifications? >> >>> >>> > It does exactly what our examples shows it does. I don’t understand what > you mean above but if that helps you remember its purpose, great. > > David J. > >
Re: Can Postgres beat Oracle for regexp_count?
On Thursday, February 3, 2022, Shaozhong SHI wrote: > > Is it correct to say that this ?: construction of a regex can be applied > for checking whether cell values meet specifications? > >> >> It does exactly what our examples shows it does. I don’t understand what you mean above but if that helps you remember its purpose, great. David J.
Re: Can Postgres beat Oracle for regexp_count?
Many thanks, Tom, select regexp_matches('My High Street', '(?:[A-Z][a-z]+[\s]*)+', 'g'); looks very interesting. I did read the documentation, but found it is difficult to read. Particularly, the documentation on the use ?: does not state clear sense. There is only limited explanation on ?:. Is it correct to say that this ?: construction of a regex can be applied for checking whether cell values meet specifications? Regards, David On Thu, 3 Feb 2022 at 05:59, Tom Lane wrote: > Shaozhong SHI writes: > > The following has been attempted but no luck. > > > select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g') > > It is intended to match 'My High Street, but it turned out only 'Street' > > was matched. > > You've got the parentheses in the wrong place, ie inside not outside the > "+" quantifier. Per the fine manual [1], the result is determined by the > last match of quantified capturing parens. > > You could avoid using any capturing parens, so that the result is > the whole match: > > regression=# select regexp_matches('My High Street', > '(?:[A-Z][a-z]+[\s]*)+', 'g'); >regexp_matches > > {"My High Street"} > (1 row) > > or you could do > > regression=# select regexp_matches('My High Street', > '(([A-Z][a-z]+[\s]*)+)', 'g'); > regexp_matches > --- > {"My High Street",Street} > (1 row) > > but then you have two sets of capturing parens and you get results for > both, so you might prefer > > regression=# select regexp_matches('My High Street', > '((?:[A-Z][a-z]+[\s]*)+)', 'g'); >regexp_matches > > {"My High Street"} > (1 row) > > In any case, there's no substitute for reading the manual. > > regards, tom lane > > [1] > https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP >
Re: Can Postgres beat Oracle for regexp_count?
Shaozhong SHI writes: > The following has been attempted but no luck. > select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g') > It is intended to match 'My High Street, but it turned out only 'Street' > was matched. You've got the parentheses in the wrong place, ie inside not outside the "+" quantifier. Per the fine manual [1], the result is determined by the last match of quantified capturing parens. You could avoid using any capturing parens, so that the result is the whole match: regression=# select regexp_matches('My High Street', '(?:[A-Z][a-z]+[\s]*)+', 'g'); regexp_matches {"My High Street"} (1 row) or you could do regression=# select regexp_matches('My High Street', '(([A-Z][a-z]+[\s]*)+)', 'g'); regexp_matches --- {"My High Street",Street} (1 row) but then you have two sets of capturing parens and you get results for both, so you might prefer regression=# select regexp_matches('My High Street', '((?:[A-Z][a-z]+[\s]*)+)', 'g'); regexp_matches {"My High Street"} (1 row) In any case, there's no substitute for reading the manual. regards, tom lane [1] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
Re: Can Postgres beat Oracle for regexp_count?
On Wed, Feb 2, 2022 at 10:26 PM Shaozhong SHI wrote: > > select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g') > It is intended to match 'My High Street, but it turned out only 'Street' > was matched. > > I'm too tired to find the documentation for why you saw your result but basically you only have a single capturing parentheses pair and since you've quantified that you end up with just the last capture that was found - Street. If you want to capture the entire found expression you need to capture the quantifier. So put parentheses around the entire regexp. select regexp_matches('My High Street', '(([A-Z][a-z]+[\s]*)+)', 'g') You now have a two element array, slots filled left-to-right based upon the opening parenthesis. So {"My High Street",Street} To get rid of the undesired Street and only return a single element array you need to make the inner parentheses non-capturing. select regexp_matches('My High Street', '((?:[A-Z][a-z]+[\s]*)+)', 'g') David J.
Re: Can Postgres beat Oracle for regexp_count?
Hi, Tom, Lane, On Wed, 2 Feb 2022 at 22:26, Tom Lane wrote: > "David G. Johnston" writes: > > Given we don't have a regexp_count function this isn't surprising... > > FYI, it's there in HEAD. > > In the meantime, you could possibly do something like > > =# select count(*) from regexp_matches('My High Street', > '([A-Z][a-z]+[\s])', 'g'); > count > --- > 2 > (1 row) > > (Note that 2 is the correct answer given that there's no space > after the third word; I trust Oracle agrees.) > > Can the whole 3 or 4 or 5 to be matched as 1? > The following has been attempted but no luck. select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g') It is intended to match 'My High Street, but it turned out only 'Street' was matched. Regards, David
Re: Can Postgres beat Oracle for regexp_count?
"David G. Johnston" writes: > Given we don't have a regexp_count function this isn't surprising... FYI, it's there in HEAD. In the meantime, you could possibly do something like =# select count(*) from regexp_matches('My High Street', '([A-Z][a-z]+[\s])', 'g'); count --- 2 (1 row) (Note that 2 is the correct answer given that there's no space after the third word; I trust Oracle agrees.) regards, tom lane
Re: Can Postgres beat Oracle for regexp_count?
On Wed, Feb 2, 2022 at 1:20 PM Shaozhong SHI wrote: > It has been found that regexp_count works brilliantly in Oracle. > What query exactly did you execute in Oracle that you wish to see if an equivalent can be formulated in PostgreSQL? > > However, it is not easy to replicate that in Postgres. > Given we don't have a regexp_count function this isn't surprising... > The following codes have been experimented but without any luck. > > select regexp_matches('My High Street', '([A-Z][a-z]+[\s])', 'g') > > select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)', 'g') > See my first point. David J.
Can Postgres beat Oracle for regexp_count?
It has been found that regexp_count works brilliantly in Oracle. However, it is not easy to replicate that in Postgres. The following codes have been experimented but without any luck. select regexp_matches('My High Street', '([A-Z][a-z]+[\s])', 'g') select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)', 'g') County occurrences of 'My High Street' in one of the following strings: 'My High Street'1 '' 0 'My High Street My High Street' 2 Can anyone enlighten all of us? Regards, David