Re: [GENERAL] back references using regex

2005-09-11 Thread Matthew Peter
How about this then, I didn't retain that information
from the doc. ;) I sometimes glaze over important gems
every now and then. It happens. I'm not a robot, yet.
At least I know the answer to my question is now
retained. You were a big help too. Thank you very
much. I appreciate it. 

Speaking of data manipulation in a table... I was
thinking about storing and manipulating a list in a
column... 

Is it possible to append and delete (unknown location)
items in a list? Or is another way more efficient? I'm
relatively new so sorry if I'm asking too many
questions. If possible, I will read about it if
there's docs for it so I don't trouble anyone with my
questions. It would be sweet to do that at the
database level. 

Thanks again, 
MP
Posgresql convert

--- Michael Fuhr [EMAIL PROTECTED] wrote:

 On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew
 Peter wrote:
  ah I swear I never came across any of
 these
  gems of information in the docs. It was these
 subtle
  differences that were throwing me. 
 
 From Regular Expression Escapes in the Pattern
 Matching section
 of the manual:
 
   A back reference (\n) matches the same string
 matched by the
   previous parenthesized subexpression specified by
 the number n
   (see Table 9-18).  For example, ([bc])\1 matches
 bb or cc but not
   bc or cb.  The subexpression must entirely precede
 the back
   reference in the RE.  Subexpressions are numbered
 in the order
   of their leading parentheses.  Non-capturing
 parentheses do not
   define subexpressions.
 

http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES
 
 -- 
 Michael Fuhr
 
 ---(end of
 broadcast)---
 TIP 1: if posting/reading through Usenet, please
 send an appropriate
subscribe-nomail command to
 [EMAIL PROTECTED] so that your
message can get through to the mailing list
 cleanly
 





__ 
Yahoo! for Good 
Watch the Hurricane Katrina Shelter From The Storm concert 
http://advision.webevents.yahoo.com/shelter 


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] back references using regex

2005-09-11 Thread Michael Fuhr
On Sun, Sep 11, 2005 at 12:30:59AM -0700, Matthew Peter wrote:
 Speaking of data manipulation in a table... I was
 thinking about storing and manipulating a list in a
 column... 

Please ask new questions in a new thread with a Subject header
related to the new topic.  People who might be interested in following
the discussion, either to learn from it or to provide answers, might
miss it if the new topic hijacks an old thread.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] back references using regex

2005-09-10 Thread Douglas McNaught
Matthew Peter [EMAIL PROTECTED] writes:

 One other thing, when I wrote back I actually used
 34.31.29.20 (random), not 12.00.00.34 like i showed in
 the example, which is why i said it didn't work on
 digits.

 SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
   substring 
  ---
   
  (1 row)

 little did i know writing it with 12.00.00.34 would
 return 00.00... so yes, that did suprise me.
 Apparently only using the identical values returns a
 value. so it's saying x+ one more of the same value
 separated by a period... where shouldn't it be any
 letter, number or underscore followed by any
 letter, number or underscore?

Backreferences match the exact string matched by the corresponding set
of parentheses.  It's not the equivalent of substituting in the
parenthesized part of the regex and testing that for a match.  The
behavior above is as expected.  If you want it as any followed by
any you shold write the regex as '((\w+)\.(\w+))' -- then the two
parts can differ.

-Doug

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] back references using regex

2005-09-10 Thread Michael Fuhr
On Fri, Sep 09, 2005 at 10:11:46PM -0700, Matthew Peter wrote:
 One other thing, when I wrote back I actually used
 34.31.29.20 (random), not 12.00.00.34 like i showed in
 the example, which is why i said it didn't work on
 digits.

When posting examples, please post something you actually tried so
other people can see exactly what you're doing and attempt to
duplicate your results.  Little differences sometimes matter, as
this case illustrates.

 SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
   substring 
  ---
   
  (1 row)
 
 little did i know writing it with 12.00.00.34 would
 return 00.00... so yes, that did suprise me.
 Apparently only using the identical values returns a
 value.

That's what a back reference is: it means match the same value you
matched before, not just anything that matches the same regular
expression.

Usenet junkies might find it amusing to use back references to
search their new server's list of newsgroups:

egrep '([^.]+)\.\1\.\1' newsgroups

-- 
Michael Fuhr

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] back references using regex

2005-09-10 Thread Matthew Peter
ah I swear I never came across any of these
gems of information in the docs. It was these subtle
differences that were throwing me. 

I didn't originally catch that regex's were based on
grep/sed/awk syntax which I haven't studied throughly
yet. I've only used some basic operations in bash
scripts. I'll read up more on those. Thanks.

MP


--- Douglas McNaught [EMAIL PROTECTED] wrote:

 Matthew Peter [EMAIL PROTECTED] writes:
 
  One other thing, when I wrote back I actually used
  34.31.29.20 (random), not 12.00.00.34 like i
 showed in
  the example, which is why i said it didn't work on
  digits.
 
  SELECT substring('34.31.29.20' FROM
 $$((\w+)\.\2)$$);
substring 
   ---

   (1 row)
 
  little did i know writing it with 12.00.00.34
 would
  return 00.00... so yes, that did suprise me.
  Apparently only using the identical values returns
 a
  value. so it's saying x+ one more of the same
 value
  separated by a period... where shouldn't it be any
  letter, number or underscore followed by any
  letter, number or underscore?
 
 Backreferences match the exact string matched by the
 corresponding set
 of parentheses.  It's not the equivalent of
 substituting in the
 parenthesized part of the regex and testing that for
 a match.  The
 behavior above is as expected.  If you want it as
 any followed by
 any you shold write the regex as '((\w+)\.(\w+))'
 -- then the two
 parts can differ.
 
 -Doug
 





__
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] back references using regex

2005-09-10 Thread Michael Fuhr
On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew Peter wrote:
 ah I swear I never came across any of these
 gems of information in the docs. It was these subtle
 differences that were throwing me. 

From Regular Expression Escapes in the Pattern Matching section
of the manual:

  A back reference (\n) matches the same string matched by the
  previous parenthesized subexpression specified by the number n
  (see Table 9-18).  For example, ([bc])\1 matches bb or cc but not
  bc or cb.  The subexpression must entirely precede the back
  reference in the RE.  Subexpressions are numbered in the order
  of their leading parentheses.  Non-capturing parentheses do not
  define subexpressions.

http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES

-- 
Michael Fuhr

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] back references using regex

2005-09-09 Thread Matthew Peter
One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.

SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
  substring 
 ---
  
 (1 row)

little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value. so it's saying x+ one more of the same value
separated by a period... where shouldn't it be any
letter, number or underscore followed by any
letter, number or underscore?



--- Michael Fuhr [EMAIL PROTECTED] wrote:

 On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
 Peter wrote:
  It's not a complex regex as I have wrote one that
 does
  what I want, yet not at the database level. The
 docs
  didn't help clarify anything. I'm still not clear
 on
  how it determines where the back reference comes
 from
  in the previous example you gave. And why digits
  wouldn't work. 
 
 Back references work as they usually do in regular
 expressions:
 they refer to the matched value of a previous
 parenthesized
 subexpression.  If you have multiple open
 parentheses then you
 need to refer to the correct subexpression to get
 what you want.
 Example:
 
 CREATE TABLE foo (t text);
 INSERT INTO foo VALUES ('abc.foo.foo.xyz');
 INSERT INTO foo VALUES ('12.00.00.34');
 INSERT INTO foo VALUES ('abc.def.ghi');
 
 SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
 t
 -
  abc.foo.foo.xyz
  12.00.00.34
 (2 rows)
 
 In the above query the regular expression has only
 one set of
 parentheses, so the back reference refers to \1. 
 The result set
 contains the two rows that have one or more word
 characters followed
 by a dot followed by the same set of characters.
 
 In the following query, note the difference between
 the regular
 expression in the select list and the one in the
 where clause:
 
 SELECT substring(t FROM $$((\w+)\.\2)$$)
 FROM foo
 WHERE t ~ $$(\w+)\.\1$$;
  substring 
 ---
  foo.foo
  00.00
 (2 rows)
 
 In the regular expression in the select list, we use
 the outermost
 set of parentheses for grouping, so the back
 reference needs to
 refer to the subexpression that begins with the
 second open
 parenthesis (i.e., we must use \2).  In the regular
 expression
 in the where clause, we have only one set of
 parentheses so the
 back reference is \1.
 
 Regarding digits, you didn't post any output in your
 example, so
 we don't know if it really doesn't work or if it
 just doesn't do
 what you were expecting.  Here's what I get from
 your examples:
 
 SELECT substring('12.00.00.34' FROM
 $$((\d+)\.\2)$$);
  substring 
 ---
  00.00
 (1 row)
 
 SELECT substring('12.00.00.34' FROM
 $$((\w+)\.\2)$$);
  substring 
 ---
  00.00
 (1 row)
 
 SELECT substring('abc.foo.foo.xyz' FROM
 $$((\w+)\.\2)$$);
  substring 
 ---
  foo.foo
 (1 row)
 
 Do you get different results, or do these results
 surprise you?
 They all appear to be correct.
 
  What I basically want to do is have a slice
 function
  like Python, where I can slice out items from a
 \s, \.
  or \n\n separated list. Where I'll just change the
  delimiter for the query that it applies. 
  
  Where I could start it at a certain point and end
 it
  at another. Like slicing out paragraph 3-6
 (delimiter
  \n\n) or the 2nd-6th sentence in a article
 (delimiter
  \.). That is what I am trying to do.
 
 You can use split_part() to get a single item or
 string_to_array()
 to build an array from which you can extract
 multiple items.
 
 CREATE TABLE foo (t text);
 INSERT INTO foo VALUES
 ('one.two.three.four.five.six.');
 
 SELECT (string_to_array(t, '.'))[3:5] FROM foo;
   string_to_array  
 ---
  {three,four,five}
 (1 row)
 
 SELECT array_to_string((string_to_array(t,
 '.'))[3:5], '.') FROM foo;
  array_to_string 
 -
  three.four.five
 (1 row)
 
 Is that what you're looking for?
 
 -- 
 Michael Fuhr
 
 ---(end of
 broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 





__
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] back references using regex

2005-09-08 Thread Michael Fuhr
[Please copy the mailing list on replies so others can participate
in and learn from the discussion.]

On Wed, Sep 07, 2005 at 10:40:22PM -0700, Matthew Peter wrote:
 I did read the docs ;)  I always do. The question I
 really wanted answered is how to reference the back
 references in my regular expressions parentheses. Like
 the 2nd position or 4th from a group. Like \2 or $2.
 Can I do this in postgres in the query?

Are you looking for something like this?

SELECT substring('abc.foo.foo.xyz' FROM '(([[:alpha:]]+)\\.\\2)');
 substring 
---
 foo.foo
(1 row)

That is, one or more alphabetic characters followed by a dot followed
by the same set of characters (this is a simplistic example: it would
also match 'foo.oog' and return 'oo.oo').

Note that the back reference is \2 because it refers to the inner
set of parentheses (i.e., the subexpression with the second opening
parenthesis); the outer set is used here for capturing.  And again,
note the escaped backslashes because we're using ordinary quotes.
With dollar quotes the above query would be:

SELECT substring('abc.foo.foo.xyz' FROM $$(([[:alpha:]]+)\.\2)$$);

-- 
Michael Fuhr

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] back references using regex

2005-09-08 Thread Matthew Peter
That doesn't seem to work with digits 

SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
 or 
SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);

but works with strings

SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. 

Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do. I know if I can
figure how to get that working I can figure out how to
deal with extending it/handling misc. characters/etc
in pgsql. I only need a working vanilla example.
Thanks for all you help.


--- Michael Fuhr [EMAIL PROTECTED] wrote:
 SELECT substring('abc.foo.foo.xyz' FROM
 '(([[:alpha:]]+)\\.\\2)');
  substring 
 ---
  foo.foo
 (1 row)
 
 That is, one or more alphabetic characters followed
 by a dot followed
 by the same set of characters (this is a simplistic
 example: it would
 also match 'foo.oog' and return 'oo.oo').
 
 Note that the back reference is \2 because it refers
 to the inner
 set of parentheses (i.e., the subexpression with the
 second opening
 parenthesis); the outer set is used here for
 capturing.  And again,
 note the escaped backslashes because we're using
 ordinary quotes.
 With dollar quotes the above query would be:
 
 SELECT substring('abc.foo.foo.xyz' FROM
 $$(([[:alpha:]]+)\.\2)$$);
 
 -- 
 Michael Fuhr
 


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] back references using regex

2005-09-08 Thread Peter Fein
Matthew Peter wrote:
 That doesn't seem to work with digits 
 
 SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
  or 
 SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);
 
 but works with strings
 
 SELECT substring('abc.foo.foo.xyz' FROM
 $$((\w+)\.\2)$$);
 
 What I basically want to do is have a slice function
 like Python, where I can slice out items from a \s, \.
 or \n\n separated list. 

You could always just write it in pl/python...

-- 
Peter Fein [EMAIL PROTECTED] 773-575-0694

Basically, if you're not a utopianist, you're a schmuck. -J. Feldman

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] back references using regex

2005-09-08 Thread Matthew Peter
Ya, but I'd have to recompile to get python in. Plus,
I don't want to use Python. I want to use and learn
more pgsql. Keep things clean and lean if possible...
I just got a postgres book yesterday for additional
reading which it only had 2 pages on regex's in the
index :(

--- Peter Fein [EMAIL PROTECTED] wrote:

 Matthew Peter wrote:
  That doesn't seem to work with digits 
  
  SELECT substring('12.00.00.34' FROM
 $$((\d+)\.\2)$$);
   or 
  SELECT substring('12.00.00.34' FROM
 $$((\w+)\.\2)$$);
  
  but works with strings
  
  SELECT substring('abc.foo.foo.xyz' FROM
  $$((\w+)\.\2)$$);
  
  What I basically want to do is have a slice
 function
  like Python, where I can slice out items from a
 \s, \.
  or \n\n separated list. 
 
 You could always just write it in pl/python...
 
 -- 
 Peter Fein [EMAIL PROTECTED]  
   773-575-0694
 
 Basically, if you're not a utopianist, you're a
 schmuck. -J. Feldman
 





__
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] back references using regex

2005-09-08 Thread Michael Fuhr
On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew Peter wrote:
 Ya, but I'd have to recompile to get python in.

Recompiling to add support for another procedural language is a
one-time operation and it's easy to do, so that's not a good argument.

 Plus, I don't want to use Python. I want to use and learn
 more pgsql.  Keep things clean and lean if possible...

Clean and lean suggests using the right tool for the job.  Languages
like Perl and Python are better at string manipulation than PL/pgSQL,
and one of PostgreSQL's strengths is that it allows you to write
server-side functions in those languages.  Exploit such strengths
when it makes sense.

 I just got a postgres book yesterday for additional
 reading which it only had 2 pages on regex's in the
 index :(

Regular expressions aren't specific to PostgreSQL; there's ample
material covering them elsewhere.  See for example _Mastering
Regular Expressions_ by Jeffrey Friedl.  I'm sure a search engine
would yield many free tutorials on the subject.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] back references using regex

2005-09-08 Thread Matthew Peter
I knew I should never have said Python. I know regular
expressions, just not how postgresql handles them. The
fact of the matter is I don't want to use Python, it
was an example of the functionality I'm interested in
accomplishing with pgsql. Plus, I would like to use
other regex's once I figure out how they are used. 

I only need a regular expression in the substring of a
where cluase. Not entire language support for a single
function. 

It's not a complex regex as I have wrote one that does
what I want, yet not at the database level. The docs
didn't help clarify anything. I'm still not clear on
how it determines where the back reference comes from
in the previous example you gave. And why digits
wouldn't work. 

I would like a basic example that accomplishes what
I'm trying to do if at all possible?


My original message/problem...

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies. 

Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do. I know if I can
figure how to get that working I can figure out how to
deal with extending it/handling misc. characters/etc
in pgsql. I only need a working vanilla example.
Thanks for all your help.

--- Michael Fuhr [EMAIL PROTECTED] wrote:

 On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew
 Peter wrote:
  Ya, but I'd have to recompile to get python in.
 
 Recompiling to add support for another procedural
 language is a
 one-time operation and it's easy to do, so that's
 not a good argument.
 
  Plus, I don't want to use Python. I want to use
 and learn
  more pgsql.  Keep things clean and lean if
 possible...
 
 Clean and lean suggests using the right tool for
 the job.  Languages
 like Perl and Python are better at string
 manipulation than PL/pgSQL,
 and one of PostgreSQL's strengths is that it allows
 you to write
 server-side functions in those languages.  Exploit
 such strengths
 when it makes sense.
 
  I just got a postgres book yesterday for
 additional
  reading which it only had 2 pages on regex's in
 the
  index :(
 
 Regular expressions aren't specific to PostgreSQL;
 there's ample
 material covering them elsewhere.  See for example
 _Mastering
 Regular Expressions_ by Jeffrey Friedl.  I'm sure a
 search engine
 would yield many free tutorials on the subject.
 
 -- 
 Michael Fuhr
 


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] back references using regex

2005-09-08 Thread Alvaro Herrera
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:

 What I basically want to do is have a slice function
 like Python, where I can slice out items from a \s, \.
 or \n\n separated list. Where I'll just change the
 delimiter for the query that it applies. 

There is a function for some sort of text slicing, though I'm not sure
if it does what you want.  It's called split_part().  Have a look at the
documentation.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Aprender sin pensar es inĂștil; pensar sin aprender, peligroso (Confucio)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] back references using regex

2005-09-08 Thread Michael Fuhr
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:
 It's not a complex regex as I have wrote one that does
 what I want, yet not at the database level. The docs
 didn't help clarify anything. I'm still not clear on
 how it determines where the back reference comes from
 in the previous example you gave. And why digits
 wouldn't work. 

Back references work as they usually do in regular expressions:
they refer to the matched value of a previous parenthesized
subexpression.  If you have multiple open parentheses then you
need to refer to the correct subexpression to get what you want.
Example:

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');

SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
t
-
 abc.foo.foo.xyz
 12.00.00.34
(2 rows)

In the above query the regular expression has only one set of
parentheses, so the back reference refers to \1.  The result set
contains the two rows that have one or more word characters followed
by a dot followed by the same set of characters.

In the following query, note the difference between the regular
expression in the select list and the one in the where clause:

SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
 substring 
---
 foo.foo
 00.00
(2 rows)

In the regular expression in the select list, we use the outermost
set of parentheses for grouping, so the back reference needs to
refer to the subexpression that begins with the second open
parenthesis (i.e., we must use \2).  In the regular expression
in the where clause, we have only one set of parentheses so the
back reference is \1.

Regarding digits, you didn't post any output in your example, so
we don't know if it really doesn't work or if it just doesn't do
what you were expecting.  Here's what I get from your examples:

SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
 substring 
---
 00.00
(1 row)

SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);
 substring 
---
 00.00
(1 row)

SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$);
 substring 
---
 foo.foo
(1 row)

Do you get different results, or do these results surprise you?
They all appear to be correct.

 What I basically want to do is have a slice function
 like Python, where I can slice out items from a \s, \.
 or \n\n separated list. Where I'll just change the
 delimiter for the query that it applies. 
 
 Where I could start it at a certain point and end it
 at another. Like slicing out paragraph 3-6 (delimiter
 \n\n) or the 2nd-6th sentence in a article (delimiter
 \.). That is what I am trying to do.

You can use split_part() to get a single item or string_to_array()
to build an array from which you can extract multiple items.

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('one.two.three.four.five.six.');

SELECT (string_to_array(t, '.'))[3:5] FROM foo;
  string_to_array  
---
 {three,four,five}
(1 row)

SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo;
 array_to_string 
-
 three.four.five
(1 row)

Is that what you're looking for?

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] back references using regex

2005-09-08 Thread Matthew Peter
Thank you for your patience and such a complete
answer. I'm not on the pgbox right now but those
examples did help clarify how to reference the back
references, which was my problem. 

I wasn't aware the 1st parenthesis must be counted as
part of the regex, I assumed it was a wrapper. Thanks
for helping me out and putting up with me. :)

matt

--- Michael Fuhr [EMAIL PROTECTED] wrote:

 On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
 Peter wrote:
  It's not a complex regex as I have wrote one that
 does
  what I want, yet not at the database level. The
 docs
  didn't help clarify anything. I'm still not clear
 on
  how it determines where the back reference comes
 from
  in the previous example you gave. And why digits
  wouldn't work. 
 
 Back references work as they usually do in regular
 expressions:
 they refer to the matched value of a previous
 parenthesized
 subexpression.  If you have multiple open
 parentheses then you
 need to refer to the correct subexpression to get
 what you want.
 Example:
 
 CREATE TABLE foo (t text);
 INSERT INTO foo VALUES ('abc.foo.foo.xyz');
 INSERT INTO foo VALUES ('12.00.00.34');
 INSERT INTO foo VALUES ('abc.def.ghi');
 
 SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
 t
 -
  abc.foo.foo.xyz
  12.00.00.34
 (2 rows)
 
 In the above query the regular expression has only
 one set of
 parentheses, so the back reference refers to \1. 
 The result set
 contains the two rows that have one or more word
 characters followed
 by a dot followed by the same set of characters.
 
 In the following query, note the difference between
 the regular
 expression in the select list and the one in the
 where clause:
 
 SELECT substring(t FROM $$((\w+)\.\2)$$)
 FROM foo
 WHERE t ~ $$(\w+)\.\1$$;
  substring 
 ---
  foo.foo
  00.00
 (2 rows)
 
 In the regular expression in the select list, we use
 the outermost
 set of parentheses for grouping, so the back
 reference needs to
 refer to the subexpression that begins with the
 second open
 parenthesis (i.e., we must use \2).  In the regular
 expression
 in the where clause, we have only one set of
 parentheses so the
 back reference is \1.
 
 Regarding digits, you didn't post any output in your
 example, so
 we don't know if it really doesn't work or if it
 just doesn't do
 what you were expecting.  Here's what I get from
 your examples:
 
 SELECT substring('12.00.00.34' FROM
 $$((\d+)\.\2)$$);
  substring 
 ---
  00.00
 (1 row)
 
 SELECT substring('12.00.00.34' FROM
 $$((\w+)\.\2)$$);
  substring 
 ---
  00.00
 (1 row)
 
 SELECT substring('abc.foo.foo.xyz' FROM
 $$((\w+)\.\2)$$);
  substring 
 ---
  foo.foo
 (1 row)
 
 Do you get different results, or do these results
 surprise you?
 They all appear to be correct.
 
  What I basically want to do is have a slice
 function
  like Python, where I can slice out items from a
 \s, \.
  or \n\n separated list. Where I'll just change the
  delimiter for the query that it applies. 
  
  Where I could start it at a certain point and end
 it
  at another. Like slicing out paragraph 3-6
 (delimiter
  \n\n) or the 2nd-6th sentence in a article
 (delimiter
  \.). That is what I am trying to do.
 
 You can use split_part() to get a single item or
 string_to_array()
 to build an array from which you can extract
 multiple items.
 
 CREATE TABLE foo (t text);
 INSERT INTO foo VALUES
 ('one.two.three.four.five.six.');
 
 SELECT (string_to_array(t, '.'))[3:5] FROM foo;
   string_to_array  
 ---
  {three,four,five}
 (1 row)
 
 SELECT array_to_string((string_to_array(t,
 '.'))[3:5], '.') FROM foo;
  array_to_string 
 -
  three.four.five
 (1 row)
 
 Is that what you're looking for?
 
 -- 
 Michael Fuhr
 
 ---(end of
 broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] back references using regex

2005-09-07 Thread Matthew Peter
Hi. 

I'm trying to do a slice directly from a table so I
can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.

Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much. 

Many thanks.
MP




__
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] back references using regex

2005-09-07 Thread Michael Fuhr
On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew Peter wrote:
 I'm trying to do a slice directly from a table so I
 can get a brief preview of the articles content by
 counting \s (spaces), not new paragraphs.

Are you trying to extract the first N words from a string?  If
that's not what you mean then please clarify.

 Anyone know how it could be done using regular
 expressions natively? I read the doc but it didn't
 help me much. 

Regular expressions aren't the only way to solve the problem, but
maybe the following example will give you some ideas:

CREATE TABLE article (id integer, content text);

INSERT INTO article VALUES (1, 'one');
INSERT INTO article VALUES (2, 'one two');
INSERT INTO article VALUES (3, 'one two three');
INSERT INTO article VALUES (4, 'one two three four');
INSERT INTO article VALUES (5, 'one two three four five');
INSERT INTO article VALUES (6, 'one two three four five six');

SELECT id, substring(content FROM '(([^[:space:]]+[[:space:]]*){1,3})')
FROM article;
 id |   substring
+
  1 | one
  2 | one two
  3 | one two three
  4 | one two three 
  5 | one two three 
  6 | one two three 
(6 rows)

In PostgreSQL 7.4 and later you could shorten the regular expression:

SELECT id, substring(content FROM '((\\S+\\s*){1,3})')
FROM article;

If this example isn't what you're looking for then please explain
what you're trying to do.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] back references using regex

2005-09-07 Thread Michael Fuhr
On Wed, Sep 07, 2005 at 05:26:07PM -0700, Matthew Peter wrote:
 Thanks. I'll check it out asap. I didn't realize the
 regex expressions needed to be escaped for it to be a
 valid expression.

If you use ordinary quotes (') around the regular expression then
you have to escape the backslashes because there's an extra level
of string parsing that you're probably unaccustomed to.  If you use
dollar quotes (available since 8.0) then you don't need the extra
escapes:

SELECT id, substring(content FROM $$((\S+\s*){1,3})$$) FROM article;

 Would it be possible to choose what paragraph to use in a summary?

You might be able to use split_part().  For example, if paragraphs
are separated by pairs of newline (\n) characters, then the following
should return each article's third paragraph:

SELECT id, split_part(content, '\n\n', 3) FROM article;

See String Functions and Operators and Pattern Matching in the
documentation for more information.  If you need to get fancy then
consider writing a function in a language like PL/Perl.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly