Re: [GENERAL] back references using regex
How about this then, I didn't retain that information from the doc. ;) I sometimes glaze over important gems every now and then. It happens. I'm not a robot, yet. At least I know the answer to my question is now retained. You were a big help too. Thank you very much. I appreciate it. Speaking of data manipulation in a table... I was thinking about storing and manipulating a list in a column... Is it possible to append and delete (unknown location) items in a list? Or is another way more efficient? I'm relatively new so sorry if I'm asking too many questions. If possible, I will read about it if there's docs for it so I don't trouble anyone with my questions. It would be sweet to do that at the database level. Thanks again, MP Posgresql convert --- Michael Fuhr [EMAIL PROTECTED] wrote: On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew Peter wrote: ah I swear I never came across any of these gems of information in the docs. It was these subtle differences that were throwing me. From Regular Expression Escapes in the Pattern Matching section of the manual: A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number n (see Table 9-18). For example, ([bc])\1 matches bb or cc but not bc or cb. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions. http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly __ Yahoo! for Good Watch the Hurricane Katrina Shelter From The Storm concert http://advision.webevents.yahoo.com/shelter ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] back references using regex
On Sun, Sep 11, 2005 at 12:30:59AM -0700, Matthew Peter wrote: Speaking of data manipulation in a table... I was thinking about storing and manipulating a list in a column... Please ask new questions in a new thread with a Subject header related to the new topic. People who might be interested in following the discussion, either to learn from it or to provide answers, might miss it if the new topic hijacks an old thread. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] back references using regex
Matthew Peter [EMAIL PROTECTED] writes: One other thing, when I wrote back I actually used 34.31.29.20 (random), not 12.00.00.34 like i showed in the example, which is why i said it didn't work on digits. SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$); substring --- (1 row) little did i know writing it with 12.00.00.34 would return 00.00... so yes, that did suprise me. Apparently only using the identical values returns a value. so it's saying x+ one more of the same value separated by a period... where shouldn't it be any letter, number or underscore followed by any letter, number or underscore? Backreferences match the exact string matched by the corresponding set of parentheses. It's not the equivalent of substituting in the parenthesized part of the regex and testing that for a match. The behavior above is as expected. If you want it as any followed by any you shold write the regex as '((\w+)\.(\w+))' -- then the two parts can differ. -Doug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] back references using regex
On Fri, Sep 09, 2005 at 10:11:46PM -0700, Matthew Peter wrote: One other thing, when I wrote back I actually used 34.31.29.20 (random), not 12.00.00.34 like i showed in the example, which is why i said it didn't work on digits. When posting examples, please post something you actually tried so other people can see exactly what you're doing and attempt to duplicate your results. Little differences sometimes matter, as this case illustrates. SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$); substring --- (1 row) little did i know writing it with 12.00.00.34 would return 00.00... so yes, that did suprise me. Apparently only using the identical values returns a value. That's what a back reference is: it means match the same value you matched before, not just anything that matches the same regular expression. Usenet junkies might find it amusing to use back references to search their new server's list of newsgroups: egrep '([^.]+)\.\1\.\1' newsgroups -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] back references using regex
ah I swear I never came across any of these gems of information in the docs. It was these subtle differences that were throwing me. I didn't originally catch that regex's were based on grep/sed/awk syntax which I haven't studied throughly yet. I've only used some basic operations in bash scripts. I'll read up more on those. Thanks. MP --- Douglas McNaught [EMAIL PROTECTED] wrote: Matthew Peter [EMAIL PROTECTED] writes: One other thing, when I wrote back I actually used 34.31.29.20 (random), not 12.00.00.34 like i showed in the example, which is why i said it didn't work on digits. SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$); substring --- (1 row) little did i know writing it with 12.00.00.34 would return 00.00... so yes, that did suprise me. Apparently only using the identical values returns a value. so it's saying x+ one more of the same value separated by a period... where shouldn't it be any letter, number or underscore followed by any letter, number or underscore? Backreferences match the exact string matched by the corresponding set of parentheses. It's not the equivalent of substituting in the parenthesized part of the regex and testing that for a match. The behavior above is as expected. If you want it as any followed by any you shold write the regex as '((\w+)\.(\w+))' -- then the two parts can differ. -Doug __ Click here to donate to the Hurricane Katrina relief effort. http://store.yahoo.com/redcross-donate3/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] back references using regex
On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew Peter wrote: ah I swear I never came across any of these gems of information in the docs. It was these subtle differences that were throwing me. From Regular Expression Escapes in the Pattern Matching section of the manual: A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number n (see Table 9-18). For example, ([bc])\1 matches bb or cc but not bc or cb. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions. http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] back references using regex
One other thing, when I wrote back I actually used 34.31.29.20 (random), not 12.00.00.34 like i showed in the example, which is why i said it didn't work on digits. SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$); substring --- (1 row) little did i know writing it with 12.00.00.34 would return 00.00... so yes, that did suprise me. Apparently only using the identical values returns a value. so it's saying x+ one more of the same value separated by a period... where shouldn't it be any letter, number or underscore followed by any letter, number or underscore? --- Michael Fuhr [EMAIL PROTECTED] wrote: On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote: It's not a complex regex as I have wrote one that does what I want, yet not at the database level. The docs didn't help clarify anything. I'm still not clear on how it determines where the back reference comes from in the previous example you gave. And why digits wouldn't work. Back references work as they usually do in regular expressions: they refer to the matched value of a previous parenthesized subexpression. If you have multiple open parentheses then you need to refer to the correct subexpression to get what you want. Example: CREATE TABLE foo (t text); INSERT INTO foo VALUES ('abc.foo.foo.xyz'); INSERT INTO foo VALUES ('12.00.00.34'); INSERT INTO foo VALUES ('abc.def.ghi'); SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$; t - abc.foo.foo.xyz 12.00.00.34 (2 rows) In the above query the regular expression has only one set of parentheses, so the back reference refers to \1. The result set contains the two rows that have one or more word characters followed by a dot followed by the same set of characters. In the following query, note the difference between the regular expression in the select list and the one in the where clause: SELECT substring(t FROM $$((\w+)\.\2)$$) FROM foo WHERE t ~ $$(\w+)\.\1$$; substring --- foo.foo 00.00 (2 rows) In the regular expression in the select list, we use the outermost set of parentheses for grouping, so the back reference needs to refer to the subexpression that begins with the second open parenthesis (i.e., we must use \2). In the regular expression in the where clause, we have only one set of parentheses so the back reference is \1. Regarding digits, you didn't post any output in your example, so we don't know if it really doesn't work or if it just doesn't do what you were expecting. Here's what I get from your examples: SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$); substring --- 00.00 (1 row) SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$); substring --- 00.00 (1 row) SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$); substring --- foo.foo (1 row) Do you get different results, or do these results surprise you? They all appear to be correct. What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. Where I'll just change the delimiter for the query that it applies. Where I could start it at a certain point and end it at another. Like slicing out paragraph 3-6 (delimiter \n\n) or the 2nd-6th sentence in a article (delimiter \.). That is what I am trying to do. You can use split_part() to get a single item or string_to_array() to build an array from which you can extract multiple items. CREATE TABLE foo (t text); INSERT INTO foo VALUES ('one.two.three.four.five.six.'); SELECT (string_to_array(t, '.'))[3:5] FROM foo; string_to_array --- {three,four,five} (1 row) SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo; array_to_string - three.four.five (1 row) Is that what you're looking for? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster __ Click here to donate to the Hurricane Katrina relief effort. http://store.yahoo.com/redcross-donate3/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] back references using regex
[Please copy the mailing list on replies so others can participate in and learn from the discussion.] On Wed, Sep 07, 2005 at 10:40:22PM -0700, Matthew Peter wrote: I did read the docs ;) I always do. The question I really wanted answered is how to reference the back references in my regular expressions parentheses. Like the 2nd position or 4th from a group. Like \2 or $2. Can I do this in postgres in the query? Are you looking for something like this? SELECT substring('abc.foo.foo.xyz' FROM '(([[:alpha:]]+)\\.\\2)'); substring --- foo.foo (1 row) That is, one or more alphabetic characters followed by a dot followed by the same set of characters (this is a simplistic example: it would also match 'foo.oog' and return 'oo.oo'). Note that the back reference is \2 because it refers to the inner set of parentheses (i.e., the subexpression with the second opening parenthesis); the outer set is used here for capturing. And again, note the escaped backslashes because we're using ordinary quotes. With dollar quotes the above query would be: SELECT substring('abc.foo.foo.xyz' FROM $$(([[:alpha:]]+)\.\2)$$); -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] back references using regex
That doesn't seem to work with digits SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$); or SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$); but works with strings SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$); What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. Where I could start it at a certain point and end it at another. Like slicing out paragraph 3-6 (delimiter \n\n) or the 2nd-6th sentence in a article (delimiter \.). That is what I am trying to do. I know if I can figure how to get that working I can figure out how to deal with extending it/handling misc. characters/etc in pgsql. I only need a working vanilla example. Thanks for all you help. --- Michael Fuhr [EMAIL PROTECTED] wrote: SELECT substring('abc.foo.foo.xyz' FROM '(([[:alpha:]]+)\\.\\2)'); substring --- foo.foo (1 row) That is, one or more alphabetic characters followed by a dot followed by the same set of characters (this is a simplistic example: it would also match 'foo.oog' and return 'oo.oo'). Note that the back reference is \2 because it refers to the inner set of parentheses (i.e., the subexpression with the second opening parenthesis); the outer set is used here for capturing. And again, note the escaped backslashes because we're using ordinary quotes. With dollar quotes the above query would be: SELECT substring('abc.foo.foo.xyz' FROM $$(([[:alpha:]]+)\.\2)$$); -- Michael Fuhr __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] back references using regex
Matthew Peter wrote: That doesn't seem to work with digits SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$); or SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$); but works with strings SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$); What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. You could always just write it in pl/python... -- Peter Fein [EMAIL PROTECTED] 773-575-0694 Basically, if you're not a utopianist, you're a schmuck. -J. Feldman ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] back references using regex
Ya, but I'd have to recompile to get python in. Plus, I don't want to use Python. I want to use and learn more pgsql. Keep things clean and lean if possible... I just got a postgres book yesterday for additional reading which it only had 2 pages on regex's in the index :( --- Peter Fein [EMAIL PROTECTED] wrote: Matthew Peter wrote: That doesn't seem to work with digits SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$); or SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$); but works with strings SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$); What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. You could always just write it in pl/python... -- Peter Fein [EMAIL PROTECTED] 773-575-0694 Basically, if you're not a utopianist, you're a schmuck. -J. Feldman __ Click here to donate to the Hurricane Katrina relief effort. http://store.yahoo.com/redcross-donate3/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] back references using regex
On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew Peter wrote: Ya, but I'd have to recompile to get python in. Recompiling to add support for another procedural language is a one-time operation and it's easy to do, so that's not a good argument. Plus, I don't want to use Python. I want to use and learn more pgsql. Keep things clean and lean if possible... Clean and lean suggests using the right tool for the job. Languages like Perl and Python are better at string manipulation than PL/pgSQL, and one of PostgreSQL's strengths is that it allows you to write server-side functions in those languages. Exploit such strengths when it makes sense. I just got a postgres book yesterday for additional reading which it only had 2 pages on regex's in the index :( Regular expressions aren't specific to PostgreSQL; there's ample material covering them elsewhere. See for example _Mastering Regular Expressions_ by Jeffrey Friedl. I'm sure a search engine would yield many free tutorials on the subject. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] back references using regex
I knew I should never have said Python. I know regular expressions, just not how postgresql handles them. The fact of the matter is I don't want to use Python, it was an example of the functionality I'm interested in accomplishing with pgsql. Plus, I would like to use other regex's once I figure out how they are used. I only need a regular expression in the substring of a where cluase. Not entire language support for a single function. It's not a complex regex as I have wrote one that does what I want, yet not at the database level. The docs didn't help clarify anything. I'm still not clear on how it determines where the back reference comes from in the previous example you gave. And why digits wouldn't work. I would like a basic example that accomplishes what I'm trying to do if at all possible? My original message/problem... What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. Where I'll just change the delimiter for the query that it applies. Where I could start it at a certain point and end it at another. Like slicing out paragraph 3-6 (delimiter \n\n) or the 2nd-6th sentence in a article (delimiter \.). That is what I am trying to do. I know if I can figure how to get that working I can figure out how to deal with extending it/handling misc. characters/etc in pgsql. I only need a working vanilla example. Thanks for all your help. --- Michael Fuhr [EMAIL PROTECTED] wrote: On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew Peter wrote: Ya, but I'd have to recompile to get python in. Recompiling to add support for another procedural language is a one-time operation and it's easy to do, so that's not a good argument. Plus, I don't want to use Python. I want to use and learn more pgsql. Keep things clean and lean if possible... Clean and lean suggests using the right tool for the job. Languages like Perl and Python are better at string manipulation than PL/pgSQL, and one of PostgreSQL's strengths is that it allows you to write server-side functions in those languages. Exploit such strengths when it makes sense. I just got a postgres book yesterday for additional reading which it only had 2 pages on regex's in the index :( Regular expressions aren't specific to PostgreSQL; there's ample material covering them elsewhere. See for example _Mastering Regular Expressions_ by Jeffrey Friedl. I'm sure a search engine would yield many free tutorials on the subject. -- Michael Fuhr __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] back references using regex
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote: What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. Where I'll just change the delimiter for the query that it applies. There is a function for some sort of text slicing, though I'm not sure if it does what you want. It's called split_part(). Have a look at the documentation. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Aprender sin pensar es inĂștil; pensar sin aprender, peligroso (Confucio) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] back references using regex
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote: It's not a complex regex as I have wrote one that does what I want, yet not at the database level. The docs didn't help clarify anything. I'm still not clear on how it determines where the back reference comes from in the previous example you gave. And why digits wouldn't work. Back references work as they usually do in regular expressions: they refer to the matched value of a previous parenthesized subexpression. If you have multiple open parentheses then you need to refer to the correct subexpression to get what you want. Example: CREATE TABLE foo (t text); INSERT INTO foo VALUES ('abc.foo.foo.xyz'); INSERT INTO foo VALUES ('12.00.00.34'); INSERT INTO foo VALUES ('abc.def.ghi'); SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$; t - abc.foo.foo.xyz 12.00.00.34 (2 rows) In the above query the regular expression has only one set of parentheses, so the back reference refers to \1. The result set contains the two rows that have one or more word characters followed by a dot followed by the same set of characters. In the following query, note the difference between the regular expression in the select list and the one in the where clause: SELECT substring(t FROM $$((\w+)\.\2)$$) FROM foo WHERE t ~ $$(\w+)\.\1$$; substring --- foo.foo 00.00 (2 rows) In the regular expression in the select list, we use the outermost set of parentheses for grouping, so the back reference needs to refer to the subexpression that begins with the second open parenthesis (i.e., we must use \2). In the regular expression in the where clause, we have only one set of parentheses so the back reference is \1. Regarding digits, you didn't post any output in your example, so we don't know if it really doesn't work or if it just doesn't do what you were expecting. Here's what I get from your examples: SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$); substring --- 00.00 (1 row) SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$); substring --- 00.00 (1 row) SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$); substring --- foo.foo (1 row) Do you get different results, or do these results surprise you? They all appear to be correct. What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. Where I'll just change the delimiter for the query that it applies. Where I could start it at a certain point and end it at another. Like slicing out paragraph 3-6 (delimiter \n\n) or the 2nd-6th sentence in a article (delimiter \.). That is what I am trying to do. You can use split_part() to get a single item or string_to_array() to build an array from which you can extract multiple items. CREATE TABLE foo (t text); INSERT INTO foo VALUES ('one.two.three.four.five.six.'); SELECT (string_to_array(t, '.'))[3:5] FROM foo; string_to_array --- {three,four,five} (1 row) SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo; array_to_string - three.four.five (1 row) Is that what you're looking for? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] back references using regex
Thank you for your patience and such a complete answer. I'm not on the pgbox right now but those examples did help clarify how to reference the back references, which was my problem. I wasn't aware the 1st parenthesis must be counted as part of the regex, I assumed it was a wrapper. Thanks for helping me out and putting up with me. :) matt --- Michael Fuhr [EMAIL PROTECTED] wrote: On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote: It's not a complex regex as I have wrote one that does what I want, yet not at the database level. The docs didn't help clarify anything. I'm still not clear on how it determines where the back reference comes from in the previous example you gave. And why digits wouldn't work. Back references work as they usually do in regular expressions: they refer to the matched value of a previous parenthesized subexpression. If you have multiple open parentheses then you need to refer to the correct subexpression to get what you want. Example: CREATE TABLE foo (t text); INSERT INTO foo VALUES ('abc.foo.foo.xyz'); INSERT INTO foo VALUES ('12.00.00.34'); INSERT INTO foo VALUES ('abc.def.ghi'); SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$; t - abc.foo.foo.xyz 12.00.00.34 (2 rows) In the above query the regular expression has only one set of parentheses, so the back reference refers to \1. The result set contains the two rows that have one or more word characters followed by a dot followed by the same set of characters. In the following query, note the difference between the regular expression in the select list and the one in the where clause: SELECT substring(t FROM $$((\w+)\.\2)$$) FROM foo WHERE t ~ $$(\w+)\.\1$$; substring --- foo.foo 00.00 (2 rows) In the regular expression in the select list, we use the outermost set of parentheses for grouping, so the back reference needs to refer to the subexpression that begins with the second open parenthesis (i.e., we must use \2). In the regular expression in the where clause, we have only one set of parentheses so the back reference is \1. Regarding digits, you didn't post any output in your example, so we don't know if it really doesn't work or if it just doesn't do what you were expecting. Here's what I get from your examples: SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$); substring --- 00.00 (1 row) SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$); substring --- 00.00 (1 row) SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$); substring --- foo.foo (1 row) Do you get different results, or do these results surprise you? They all appear to be correct. What I basically want to do is have a slice function like Python, where I can slice out items from a \s, \. or \n\n separated list. Where I'll just change the delimiter for the query that it applies. Where I could start it at a certain point and end it at another. Like slicing out paragraph 3-6 (delimiter \n\n) or the 2nd-6th sentence in a article (delimiter \.). That is what I am trying to do. You can use split_part() to get a single item or string_to_array() to build an array from which you can extract multiple items. CREATE TABLE foo (t text); INSERT INTO foo VALUES ('one.two.three.four.five.six.'); SELECT (string_to_array(t, '.'))[3:5] FROM foo; string_to_array --- {three,four,five} (1 row) SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo; array_to_string - three.four.five (1 row) Is that what you're looking for? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] back references using regex
Hi. I'm trying to do a slice directly from a table so I can get a brief preview of the articles content by counting \s (spaces), not new paragraphs. Anyone know how it could be done using regular expressions natively? I read the doc but it didn't help me much. Many thanks. MP __ Click here to donate to the Hurricane Katrina relief effort. http://store.yahoo.com/redcross-donate3/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] back references using regex
On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew Peter wrote: I'm trying to do a slice directly from a table so I can get a brief preview of the articles content by counting \s (spaces), not new paragraphs. Are you trying to extract the first N words from a string? If that's not what you mean then please clarify. Anyone know how it could be done using regular expressions natively? I read the doc but it didn't help me much. Regular expressions aren't the only way to solve the problem, but maybe the following example will give you some ideas: CREATE TABLE article (id integer, content text); INSERT INTO article VALUES (1, 'one'); INSERT INTO article VALUES (2, 'one two'); INSERT INTO article VALUES (3, 'one two three'); INSERT INTO article VALUES (4, 'one two three four'); INSERT INTO article VALUES (5, 'one two three four five'); INSERT INTO article VALUES (6, 'one two three four five six'); SELECT id, substring(content FROM '(([^[:space:]]+[[:space:]]*){1,3})') FROM article; id | substring + 1 | one 2 | one two 3 | one two three 4 | one two three 5 | one two three 6 | one two three (6 rows) In PostgreSQL 7.4 and later you could shorten the regular expression: SELECT id, substring(content FROM '((\\S+\\s*){1,3})') FROM article; If this example isn't what you're looking for then please explain what you're trying to do. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] back references using regex
On Wed, Sep 07, 2005 at 05:26:07PM -0700, Matthew Peter wrote: Thanks. I'll check it out asap. I didn't realize the regex expressions needed to be escaped for it to be a valid expression. If you use ordinary quotes (') around the regular expression then you have to escape the backslashes because there's an extra level of string parsing that you're probably unaccustomed to. If you use dollar quotes (available since 8.0) then you don't need the extra escapes: SELECT id, substring(content FROM $$((\S+\s*){1,3})$$) FROM article; Would it be possible to choose what paragraph to use in a summary? You might be able to use split_part(). For example, if paragraphs are separated by pairs of newline (\n) characters, then the following should return each article's third paragraph: SELECT id, split_part(content, '\n\n', 3) FROM article; See String Functions and Operators and Pattern Matching in the documentation for more information. If you need to get fancy then consider writing a function in a language like PL/Perl. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly