Re: Searching for what is not there using REGEX in only a single step
On Fri, May 28, 2004 at 01:08:25PM -0400, Greg Rundlett [EMAIL PROTECTED] wrote:
> NOTE: I know how to solve this problem by processing the text in 2 steps, first finding all occurrences of /A(.*)C/ and then searching for B in $1, but I'm wondering if there is some advanced expression for doing it in only one step.
>
> I have an interesting little problem that I'm wondering if someone knows how to solve using regular expressions: Given some larger text, where you have many subsections that are made up of a token A followed by an indeterminate amount of text NOT including token B and then token C, how can you find those chunks of text? I've been trying with Perl-compatible Regular Expressions through PHP, but can't come up with a way to do it.

Well, I don't know about PCRE in PHP, but in pure Perl, you could do the following:

    /A(?(?=B)(?>.*)|.)*C/

This matches token A followed by token C, with a possible series of stuff in the middle. The stuff is evaluated conditionally. It uses look-ahead to see if what's coming matches token B. If so, the independent subexpression (?>.*) matches the rest of the line and never gives any of it back, irrevocably consuming token C, so that the required match to token C fails and the RE as a whole fails to match. Otherwise, the stuff in the middle matches any character, one character at a time.

Thanks for the opportunity to learn more about Perl REs. :-)

-- 
Bob Bell
___
gnhlug-discuss mailing list
[EMAIL PROTECTED]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
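For comparison, here is a minimal sketch of a different one-step idiom, not the conditional from the message above: a negative look-ahead checked before every character, so the middle can never run into token B. It is written in Python so it can be run as-is; the same (?!...) syntax is also accepted by Perl and PCRE. The literal tokens <a>, <b> and <c> and the sample text are hypothetical stand-ins for A, B and C.

```python
import re

# Hypothetical literal tokens standing in for A, B and C from the thread.
A, B, C = "<a>", "<b>", "<c>"

# A, then characters consumed one at a time, each only if token B does not
# start at that position, then C. The lazy *? stops at the first C.
pattern = re.compile(
    re.escape(A) + r"(?:(?!" + re.escape(B) + r").)*?" + re.escape(C),
    re.DOTALL,  # let "." cross line breaks
)

text = "<a>one<c> <a>two<b>three<c> <a>four<c>"
chunks_without_b = [m.group(0) for m in pattern.finditer(text)]
print(chunks_without_b)
```

The middle chunk is skipped because the look-ahead refuses to step over <b>, and the closing <c> is never reached from that starting point.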
Searching for what is not there using REGEX in only a single step
NOTE: I know how to solve this problem by processing the text in 2 steps, first finding all occurrences of /A(.*)C/ and then searching for B in $1, but I'm wondering if there is some advanced expression for doing it in only one step.

I have an interesting little problem that I'm wondering if someone knows how to solve using regular expressions: Given some larger text, where you have many subsections that are made up of a token A followed by an indeterminate amount of text NOT including token B and then token C, how can you find those chunks of text? I've been trying with Perl-compatible Regular Expressions through PHP, but can't come up with a way to do it.

For example, I have an XML file, with a bunch of records. Some records are fine. Others are missing a chunk. I want to find the broken records and insert the missing tags.

Broken Record

    </fh>
    30101 Agoura Ct., #115<br /></location_addr1>
    <location_addr2></location_addr2>

Fixed Record

    </fh>
    <location id=>
    <location_name>
    </location_name>
    <location_addr1>30101 Agoura Ct., #115<br /></location_addr1>
    <location_addr2></location_addr2>

I thought I would be able to find </fh> followed by </location_addr1> and use a negative look-behind assertion to say that <location_addr1> was not present. However, not knowing the length of text between </fh> and </location_addr1> seems to make this impossible.

-- 
FREePHILE
We are 'Open' for Business
Free and Open Source Software
http://www.freephile.com
(978) 270-2425

Paul Lynde to block...
  -- a contestant on Hollywood Squares
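One way to attack the XML case in a single pass is sketched below in Python. It assumes the tags in the sample originally had angle brackets that the list archive stripped (</fh>, <location_addr1>, and so on). The id="" value in MISSING is a placeholder, since the real value is not given in the message, and the names broken_record and repair are made up for illustration. The look-behind problem is avoided by looking ahead at each character instead, which has no fixed-length restriction.

```python
import re

# Assumption: the sample's tags lost their "<" and ">" in the archive.
OPEN_TAG = "<location_addr1>"
CLOSE_TAG = "</location_addr1>"
# Hypothetical filler for the missing chunk; id="" is a placeholder value.
MISSING = '<location id=""><location_name></location_name>' + OPEN_TAG

# </fh>, then text in which OPEN_TAG never starts, then CLOSE_TAG.
# Only records that lack the opening tag can match.
broken_record = re.compile(
    r"(</fh>)((?:(?!" + re.escape(OPEN_TAG) + r").)*?" + re.escape(CLOSE_TAG) + r")",
    re.DOTALL,
)

def repair(xml_text: str) -> str:
    """Insert the missing opening tags right after </fh> in broken records only."""
    return broken_record.sub(lambda m: m.group(1) + MISSING + m.group(2), xml_text)

broken = "</fh> 30101 Agoura Ct., #115<br /></location_addr1>"
fixed = repair(broken)
print(fixed)
```

Running repair a second time on its own output leaves it unchanged, because a record that already contains <location_addr1> no longer matches.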
Re: Searching for what is not there using REGEX in only a single step
It's possible you could approach the problem more simply, maybe like this: starting from every instance of </fh>, gather all text except anything that looks like a tag (i.e. discard all tags) up to the point where you find an instance of </location_addr1>.

You then have the desired text (sans tags) and you know exactly where you are, so you should be able to write out (that part of) your record in the desired format. In other words, instead of rewriting just the damaged records, rewrite ALL the records.

...just a thought, based only upon the info supplied, FWIW, YMMV, etc...
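The rewrite-everything idea can be sketched in Python as follows. This is only an illustration under assumptions: the tags are taken to have angle brackets (</fh>, </location_addr1>), the id="" value is a placeholder, and the names span, any_tag and rewrite are invented here.

```python
import re

# Assumption: tags are written with angle brackets, e.g. </fh>, </location_addr1>.
span = re.compile(r"</fh>(.*?)</location_addr1>", re.DOTALL)
any_tag = re.compile(r"<[^>]*>")

def rewrite(xml_text: str) -> str:
    """Rewrite every </fh> ... </location_addr1> span in one fixed format."""
    def emit(m):
        # Discard anything that looks like a tag and keep the bare text.
        address = any_tag.sub("", m.group(1)).strip()
        # Hypothetical output layout; id="" is a placeholder value.
        return ('</fh><location id=""><location_name></location_name>'
                "<location_addr1>" + address + "</location_addr1>")
    return span.sub(emit, xml_text)

broken = "</fh> 30101 Agoura Ct., #115<br /></location_addr1>"
good = ('</fh> <location id=""> <location_name> </location_name> '
        "<location_addr1>30101 Agoura Ct., #115<br /></location_addr1>")
print(rewrite(broken))
print(rewrite(good))
```

Both inputs come out in the same form. The cost of this approach is that every tag inside the span is dropped, including the <br />, and any text inside <location_name> would be folded into the address, so it only suits records where that loss is acceptable.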