NOTE: I know how to solve this problem by processing the text in two steps, first finding all occurrences of /A(.*)C/ and then searching for B in $1, but I'm wondering if there is some advanced expression for doing it in only one step.

I have an interesting little problem that I'm wondering if someone knows how to solve using regular expressions:

Given some larger text, where you have many subsections that are made up of a token A followed by an indeterminate amount of text NOT including token B and then token C, how can you find those chunks of text? I've been trying with Perl-compatible Regular Expressions through PHP, but can't come up with a way to do it.
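One single-pass idea that may fit here is a negative lookahead repeated at every character, so that the text between A and C can never start a B. The sketch below uses Python's re module only for illustration; the tokens <a>, <b>, and <c> are hypothetical stand-ins for A, B, and C, and the (?: ) and (?! ) syntax is the same in PCRE, where the /s modifier plays the role of re.DOTALL.

```python
import re

# Hypothetical tokens standing in for A, B and C from the question.
A, B, C = "<a>", "<b>", "<c>"

# A, then any run of characters in which no position begins B or C, then C.
# Excluding C inside the run keeps each match as short as possible.
pattern = re.compile(
    re.escape(A)
    + r"(?:(?!" + re.escape(B) + r"|" + re.escape(C) + r").)*"
    + re.escape(C),
    re.DOTALL,  # let "." cross line breaks, like /s in PCRE
)

text = "<a> one <b> two <c> ... <a> three\nfour <c>"

# Only the second chunk qualifies; the first one contains B.
matches = pattern.findall(text)
print(matches)
```

The lookahead is checked before each character is consumed, so the length of the text between A and C does not need to be known in advance.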

For example, I have an XML file with a bunch of records. Some records are fine. Others are missing a chunk. I want to find the broken records and insert the missing tags.
Broken Record
</fh>


   30101 Agoura Ct., #115<br /></location_addr1>
   <location_addr2></location_addr2>

Fixed Record
 </fh>
 <location id="">
   <location_name>

   </location_name>
   <location_addr1>30101 Agoura Ct., #115<br /></location_addr1>
   <location_addr2></location_addr2>

I thought I would be able to find </fh> followed by </location_addr1> and use a negative lookbehind assertion to say that <location_addr1> was not present. However, not knowing the length of the text between </fh> and </location_addr1> seems to make this impossible.
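A negative lookahead does not have the length restriction that a lookbehind has, so it may be worth trying from the other direction: start at </fh> and refuse to step over an opening <location_addr1> on the way to the closing tag. The following is a rough sketch in Python, assuming the missing chunk is always exactly the tags shown in the fixed record above; the names BROKEN, MISSING, and repair are made up for this example, and the pattern itself should carry over to preg_replace with the /s modifier.

```python
import re

# </fh>, optional whitespace, then text that never starts another </fh>
# or an opening/closing location_addr1 tag, then the closing tag.
BROKEN = re.compile(
    r"(</fh>)\s*((?:(?!</fh>|</?location_addr1>).)*</location_addr1>)",
    re.DOTALL,
)

# Assumed missing chunk, copied from the fixed record in this post.
MISSING = (
    '\n <location id="">\n'
    "   <location_name>\n"
    "\n"
    "   </location_name>\n"
    "   <location_addr1>"
)

def repair(xml):
    """Insert the missing opening tags into every broken record."""
    return BROKEN.sub(lambda m: m.group(1) + MISSING + m.group(2), xml)

broken = (
    "</fh>\n\n\n"
    "   30101 Agoura Ct., #115<br /></location_addr1>\n"
    "   <location_addr2></location_addr2>\n"
)
print(repair(broken))
```

A record that already has <location_addr1> is left alone, because the lookahead stops the scan at the opening tag before the closing tag can be reached. Records that lack the location block entirely are not handled by this sketch.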

--
FREePHILE
We are 'Open' for Business
Free and Open Source Software
http://www.freephile.com
(978) 270-2425
"Paul Lynde to block..."
-- a contestant on "Hollywood Squares"
