>I want to extract from a large number of html files everything between
>the following specified comments, including the comments themselves:
>
><!--Begin CMS Content-->...<!-- End CMS Content-->
<snip>
>And the regular expression I've got is
>
>'/[<!--Begin CMS Content\-\->].+[<!-- End CMS Content\-\->]/s'
>
>I expected that when I ran this using preg_match_all I would get two
>matches

Those brackets define a character class, meaning "match any one of the
characters found within", so each bracketed group matches just a single
'<', or '!', or '-', or 'B', or... You want this:

'/<!--Begin CMS Content-->(.+)<!-- End CMS Content-->/Uis'

The U modifier makes .+ ungreedy, so each match stops at the first End
comment instead of running on to the last one in the file. That gets
you this (I added the parentheses in the middle so you could also get
the stuff inside the CMS content delimiters):

Array
(
    [0] => Array
        (
            [0] => <!--Begin CMS Content-->
<span class="headline">Breadth Requirement</span>
<hr class="under" />
<!-- End CMS Content-->
            [1] => <!--Begin CMS Content-->
<strong>More Matched Content!</strong>
<!-- End CMS Content-->
        )

    [1] => Array
        (
            [0] => 
<span class="headline">Breadth Requirement</span>
<hr class="under" />

            [1] => 
<strong>More Matched Content!</strong>

        )

)

---------------------------------------------------------------------
michal migurski- contact info and pgp key:
sf/ca            http://mike.teczno.com/contact.html

--
PHP General Mailing List (http://www.php.net/)
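
For anyone who wants to try the corrected pattern from the reply above, here is a minimal sketch of running it with preg_match_all. The sample HTML in $html is made up for illustration (the original files are not shown in the thread), and the variable names are assumptions, not the poster's code.

```php
<?php
// Hypothetical sample input; the real HTML files are not part of the thread.
$html = <<<HTML
<html><body>
<!--Begin CMS Content-->
<span class="headline">Breadth Requirement</span>
<hr class="under" />
<!-- End CMS Content-->
<p>Navigation and other markup outside the delimiters.</p>
<!--Begin CMS Content-->
<strong>More Matched Content!</strong>
<!-- End CMS Content-->
</body></html>
HTML;

// U: quantifiers are ungreedy by default, so .+ stops at the first End comment.
// i: case-insensitive matching.
// s: the dot also matches newlines, so a block may span several lines.
$pattern = '/<!--Begin CMS Content-->(.+)<!-- End CMS Content-->/Uis';

// preg_match_all returns the number of full matches found.
$count = preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches (comments included);
// $matches[1] holds only the text between the comments.
print_r($matches);
```

Dropping the U modifier and writing (.+?) instead should behave the same way here; without either one, the greedy .+ would swallow everything from the first Begin comment to the last End comment and yield a single match.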