On Sun, 30 Nov 2008 02:51:57 +0200, Canol Gökel wrote:
> My problem is matching HTML tags with a regular expression. I managed to
> match something like this properly:
>
> la la la <p>a paragraph</p> bla bla bla <p>another paragraph</p> ya ya
> ya
>
> But when the tags are nested, problems arise:
>
> <p>a paragraph <p>bla bla bla</p> la la la</p>
>
> It matches
>
> <p>a paragraph <p>bla bla bla</p>
>
> instead of matching the innermost part:
>
> <p>bla bla bla</p>
>
> How can one write an expression that always matches the innermost part? I
> couldn't write an expression that says "match a non-greedy <p>.*</p> which
> does not have a <p> inside".
Here is the pattern:
(<p>(?:(?!<p>).)*?</p>)
$ cat /tmp/foo
#!/usr/local/bin/perl
use strict;
use warnings;
# print "Perl version $]\n";
$_ = do { local $/; <DATA> };    # slurp everything after __END__
m{
    (               # start capturing
      <p>           # match an opening tag
      (?: (?!<p>)   # not at the start of another opening tag
          .         # match a character (/s lets . match newlines)
      )*?           # nongreedily
      </p>          # match a closing tag
    )               # end capturing
}xs and print "Matched: $1\n";
__END__
Outermost: <p>
Middle: <p>
Inner: <p> Content
</p> Trailing
</p> Trailing
</p> Finished
$ /tmp/foo
Matched: <p> Content
</p>
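
The same idea extends to collecting every innermost paragraph by matching with /g in list context. What follows is only a sketch, not a general HTML parser: the sub name innermost_paragraphs and the sample string are made up for illustration, and it assumes bare <p> tags with no attributes. It tests the lookahead before consuming each character, so a <p> that directly follows an opening tag is rejected as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every <p>...</p> that contains no nested <p>.
# Assumes bare <p> tags (no attributes); real HTML is better left to a parser.
# Call it in list context; in scalar context m//g returns only true/false.
sub innermost_paragraphs {
    my ($html) = @_;
    return $html =~ m{
        (               # capture the whole element
          <p>           # an opening tag
          (?: (?!<p>)   # not at the start of another opening tag
              .         # any character; /s lets . match newlines
          )*?           # as few as possible
          </p>          # the first closing tag after it
        )
    }xsg;
}

my @found = innermost_paragraphs(
    '<p>a paragraph <p>bla bla bla</p> la la la</p> <p><p>x</p></p>'
);
print "$_\n" for @found;
```

With that sample string the loop prints <p>bla bla bla</p> and then <p>x</p>, one per line. The outer paragraphs are skipped because each of them contains another <p> before its closing tag.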
--
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/