Re: Regexp help

Jan Eden Mon, 26 Jan 2004 07:33:12 -0800

John McKown wrote:

>On Sat, 24 Jan 2004, Marcelo wrote:
>
>>  Which regular expression would you use to remove the <title> and 
>> </title> from a line like this one:
>> 
>> <title>Here goes a webpage's title</title>
>> 
>> Thanks a lot in advance.
>> 
>
>Did you want that _exact_ input? I.e. always <title>...</title>? If so, 
>that's rather easy.
>
>$line =~ s/<title>(.*)<\/title>/$1/
>
>Now, if you want the more general form of <any_tag>...</any_tag>, that is 
>removing paired HTML tags, that's more difficult. Luckily, there is an 
>example in "Programming Perl, 3rd Edition" on page 184 which is close.
>
>$line =~ s/<(.*?)>(.*?)<\/\1>/$2/


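I remember reading that using regexes to parse HTML is not reliable: attributes, 
nested tags and line breaks inside an element all trip them up. You should use a 
real parser such as HTML::Parser from CPAN.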
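
Something along these lines might do for the <title> case. It is only a rough 
sketch: it assumes the HTML-Parser distribution from CPAN is installed and uses 
the HTML::TokeParser module that ships with it. The extract_title helper is a 
name I made up for illustration, it is not part of any module.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser distribution on CPAN

# Hypothetical helper: return the text of the first <title> element
# found in $html, or nothing if the document has no <title>.
sub extract_title {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string
    # instead of treating the argument as a file name.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    # Skip ahead to the first <title> start tag, if there is one.
    return unless $parser->get_tag('title');

    # Text up to the next tag, with surrounding whitespace trimmed.
    return $parser->get_trimmed_text;
}

my $title = extract_title("<title>Here goes a webpage's title</title>");
print "$title\n" if defined $title;
```

Unlike the substitutions quoted above, this should not care whether the tag is 
written as <TITLE> or carries attributes.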

HTH,

Jan
-- 
Either this man is dead or my watch has stopped. - Groucho Marx
