Re: Removing HTML Tags

david Thu, 23 Jan 2003 11:17:57 -0800

Colin Johnstone wrote:

> I guess I'm looking for a regex to remove anything between the font tags,
> e.g. <font> and </font>. Of course there could be any number of attributes
> in the opening font tag.


I know the temptation is to just use a regular expression, but you should
really consider using a module that has been proven to work. Not only can
you be confident that the module will work, it will probably also provide
more functionality, which makes your script easier to extend in the future.
A large part of becoming a proficient programmer is learning to use the
libraries available for the language. For example, the HTML::Parser module
on CPAN is designed specifically for parsing HTML pages. To remove the
<font> tags, for example:

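#!/usr/bin/perl -w
use strict;

use HTML::Parser;

# Sample input with nested, uppercase, and badly nested <font> tags.
my $text = <<HTML;
Some text.
<i>italics</i>
<b>bold</b>
<FONT  class="whatever" color=red size="2"><i>
<font>Hi There</i></font></font>
<font>ABC</font> <h1>Hi
<font></h1>
</font>
HTML

# Echo text and every start/end tag unchanged. 'text' passes the original
# source; 'dtext' would decode entities such as &amp; and alter the HTML.
my $html = HTML::Parser->new(
                api_version => 3,
                text_h      => [sub{ print shift;}, 'text'],
                start_h     => [sub{ print shift;}, 'text'],
                end_h       => [sub{ print shift;}, 'text']);

# No start or end events are reported for <font>, so it is never printed.
$html->ignore_tags(qw(font));

$html->parse($text);
$html->eof;    # signal end of input so buffered text is flushed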

__END__

prints:

Some text.
<i>italics</i>
<b>bold</b>
<i>
Hi There</i>
ABC <h1>Hi
</h1>
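
If the cleaned HTML is needed in a variable rather than on STDOUT, the same
idea can be wrapped in a function. This is only a sketch: the name
strip_tags_keep_content is made up for illustration, and it uses default_h
instead of three separate handlers, so comments and declarations are passed
through as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser;

# Hypothetical helper (not part of HTML::Parser): returns $html with the
# named tags removed and everything else, including their content, kept.
sub strip_tags_keep_content {
    my ($html, @tags) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # default_h receives every event that has no handler of its own;
        # 'text' passes the original source text for that event.
        default_h   => [sub { $out .= $_[0] if defined $_[0] }, 'text'],
    );
    $parser->ignore_tags(@tags);    # no start/end events for these tags
    $parser->parse($html);
    $parser->eof;                   # flush any text still buffered

    return $out;
}

print strip_tags_keep_content('<font size="2">ABC</font> <b>bold</b>', 'font'), "\n";
```

Passing the tag names as arguments means the same function could drop other
presentational tags too, e.g. strip_tags_keep_content($page, qw(font center)).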

You might be thinking that a one-line regular expression is a lot less to
type, but notice how cleanly the script reads without piles of regular
expressions. Of course, there is nothing wrong with trying out the regular
expression for educational purposes.
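
For that educational exercise, a minimal sketch of the regular expression
route might look like the following. The function name strip_font_tags is
made up for illustration. It removes the <font ...> and </font> tags
themselves and keeps what sits between them, and it assumes no attribute
value contains a literal '>'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip opening and closing font tags with one
# substitution, leaving the enclosed content in place.
sub strip_font_tags {
    my ($html) = @_;
    # </?font : an opening or closing font tag, in any letter case (/i)
    # \b      : the tag name must end here, so <fontface> is left alone
    # [^>]*   : any attributes, up to the first '>'
    $html =~ s{</?font\b[^>]*>}{}gi;
    return $html;
}

print strip_font_tags(qq{<FONT class="whatever" color=red size="2">Hi <i>There</i></font>\n});
```

This is fine for simple input, but something like <font title="a > b">
would be cut at the first '>' and leave debris behind, which is the kind of
case a real parser handles for you.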

david
