Hello -
Hopefully, this is an easy one.

I have some ugly HTML like this:

<span style="font-weight: bold;"><font 
 style="font-family: Arial;"
        face=Arial>MMCM4</font></span>

I am trying to get rid of the <font> tags using HTML::TreeBuilder.  

Here is my script:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $filename = "test.htm";
open(OUT, ">", "output.txt") or die "Can't open output.txt: $!";

my $root = HTML::TreeBuilder->new;
$root->ignore_text(0);
$root->ignore_ignorable_whitespace(0);
$root->no_space_compacting(1);
$root->parse_file($filename);

my @fonts = $root->look_down('_tag', 'font');

foreach my $font (@fonts) {
        $font->tag(undef);
        $font->attr('face',undef);
        $font->attr('style',undef);
}
print OUT $root->as_HTML("","",{});

$root->delete();

And here is what the output looks like:

<span style="font-weight: bold;"><>MMCM4</></span>

The problem is that although the font tags/attributes themselves are
removed, the angle bracket pairs <> and </>
are left behind.  This causes the starting <> to be rendered in the
browser.

I've tried using $font->detach and $font->delete, but these methods also
delete the text content, which must be preserved.

It seems there must be something obvious I am missing.
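
One candidate worth noting is HTML::Element's replace_with_content
method, which replaces an element in its parent with the element's own
children, so the text should stay where it is. Below is a minimal
sketch, not a tested fix for the full file. It assumes the sample markup
is parsed from a string; the $html and $result variables and the use of
parse_content are there only for illustration and are not part of the
script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Illustrative input: the sample markup from above, on one line.
my $html = '<span style="font-weight: bold;">'
         . '<font style="font-family: Arial;" face=Arial>MMCM4</font></span>';

my $root = HTML::TreeBuilder->new;
$root->no_space_compacting(1);
$root->parse_content($html);    # parses the string and signals end of input

# Swap each <font> element for its own children, then discard the
# now-empty element that replace_with_content returns.
foreach my $font ($root->look_down('_tag', 'font')) {
    $font->replace_with_content->delete;
}

# Serialize only the <span> so the result is easy to inspect.
my $span   = $root->look_down('_tag', 'span');
my $result = $span->as_HTML("", "", {});
print $result, "\n";

$root->delete;
```

If this behaves as the documentation describes, the span should come
back with MMCM4 directly inside it and no empty <> pair. In the original
script the same loop body would go in place of the tag(undef) and
attr(...) calls.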

Thanks
Dave
