I have an HTML page that is updated automatically each day. I am using
HTML::TreeBuilder to create and insert the new content.
Most of the time this works fine, but I've hit a snag when existing text nodes
on the page include a < or > symbol.
For example, I might have an existing element on the page that looks like this:
<td>&lt;B</td>
When the page is updated, depending on how I print the output, this may cause
problems.
Some techniques I use to print the output work OK for the new part but affect
the existing content adversely. Other techniques work well with the existing
content but cause problems with the new content.
Here are some of the output approaches I have tried:
I.
print OUT $root->as_HTML('', '', {});
Results: the new content looks good, but the existing content comes out
unescaped:
<td><B</td> # The browser won't render this and generally just blanks out the
text node.
II.
print OUT $root->as_HTML('<>&', '', {});
Results: the existing content looks good, but the new content is output with
every < and > in its HTML source encoded as an entity reference (i.e. the
browser displays the raw markup as text instead of rendering it).
III.
use Encode qw(encode decode);
...
my $string_rep = $root->as_HTML('<>&', '', {});
print OUT encode('UTF-8',$string_rep);
Results: same as test II.
IV.
use HTML::Entities;
...
my $string_rep = $root->as_HTML('<>&', '', {});
print OUT encode_entities($string_rep);
Results: the entire page is output with every < and > in the HTML source
encoded as an entity reference (i.e. the browser displays the raw markup of the
whole page as text).
V.
Various iterations of the above approaches using a subsequent call to HTML Tidy
to attempt to clean up the HTML.
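For reference, the difference between I and II can be reproduced with a small
script along the following lines. This is only a sketch: the page markup, the
variable names, and the way the new content is added (pushed onto the body as a
plain string) are assumptions made for illustration, not the real update code.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the existing page: one text node holding an
# escaped less-than sign.
my $root = HTML::TreeBuilder->new_from_content(
    '<html><body><table><tr><td>&lt;B</td></tr></table></body></html>'
);

# Assumption: the new content is added as a plain string. HTML::Element
# stores a string as a text node, not as parsed markup.
my $body = $root->find_by_tag_name('body');
$body->push_content('<p>new content</p>');

# Approach I: empty entity list, so no characters are escaped on output.
my $unescaped = $root->as_HTML('', '', {});

# Approach II: escape <, > and & in every text node.
my $escaped = $root->as_HTML('<>&', '', {});

print "I:  $unescaped\n";
print "II: $escaped\n";

$root->delete;
```

Under those assumptions, output I carries both the old text node and the new
string through unescaped, while output II escapes both, which matches the
results described in I and II.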
Any ideas appreciated.
Thanks,
Webley